🤖 AI Summary
Comet is a new, pure-Go hybrid vector database pitched as “built for hackers, not hyperscalers”: a compact, readable library that combines multiple vector index types, full‑text search and compressed metadata filtering into one hackable package. It’s meant both as a practical engine for embedding-based retrieval and an educational reference to understand how vector DBs work. For AI/ML engineers this matters because you get an embeddable, thread‑safe implementation that supports hybrid retrieval (vector + text + filters) with configurable fusion strategies (reciprocal rank fusion, weighted sum, etc.), making it easy to prototype pipelines without heavyweight services.
Technically, Comet implements five index types (Flat, HNSW, IVF, PQ, IVFPQ) and three distance metrics (L2, L2², Cosine). Full‑text uses BM25 with Unicode tokenization; metadata filtering uses Roaring Bitmaps and Bit‑Sliced Indexes for fast boolean and numeric range queries. Advanced features include multi‑KNN queries, autocut trimming, soft deletes, serialization, and PQ compression that can shrink 384‑d float32 vectors from ~1.46 GB to a few MB for 1M vectors. The unified hybrid coordinator lets you pre‑filter candidates, run vector/text searches on the set, then fuse ranks—trading recall, latency and memory depending on HNSW/IVF/PQ settings. In short, Comet is a lightweight, open Go toolkit for building, experimenting with, and learning about production‑grade vector search systems.
Loading comments...
login to comment
loading comments...
no comments yet