# Vector Databases
A vector database stores high-dimensional embeddings and answers approximate nearest-neighbour (ANN) queries in milliseconds. The word "approximate" is key: ANN trades a small accuracy loss for orders-of-magnitude speed improvement over exact nearest-neighbour search across millions of vectors.
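To make the trade-off concrete, here is a minimal sketch of exact (FLAT) nearest-neighbour search with NumPy: every query scores the full corpus, which is the O(n) scan that ANN indexes avoid. The toy vectors are purely illustrative.

```python
import numpy as np

# Exact (FLAT) nearest-neighbour search: the query is scored against
# every corpus vector, so each query costs O(n) in the corpus size.
corpus = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query = np.array([0.9, 0.1, 0.0])

# Cosine similarity against the whole corpus, then take the best match.
sims = corpus @ query / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(query))
best = int(np.argmax(sims))
print(best)  # → 0, the exact nearest neighbour
```

At a few thousand vectors this scan is perfectly fine; it is only past the hundreds-of-thousands mark that the O(n) cost pushes you toward an ANN index.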
## ANN Index Algorithms
| Algorithm | Full name | Best for | Trade-offs |
|---|---|---|---|
| HNSW | Hierarchical Navigable Small World | Low-latency online queries | High memory usage |
| IVF | Inverted File Index | Large-scale batch retrieval | Requires training on corpus |
| IVF+PQ | IVF + Product Quantization | Very large corpora (>100M) | Lower accuracy |
| FLAT | Exact brute force | < 100k vectors | O(n) per query, no approximation |
HNSW is the default in most production RAG systems. It builds a layered graph where upper layers provide coarse navigation and lower layers provide fine-grained search, achieving O(log n) average query time.
## Vector Database Comparison
| Database | Hosting | HNSW | Metadata filtering | Pricing model |
|---|---|---|---|---|
| Pinecone | Managed cloud | Yes | Yes (rich) | Per vector/query |
| Qdrant | Self-host or cloud | Yes | Yes (payload filters) | Open source + cloud |
| Chroma | Local / self-host | Yes | Yes (where clauses) | Open source |
| pgvector | PostgreSQL extension | Yes (v0.5+) | Full SQL | Self-host |
| Weaviate | Self-host or cloud | Yes | Yes (GraphQL) | Open source + cloud |
For prototyping and teams without infrastructure constraints, Chroma is the fastest path. For production at scale with SLA requirements, Qdrant self-hosted or Pinecone managed are the most common choices.
## Chroma — Local Prototype
## Qdrant — Production Setup
## Index Tuning for HNSW
As a rule: increase `m` to improve recall, increase `ef_construct` for higher index quality, and increase `ef` to trade query latency for recall.
## Summary
- ANN indexes (HNSW, IVF) trade a small accuracy loss for O(log n) query performance instead of O(n) brute force.
- HNSW is the default for production RAG — it offers the best latency-recall trade-off for online query workloads.
- Chroma is ideal for prototypes; Qdrant is a strong self-hosted production choice; Pinecone is the easiest fully managed option.
- Use metadata filters at query time to narrow the search space — they are far more efficient than post-retrieval filtering.
- Tune HNSW parameters (`m`, `ef_construct`, `ef`) based on your recall@k target and acceptable query latency.