Vector DB cheatsheet.
Options
| DB | Deployment | Best for |
|---|---|---|
| Qdrant | open source + managed cloud | most use cases |
| Pinecone | managed only | scale with little ops work |
| Weaviate | open source + managed cloud | hybrid (full-text + vector) |
| Milvus | open source | billions of vectors |
| Chroma | embedded (open source) | prototyping |
| pgvector | Postgres extension | teams already on Postgres |
| Redis | Redis Stack (vector search module) | low-latency cache + vector |
| Elasticsearch | built-in kNN (dense_vector) | teams already on ES |
Qdrant
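```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# size must equal your embedding model's output dimension
client.create_collection(
    collection_name="docs",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
    hnsw_config=models.HnswConfigDiff(m=16, ef_construct=100),
)

client.upsert(
    collection_name="docs",
    points=[
        models.PointStruct(id=1, vector=[...], payload={"text": "...", "source": "..."}),
    ],
)

# query_points is the current query API (client.search is deprecated in recent clients)
hits = client.query_points(
    collection_name="docs",
    query=[...],  # query embedding, same size as the stored vectors
    limit=5,
    query_filter=models.Filter(
        must=[models.FieldCondition(key="source", match=models.MatchValue(value="docs.example.com"))]
    ),
).points
```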
pgvector
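```sql
-- HNSW indexes need pgvector 0.5.0+
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE docs (
    id BIGSERIAL PRIMARY KEY,
    text TEXT NOT NULL,
    metadata JSONB,
    embedding vector(1536)  -- must match your embedding model's dimension
);

-- vector_cosine_ops pairs with the <=> operator used below
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);

INSERT INTO docs (text, metadata, embedding)
VALUES ('hello', '{"source":"x"}', '[0.1, 0.2, ...]');

-- <=> returns cosine distance, so 1 - distance = cosine similarity
SELECT text, 1 - (embedding <=> '[...]') AS similarity
FROM docs
ORDER BY embedding <=> '[...]'
LIMIT 5;
```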
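Operators: `<=>` is cosine distance, `<->` is L2 (Euclidean) distance, and `<#>` is negative inner product. The index opclass must match the operator (`vector_cosine_ops`, `vector_l2_ops`, `vector_ip_ops`), or Postgres will not use the index.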
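All three operators return distances, where smaller means closer. That is why the query above converts `<=>` into a similarity with `1 - ...`. The following is a minimal pure-Python sketch of what each operator computes, useful for spot-checking results outside the database. The function names are illustrative and are not part of pgvector.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Equivalent of a <=> b: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a: list[float], b: list[float]) -> float:
    """Equivalent of a <-> b: Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: list[float], b: list[float]) -> float:
    """Equivalent of a <#> b: the dot product negated, so an ascending
    ORDER BY puts the largest inner product first."""
    return -sum(x * y for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]
print(1 - cosine_distance(a, b))     # cosine similarity, about 0.96
print(l2_distance(a, b))             # sqrt(2), about 1.414
print(negative_inner_product(a, b))  # -24.0
```

For unit-length embeddings (many models normalize their output), all three operators rank results identically. In that case `<#>` is the cheapest to compute.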
HNSW params
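- m: max neighbours per node in the graph (typically 16-64). Higher = better recall, more memory, slower builds.
- ef_construct: candidate-list size while building the index (64-512). Higher = better graph quality, slower builds. pgvector calls it ef_construction.
- ef (query time): candidate-list size during search (hnsw_ef in Qdrant's search params, hnsw.ef_search in pgvector). Higher = better recall, higher latency.
Tune for latency vs recall. Start with ef: it is a search-time setting, so changing it needs no index rebuild.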
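A back-of-envelope estimator shows why a higher m costs memory. This is a rough sketch under stated assumptions, not any engine's exact layout:

- vectors are stored as float32;
- each node on the bottom graph layer keeps up to 2*m neighbour IDs of 4 bytes each (the convention hnswlib uses);
- the sparse upper layers are ignored.

```python
def hnsw_memory_bytes(num_vectors: int, dim: int, m: int, id_bytes: int = 4) -> dict[str, int]:
    """Rough RAM estimate for an HNSW index over float32 vectors.

    Assumptions: 4 bytes per dimension, up to 2*m neighbour IDs per node on
    the bottom layer, sparse upper layers ignored.
    """
    vectors = num_vectors * dim * 4
    links = num_vectors * 2 * m * id_bytes
    return {"vectors": vectors, "links": links, "total": vectors + links}


for m in (16, 32, 64):
    est = hnsw_memory_bytes(num_vectors=1_000_000, dim=1536, m=m)
    print(f"m={m}: vectors {est['vectors'] / 1e9:.2f} GB, links {est['links'] / 1e9:.3f} GB")
```

At 1536 dimensions, the raw vectors (about 6.1 GB per million) dwarf the graph links (about 0.13 GB at m=16), so raising m is relatively cheap. With small vectors, the links become a much larger share. Real engines add their own overhead on top of this estimate.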
Filtering strategies
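Pre-filter (filter, then search): results are exact for the filter, but the search can be slow when the filter matches only a small fraction of points.
Post-filter (search, then filter): fast, but it can return fewer than k results when most of the top hits fail the filter.
Qdrant and Weaviate apply the filter during the HNSW traversal and fall back to a brute-force scan over the matching points when a filter is very selective, which avoids both problems.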
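A toy brute-force example makes the difference concrete. This is only a sketch:

- points are hypothetical `(id, vector, payload)` tuples;
- the search is exact, whereas a real engine uses an ANN index.

Even so, post-filtering with a fixed candidate budget fails in the same way a real engine's would.

```python
import math


def cosine_sim(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(points, query, k):
    """Exact top-k by cosine similarity over (id, vector, payload) tuples."""
    return sorted(points, key=lambda p: cosine_sim(p[1], query), reverse=True)[:k]


def pre_filter(points, query, k, pred):
    # Filter first, then search only the survivors: up to k matching hits.
    return top_k([p for p in points if pred(p[2])], query, k)


def post_filter(points, query, k, pred, candidates):
    # Search first with a fixed candidate budget, then drop non-matching hits.
    return [p for p in top_k(points, query, candidates) if pred(p[2])][:k]


points = [(i, [1.0, 0.1 * i], {"source": "blog"}) for i in range(8)]
points += [(8, [0.0, 1.0], {"source": "docs"}), (9, [0.1, 1.0], {"source": "docs"})]
query = [1.0, 0.0]


def is_docs(payload):
    return payload["source"] == "docs"


print([p[0] for p in pre_filter(points, query, 2, is_docs)])      # [9, 8]
print([p[0] for p in post_filter(points, query, 2, is_docs, 5)])  # [] (all 5 candidates were blog posts)
```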
Metadata indexing
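```python
from qdrant_client import models

# Index every payload field used in query filters ("source" here)
client.create_payload_index(
    collection_name="docs",
    field_name="source",
    field_schema=models.PayloadSchemaType.KEYWORD,
)
```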
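Speeds up filters on indexed fields. Qdrant also uses payload indexes to estimate how selective a filter is when choosing a search strategy.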
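Conceptually, a keyword payload index is an inverted index from field value to point IDs. A filter then becomes a lookup instead of a scan over every payload. The class below is a hypothetical toy illustration; Qdrant's real index is more involved.

```python
from collections import defaultdict


class KeywordIndex:
    """Toy inverted index: payload value -> set of point IDs."""

    def __init__(self, field: str):
        self.field = field
        self.postings: defaultdict[str, set[int]] = defaultdict(set)

    def add(self, point_id: int, payload: dict) -> None:
        value = payload.get(self.field)
        if value is not None:
            self.postings[value].add(point_id)

    def match(self, value: str) -> set[int]:
        # A dict lookup: cost does not grow with the number of non-matching points.
        return self.postings.get(value, set())


payloads = {1: {"source": "docs.example.com"}, 2: {"source": "blog"}, 3: {"source": "docs.example.com"}}
index = KeywordIndex("source")
for point_id, payload in payloads.items():
    index.add(point_id, payload)
print(sorted(index.match("docs.example.com")))  # [1, 3]
```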
Sharding / scale
- Qdrant: collection sharding, replication.
- Pinecone: add pods or replicas (pod-based indexes), or let serverless indexes scale automatically.
- Milvus: distributed natively.
At around 100k vectors, any option works. At 100M+, use Milvus or a Pinecone tier sized for it.
Batch upserts
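```python
# batch_of_1000: a list of up to 1000 models.PointStruct
client.upsert(collection_name="docs", points=batch_of_1000)
```
Always batch (100-1000 points per request). Per-row inserts pay a network round trip each and are much slower.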
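If your points arrive as one long iterable, a small helper can slice them into request-sized batches. This is a minimal sketch; Python 3.12+ also ships `itertools.batched`, which yields tuples instead of lists.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


# Usage with the Qdrant client (`client` and `all_points` assumed to exist):
# for batch in batched(all_points, 500):
#     client.upsert(collection_name="docs", points=batch)
print([len(b) for b in batched(range(2500), 1000)])  # [1000, 1000, 500]
```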
Updates
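Vector DBs support upsert by ID. For a full re-embedding (model change), re-embed everything in batches:
```python
for batch in chunks:  # chunks: your documents, split into batches
    embeddings = embed_batch(batch)  # embed_batch: your embedding function
    client.upsert(...)
```
Vectors from different models are not comparable, and the dimension may change too. Write the new vectors to a new collection and switch reads over once the backfill finishes. Run migrations as background jobs.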
Hybrid (vector + keyword)
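Weaviate, Elasticsearch, Milvus and Qdrant can combine keyword (BM25 or sparse-vector) results with dense-vector results, typically merged with RRF (Reciprocal Rank Fusion). In Qdrant, the keyword side is a sparse vector: the full-text payload index only filters and does not rank.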
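```python
# Qdrant 1.10+: fuse two prefetch result lists with RRF.
# Assumes the collection has a dense vector named "dense" and a sparse vector
# named "sparse" (e.g. BM25-style term weights); both names are illustrative.
from qdrant_client import models

hits = client.query_points(
    collection_name="docs",
    prefetch=[
        models.Prefetch(query=q_vec, using="dense", limit=50),
        models.Prefetch(query=q_sparse, using="sparse", limit=50),  # q_sparse: models.SparseVector(indices=..., values=...)
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
).points
```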
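RRF is simple enough to sketch. Each document scores the sum of 1 / (k + rank) over every ranking it appears in. Items ranked well by both retrievers rise to the top, regardless of how each retriever scales its raw scores. The pure-Python version below assumes k = 60, the constant from the original RRF paper; engines may use a different value.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank), ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense = ["a", "b", "c"]    # ranking from vector search (hypothetical IDs)
keyword = ["c", "a", "d"]  # ranking from keyword / sparse search
print([doc for doc, _ in rrf([dense, keyword])])  # ['a', 'c', 'b', 'd']
```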
Backup / migration
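```shell
# Qdrant snapshot (the JSON response includes the snapshot name)
curl -X POST http://localhost:6333/collections/docs/snapshots
# List snapshots: GET http://localhost:6333/collections/docs/snapshots
```
Restore with `PUT /collections/docs/snapshots/recover`, passing a `{"location": ...}` body that points at the snapshot file or URL. Alternatively, upload the file via `POST /collections/docs/snapshots/upload`. Snapshot files are written to the directory set by `storage.snapshots_path` in the Qdrant config.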
When pgvector suffices
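If you already run Postgres and have under ~10M vectors and modest QPS, a plain query is enough:
```sql
SELECT * FROM docs
ORDER BY embedding <=> $1
LIMIT 5;
```
Simpler ops: there is no second datastore to run. Keep the HNSW index on `embedding`, or this query falls back to a full scan.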
Observability
- QPS, p99 latency.
- Recall@k: the share of the true top-k neighbours (from an exact brute-force search on sample queries) that the index actually returns.
- Memory usage (vectors are RAM-heavy with HNSW).
Compression / quantization
- Scalar quantization: float32 → int8. ~4x smaller, minor quality loss.
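- Binary quantization: 1 bit per dimension, ~32x smaller. Aggressive; works best on high-dimensional embeddings, with the top hits rescored against the original vectors.
- Product quantization (PQ): splits each vector into sub-vectors and stores a short codebook ID for each. Highest compression and the biggest recall loss; used at billions of vectors.
Qdrant supports all three.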
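```python
from qdrant_client import models
# Switch the collection to int8 scalar quantization; always_ram keeps the quantized vectors in RAM
client.update_collection(
    collection_name="docs",
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(type=models.ScalarType.INT8, always_ram=True)),
)
```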
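The 4x and 32x figures can be checked with a minimal pure-Python sketch of both schemes. It makes two assumptions:

- Scalar quantization uses each vector's own min/max range and unsigned bytes. Real engines such as Qdrant calibrate the range differently, for example with quantiles.
- Binary quantization keeps only the sign of each dimension.

```python
import struct


def scalar_quantize(vec: list[float]) -> tuple[bytes, float, float]:
    """Map each float to one unsigned byte using the vector's own min/max range."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # avoid division by zero for constant vectors
    codes = bytes(round((x - lo) / scale) for x in vec)
    return codes, lo, scale


def scalar_dequantize(codes: bytes, lo: float, scale: float) -> list[float]:
    return [lo + c * scale for c in codes]


def binary_quantize(vec: list[float]) -> bytes:
    """Keep only the sign of each dimension: 1 bit per dim, packed 8 per byte."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (x > 0)
    return bits.to_bytes((len(vec) + 7) // 8, "big")


dim = 1536
vec = [((i * 37) % 101 - 50) / 50 for i in range(dim)]  # deterministic dummy vector in [-1, 1]
codes, lo, scale = scalar_quantize(vec)
float32_size = len(struct.pack(f"{dim}f", *vec))
print(float32_size, len(codes), len(binary_quantize(vec)))  # 6144 1536 192
```

Scalar codes reconstruct each value to within half a quantization step. Binary codes keep only the direction information, which is why they are usually paired with rescoring.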
Cost
- Self-host: VM + storage. Cheap at low scale.
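- Pinecone: pod-based (billed per pod-hour) or serverless (billed by usage); varies by plan.
- Qdrant Cloud: billed by cluster resources (RAM, CPU, disk).
For most small-to-medium workloads: self-host Qdrant on a $20/mo VM.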
Common mistakes
- No payload index on filter fields → slow queries.
- Mixing embedding dimensions (or models) in one collection.
- Inserting without batching.
- Storing huge raw text in payload (use ID, fetch from primary DB).
- Forgetting backups.
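Two of these mistakes are cheap to catch in code before an upsert. Below is a hypothetical pre-flight check. It assumes points are plain dicts with `id`, `vector` and `payload` keys; the 2,000-character limit is an arbitrary example threshold.

```python
def validate_points(points: list[dict], dim: int, max_text_chars: int = 2_000) -> list[str]:
    """Return human-readable problems found in a batch; an empty list means OK."""
    problems = []
    for p in points:
        n = len(p["vector"])
        if n != dim:
            problems.append(f"id {p['id']}: vector has {n} dims, expected {dim}")
        text = str(p.get("payload", {}).get("text", ""))
        if len(text) > max_text_chars:
            problems.append(f"id {p['id']}: payload text has {len(text)} chars, keep it in the primary DB")
    return problems


batch = [
    {"id": 1, "vector": [0.0] * 1536, "payload": {"text": "short"}},
    {"id": 2, "vector": [0.0] * 768, "payload": {"text": "x" * 10_000}},
]
for problem in validate_points(batch, dim=1536):
    print(problem)
```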
Read this next
If you want my Qdrant + pgvector setup, it’s at rajpoot.dev.