Vector DB cheatsheet.

Options

DB            | Type                 | Best for
Qdrant        | open source, managed | most use cases
Pinecone      | managed              | scale, ops-light
Weaviate      | open source, managed | full-text + vector
Milvus        | open source          | billions of vectors
Chroma        | embedded             | prototyping
pgvector      | Postgres extension   | already on Postgres
Redis         | with Redis Stack     | low-latency cache + vector
Elasticsearch | with kNN             | already on ES

Qdrant

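from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Configs are typed models; a plain dict for vectors_config is read as named vectors
client.create_collection(
    collection_name="docs",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
    hnsw_config=models.HnswConfigDiff(m=16, ef_construct=100),
)

client.upsert(
    collection_name="docs",
    points=[
        models.PointStruct(
            id=1,
            vector=[...],  # 1536 floats from your embedding model
            payload={"text": "...", "source": "..."},
        ),
    ],
)

# query_points is the newer query API; client.search is deprecated in recent clients
hits = client.query_points(
    collection_name="docs",
    query=[...],  # query embedding: same model, same dimension
    limit=5,
    query_filter=models.Filter(
        must=[models.FieldCondition(key="source", match=models.MatchValue(value="docs.example.com"))]
    ),
).points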

pgvector

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE docs (
    id BIGSERIAL PRIMARY KEY,
    text TEXT NOT NULL,
    metadata JSONB,
    embedding vector(1536)
);

CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);

INSERT INTO docs (text, metadata, embedding)
VALUES ('hello', '{"source":"x"}', '[0.1, 0.2, ...]');

SELECT text, 1 - (embedding <=> '[...]') AS similarity
FROM docs
ORDER BY embedding <=> '[...]'
LIMIT 5;

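Operators: <-> L2 distance, <=> cosine distance, <#> negative inner product. All three are distances (smaller = closer), which is why the query above orders ascending and computes similarity as 1 - distance.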
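For intuition, here is a plain-Python sketch of what each operator computes for two vectors. It is illustrative only: pgvector does this in C inside Postgres, and the function names are made up for this example.

```python
import math
from typing import List


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance: what pgvector's <-> returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity: what <=> returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


def negative_inner_product(a: List[float], b: List[float]) -> float:
    """Inner product negated: what <#> returns, so ascending order puts the largest dot product first."""
    return -sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0]
doc = [0.6, 0.8]
print(l2_distance(query, doc))
print(cosine_distance(query, doc))
print(1 - cosine_distance(query, doc))  # similarity, as computed in the SELECT
print(negative_inner_product(query, doc))
```

For unit-length embeddings all three operators rank results identically, since squared L2 distance equals 2 - 2 * cosine similarity and the inner product equals the cosine similarity.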

HNSW params

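  • m: max connections per node in the graph (typically 16-64). Higher = better recall, more memory, slower builds.
  • ef_construct: candidate-list size while building the index (64-512). Higher = better graph, slower build. pgvector calls it ef_construction.
  • ef (query time): candidate-list size while searching. Higher = better recall, higher latency. Qdrant: hnsw_ef in the search params; pgvector: SET hnsw.ef_search.

Tune these to trade latency against recall. ef is a query-time setting, so you can change it without rebuilding the index.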

Filtering strategies

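Pre-filter (filter, then search): returns only matching results, but on an HNSW graph it can get slow or lose recall when only a small fraction of points match.

Post-filter (search, then filter): fast, but can return fewer than k results (or none) when the filter is selective, because the top-k candidates may not match.

Qdrant and Weaviate filter during the search and switch strategy based on how many points match, so you rarely have to choose by hand.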
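A minimal brute-force sketch of the two strategies, assuming points are plain dicts with hypothetical "id", "vector" and "payload" keys. Real engines use an ANN index instead of sorting everything, and real post-filtering usually over-fetches, but the failure mode is the same: here post-filtering returns nothing.

```python
import math
from typing import Any, Callable, Dict, List

Point = Dict[str, Any]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pre_filter_search(points: List[Point], query: List[float], k: int,
                      predicate: Callable[[Dict[str, Any]], bool]) -> List[int]:
    """Filter first, then rank only the matching points."""
    matching = [p for p in points if predicate(p["payload"])]
    matching.sort(key=lambda p: cosine_similarity(p["vector"], query), reverse=True)
    return [p["id"] for p in matching[:k]]


def post_filter_search(points: List[Point], query: List[float], k: int,
                       predicate: Callable[[Dict[str, Any]], bool]) -> List[int]:
    """Rank everything, keep the top k, then drop points that fail the filter."""
    ranked = sorted(points, key=lambda p: cosine_similarity(p["vector"], query), reverse=True)
    return [p["id"] for p in ranked[:k] if predicate(p["payload"])]


def is_docs(payload: Dict[str, Any]) -> bool:
    return payload["source"] == "docs.example.com"


points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"source": "blog"}},
    {"id": 2, "vector": [0.9, 0.1], "payload": {"source": "blog"}},
    {"id": 3, "vector": [0.0, 1.0], "payload": {"source": "docs.example.com"}},
]
query = [1.0, 0.0]

print(pre_filter_search(points, query, k=2, predicate=is_docs))   # [3]
print(post_filter_search(points, query, k=2, predicate=is_docs))  # []
```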

Metadata indexing

client.create_payload_index(
    collection_name="docs",
    field_name="source",
    field_schema="keyword",
)

Speeds up filters on indexed fields.

Sharding / scale

  • Qdrant: collection sharding, replication.
  • Pinecone: add pods and replicas (pod-based indexes), or let serverless indexes scale automatically.
  • Milvus: distributed natively.

At around 100k vectors, any option will do. At 100M+, look at Milvus or a large Pinecone tier.

Batch upserts

client.upsert(collection_name="docs", points=batch_of_1000)

Always batch (100-1000 points per request). One request per point is dominated by network round-trips.
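A small helper for splitting any iterable into fixed-size batches; the name chunked is just for this sketch. On Python 3.12+, itertools.batched does the same but yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items; the last batch may be shorter."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


# Usage with the Qdrant client (points is your list of PointStruct):
# for batch in chunked(points, 500):
#     client.upsert(collection_name="docs", points=batch)

print(list(chunked(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```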

Updates

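Vector DBs support upsert by ID. When you change embedding models, every vector has to be re-embedded:

for batch in chunks:                 # your documents, pre-split into batches
    embeddings = embed_batch(batch)  # your call to the new embedding model
    client.upsert(...)               # same IDs, so new vectors overwrite old ones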

Plan migrations as background jobs.

Hybrid (vector + keyword)

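Several DBs (Qdrant, Weaviate, Elasticsearch) can run a keyword (BM25-style) search and a vector search, then merge the two result lists, for example with Reciprocal Rank Fusion (RRF).

# Qdrant 1.10+. Assumes the collection has a dense vector named "dense"
# and a sparse (e.g. BM25) vector named "sparse"; both names are illustrative.
from qdrant_client import models

hits = client.query_points(
    collection_name="docs",
    prefetch=[
        models.Prefetch(query=q_vec, using="dense", limit=50),
        # keyword side: a models.SparseVector(indices=..., values=...) for the query
        models.Prefetch(query=text_index_query, using="sparse", limit=50),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
).points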
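Qdrant runs the fusion server-side. To show what RRF computes, here is a plain-Python version: each document scores the sum of 1 / (k + rank) over the lists it appears in, with ranks starting at 1. k = 60 is the constant from the original RRF paper and a common default; the function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[Hashable]],
                           k: int = 60, limit: int = 10) -> List[Hashable]:
    """Fuse ranked ID lists by summing 1 / (k + rank) per document."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in fused[:limit]]


vector_hits = ["a", "b", "c"]   # IDs ranked by the vector search
keyword_hits = ["c", "a", "d"]  # IDs ranked by the keyword search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # ['a', 'c', 'b', 'd']
```

RRF only uses ranks, so it needs no score normalization between BM25 scores and cosine similarities, which live on different scales.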

Backup / migration

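# Qdrant: create a snapshot of the "docs" collection
curl -X POST http://localhost:6333/collections/docs/snapshots
# List snapshots; download one with GET .../collections/docs/snapshots/<snapshot_name>
curl http://localhost:6333/collections/docs/snapshots
# Files are written under storage.snapshots_path (./snapshots by default)

Restore: PUT /collections/docs/snapshots/recover with the snapshot's location (a URL or file:// path).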

When pgvector suffices

If you already have Postgres + <10M vectors + few QPS:

SELECT * FROM docs
ORDER BY embedding <=> $1
LIMIT 5;

Simpler ops: one database to run and back up. Add the HNSW index shown above; without it, every query is an exact sequential scan.

Observability

  • QPS, p99 latency.
  • Recall (% relevant in top-k vs known ground truth).
  • Memory usage (vectors are RAM-heavy with HNSW).
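One common way to measure recall for an ANN index is to treat an exact (brute-force) search as ground truth, for example pgvector with no index or Qdrant with models.SearchParams(exact=True), and check how many of those true neighbours the index returns. A minimal sketch; the function names are illustrative.

```python
from typing import Hashable, List, Sequence, Tuple


def recall_at_k(approx_ids: Sequence[Hashable], exact_ids: Sequence[Hashable], k: int) -> float:
    """Share of the exact top-k neighbours that also appear in the ANN top k."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must contain at least one ID")
    return len(set(approx_ids[:k]) & truth) / len(truth)


def mean_recall(pairs: List[Tuple[Sequence[Hashable], Sequence[Hashable]]], k: int) -> float:
    """Average recall@k over (approx_ids, exact_ids) pairs, one pair per test query."""
    return sum(recall_at_k(approx, exact, k) for approx, exact in pairs) / len(pairs)


print(recall_at_k([1, 2, 3, 9, 5], [1, 2, 3, 4, 5], k=5))  # 0.8
```

Run it over a few hundred representative queries after each change to m or ef, and track it next to p99 latency.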

Compression / quantization

  • Scalar quantization: float32 → int8. ~4x smaller, minor quality loss.
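  • Binary quantization: 1 bit per dimension, 32x smaller than float32. Fastest, but the largest quality loss, so rescore the top candidates with the original vectors.
  • Product quantization (PQ): splits each vector into sub-vectors and stores a codebook ID per sub-vector. Highest compression, aimed at billions of vectors.

Qdrant supports all three.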

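from qdrant_client import models

client.update_collection(
    collection_name="docs",
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(type=models.ScalarType.INT8, always_ram=True),
    ),
)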
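A rough sketch of what scalar quantization does, plus the raw storage arithmetic for a 1536-dim vector. Assumption: this maps each vector onto int8 using its own min/max for simplicity; engines typically calibrate the range across the whole collection (Qdrant exposes a quantile setting for this), and the byte counts ignore HNSW links and payload.

```python
from typing import List, Tuple


def bytes_per_vector(dim: int, bits_per_dim: int) -> int:
    """Raw size of one vector, ignoring HNSW links, payload and other overhead."""
    return dim * bits_per_dim // 8


def scalar_quantize(vector: List[float]) -> Tuple[List[int], float, float]:
    """Map floats linearly onto int8 codes in [-128, 127] using this vector's min/max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # guard against constant vectors (hi == lo)
    codes = [round((x - lo) / scale) - 128 for x in vector]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Approximate reconstruction; each value is off by about scale / 2 at most."""
    return [(c + 128) * scale + lo for c in codes]


for label, bits in [("float32", 32), ("int8", 8), ("binary", 1)]:
    print(label, bytes_per_vector(1536, bits))  # 6144, 1536, 192 bytes
```

At float32, one million 1536-dim vectors take about 6.1 GB before any index overhead; int8 brings that to about 1.5 GB.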

Cost

  • Self-host: VM + storage. Cheap at low scale.
  • Pinecone: per pod-hour (pod-based) or usage-based (serverless); varies.
  • Qdrant Cloud: similar.

For most projects: self-host Qdrant on a ~$20/mo VM.

Common mistakes

  • No payload index on filter fields → slow queries.
  • Mismatched embedding dim across docs.
  • Inserting without batching.
  • Storing huge raw text in payload (use ID, fetch from primary DB).
  • Forgetting backups.

Read this next

If you want my Qdrant + pgvector setup, it’s at rajpoot.dev.

