AI/LLM Cheatsheet 16 — Vector DBs Deep Dive

Quick reference for choosing, configuring, and operating a vector database.

Options

DB	Type	Best for
Qdrant	open, managed	most use cases
Pinecone	managed	scale, ops-light
Weaviate	open, managed	full-text + vector
Milvus	open	billions of vectors
Chroma	embedded	prototyping
pgvector	Postgres ext	already on Postgres
Redis	with redis-stack	low-latency cache + vector
Elasticsearch	with kNN	already on ES

Qdrant

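from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# size must match the embedding model's output dimension.
client.create_collection(
    collection_name="docs",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
    hnsw_config=models.HnswConfigDiff(m=16, ef_construct=100),
)

# Upsert inserts new points and overwrites existing ones with the same ID.
client.upsert(
    collection_name="docs",
    points=[
        models.PointStruct(id=1, vector=[...], payload={"text": "...", "source": "..."}),
    ],
)

# Top 5 nearest neighbours, restricted to one source.
# Newer clients deprecate search() in favor of query_points().
hits = client.search(
    collection_name="docs",
    query_vector=[...],
    limit=5,
    query_filter=models.Filter(
        must=[models.FieldCondition(key="source", match=models.MatchValue(value="docs.example.com"))]
    ),
)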

pgvector

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE docs (
    id BIGSERIAL PRIMARY KEY,
    text TEXT NOT NULL,
    metadata JSONB,
    embedding vector(1536)
);

CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);

INSERT INTO docs (text, metadata, embedding)
VALUES ('hello', '{"source":"x"}', '[0.1, 0.2, ...]');

SELECT text, 1 - (embedding <=> '[...]') AS similarity
FROM docs
ORDER BY embedding <=> '[...]'
LIMIT 5;

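Operators: <=> cosine distance, <-> L2 (Euclidean) distance, <#> negative inner product. All three return smaller values for closer vectors, so ORDER BY ascending.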
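
To make the operators concrete, here is a small pure-Python sketch of what each one computes. The function names are illustrative only and are not part of pgvector.

```python
import math

def l2_distance(a, b):
    """What `<->` computes: Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """What `<=>` computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def negative_inner_product(a, b):
    """What `<#>` computes: the inner product multiplied by -1."""
    return -sum(x * y for x, y in zip(a, b))

a, b = [1.0, 0.0], [0.0, 1.0]
print(l2_distance(a, b))             # square root of 2
print(cosine_distance(a, b))         # 1.0 (orthogonal vectors)
print(negative_inner_product(a, a))  # -1.0
```

The inner product is negated so that, like the other two, a smaller value means a closer match.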

HNSW params

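m: max connections per node in the graph (typically 16-64). Higher = better recall, more memory, slower builds.
ef_construct: candidate list size during index build (typically 64-512). Higher = better graph quality, slower builds.
ef (query): candidate list size at query time. Higher = better recall, higher latency.

Tune for latency vs recall. pgvector calls these m, ef_construction, and hnsw.ef_search; Qdrant's query-time knob is hnsw_ef.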

Filtering strategies

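Pre-filter (filter, then search only the matches): accurate, but can be slow because a restrictive filter can break up the HNSW graph or force a brute-force scan of the matches.

Post-filter (search, then drop non-matching hits): fast, but may return fewer than k results when most of the top hits fail the filter.

Qdrant and Weaviate integrate the filter into the index search itself instead of doing a naive pre- or post-filter.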
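
The trade-off is easy to see with a brute-force toy. This sketch uses exact search over four made-up documents (all data and function names are illustrative) and shows post-filtering coming back empty while pre-filtering still finds matches.

```python
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(docs, query, k):
    """Exact (brute-force) nearest neighbours by cosine similarity."""
    return sorted(docs, key=lambda d: cosine_sim(d["vector"], query), reverse=True)[:k]

def pre_filter_search(docs, query, k, source):
    # Filter first, then rank only the matching documents.
    matching = [d for d in docs if d["source"] == source]
    return top_k(matching, query, k)

def post_filter_search(docs, query, k, source):
    # Rank everything, keep the top k, then drop non-matching hits.
    return [d for d in top_k(docs, query, k) if d["source"] == source]

docs = [
    {"id": 1, "vector": [1.0, 0.0], "source": "blog"},
    {"id": 2, "vector": [0.9, 0.1], "source": "blog"},
    {"id": 3, "vector": [0.8, 0.2], "source": "docs"},
    {"id": 4, "vector": [0.0, 1.0], "source": "docs"},
]
query = [1.0, 0.0]

pre = pre_filter_search(docs, query, k=2, source="docs")
post = post_filter_search(docs, query, k=2, source="docs")
print([d["id"] for d in pre])   # [3, 4]
print([d["id"] for d in post])  # []
```

Both of the two nearest documents are from "blog", so the post-filter has nothing left to return. A common workaround is to over-fetch (search for several times k) before filtering.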

Metadata indexing

client.create_payload_index(
    collection_name="docs",
    field_name="source",
    field_schema="keyword",
)

Speeds up filters on indexed fields.

Sharding / scale

Qdrant: collection sharding, replication.
Pinecone: pods scale with traffic.
Milvus: distributed natively.

At around 100k vectors, any of these will do. At 100M+, look at distributed systems such as Milvus or Pinecone.
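
A rough sizing sketch shows why the scale tiers differ so much. It assumes float32 vectors plus about m * 2 * 4 bytes of HNSW links per vector (the rule of thumb given in the hnswlib README) and ignores payloads and engine overhead. The function name is hypothetical.

```python
def estimate_hnsw_bytes(num_vectors, dim, m=16):
    """Rough RAM estimate: float32 vectors plus HNSW graph links.

    Assumes 4 bytes per dimension and about m * 2 * 4 bytes of links per
    vector; real engines add payload, index, and allocator overhead.
    """
    vector_bytes = dim * 4
    link_bytes = m * 2 * 4
    return num_vectors * (vector_bytes + link_bytes)

for n in (100_000, 10_000_000, 100_000_000):
    gib = estimate_hnsw_bytes(n, dim=1536) / 2**30
    print(f"{n:>11,} vectors: ~{gib:,.1f} GiB")
```

At 1536 dimensions, 100k vectors come to well under 1 GiB, while 100M come to several hundred GiB, which is where sharding or quantization stops being optional.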

Batch upserts

client.upsert(collection_name="docs", points=batch_of_1000)

Always batch (100-1000). Per-row inserts are slow.
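
A minimal batching helper, assuming points are already built as a list or iterator. The upsert call is left as a comment so the sketch runs without a server.

```python
from itertools import islice

def batched(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

points = [{"id": i, "vector": [0.0, 0.0], "payload": {}} for i in range(2500)]
for batch in batched(points, 1000):
    # In real code: client.upsert(collection_name="docs", points=batch)
    print(len(batch))  # 1000, 1000, 500
```

On Python 3.12+, itertools.batched does the same job (it yields tuples instead of lists).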

Updates

Vector DBs support upsert by ID. For full re-embedding (model change):

for batch in chunks:
    embeddings = embed_batch(batch)
    client.upsert(...)

Plan migrations as background jobs.
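
One way to structure such a job, as a sketch: the embedding function and the upsert function are passed in as callables (both hypothetical stand-ins here), and the loop returns the offset it reached so a crashed run can resume from there.

```python
def reembed(docs, embed_batch, upsert, batch_size=500, start=0):
    """Re-embed `docs` in batches and return the offset reached.

    `docs` is a list of {"id", "text"} dicts. `embed_batch` and `upsert`
    are caller-supplied callables, so the same loop works with any
    embedding model and any vector DB client.
    """
    offset = start
    while offset < len(docs):
        batch = docs[offset:offset + batch_size]
        vectors = embed_batch([d["text"] for d in batch])
        upsert([{"id": d["id"], "vector": v} for d, v in zip(batch, vectors)])
        offset += len(batch)  # persist this value to resume after a crash
    return offset

# Stand-ins so the sketch runs without a model or a database.
def fake_embed(texts):
    return [[float(len(t)), 0.0] for t in texts]

store = {}

def fake_upsert(points):
    for p in points:
        store[p["id"]] = p  # same ID overwrites, like upsert by ID

docs = [{"id": i, "text": "x" * i} for i in range(1200)]
done = reembed(docs, fake_embed, fake_upsert)
print(done, len(store))  # 1200 1200
```

Because upsert by ID is idempotent, re-running a batch after a failure is safe.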

Hybrid (vector + keyword)

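Many vector DBs support hybrid search: a keyword ranking (BM25 or sparse vectors) and a dense-vector ranking, merged with RRF (reciprocal rank fusion).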

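# Qdrant 1.10+
from qdrant_client import models

# In Qdrant the keyword branch is typically a sparse-vector query; add
# using="<vector name>" to each prefetch if the collection has named vectors.
hits = client.query_points(
    collection_name="docs",
    prefetch=[
        models.Prefetch(query=q_vec, limit=50),
        models.Prefetch(query=text_index_query, limit=50),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
)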
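
For intuition, RRF itself is only a few lines. This is a sketch of the textbook formula (score = sum of 1 / (k + rank) over the lists a document appears in), with k=60 as a commonly used default; it is not necessarily the exact constant or tie-breaking any particular DB uses.

```python
def rrf_fuse(rankings, k=60, limit=10):
    """Reciprocal rank fusion over several ranked lists of document IDs.

    Each document scores sum(1 / (k + rank)) across the lists it appears
    in, with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    ordered = sorted(scores, key=lambda d: scores[d], reverse=True)
    return ordered[:limit]

vector_hits = ["a", "b", "c", "d"]
keyword_hits = ["c", "a", "e"]
print(rrf_fuse([vector_hits, keyword_hits]))  # ['a', 'c', 'b', 'e', 'd']
```

Documents found by both branches ("a" and "c") rise to the top, and no score normalization between BM25 and cosine is needed because only ranks are used.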

Backup / migration

# Qdrant snapshot
curl -X POST http://localhost:6333/collections/docs/snapshots
# Files in $QDRANT_HOME/storage/collections/docs/snapshots/

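Restore: recover the snapshot through the API (PUT /collections/docs/snapshots/recover with the snapshot location), or start Qdrant with the --snapshot flag.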

When pgvector suffices

If you already run Postgres, have fewer than ~10M vectors, and serve only a few QPS, a plain query is enough:

SELECT * FROM docs
ORDER BY embedding <=> $1
LIMIT 5;

Simpler ops. Use HNSW index for speed.

Observability

QPS, p99 latency.
Recall (% relevant in top-k vs known ground truth).
Memory usage (vectors are RAM-heavy with HNSW).
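
A minimal sketch of the two computed metrics, assuming you log per-query latencies and keep a small labelled set (or exact brute-force results) as ground truth. Function names are illustrative.

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of the known-relevant IDs that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def percentile(values, pct):
    """Nearest-rank percentile, e.g. pct=99 for p99 latency."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Two of the three relevant IDs are in the top 3.
print(recall_at_k(["a", "b", "x", "c"], relevant={"a", "b", "c"}, k=3))

latencies_ms = list(range(1, 101))
print(percentile(latencies_ms, 99))  # 99
```

Tracking recall after every HNSW or quantization change catches quality regressions that latency dashboards miss.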

Compression / quantization

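Scalar quantization: float32 → int8. ~4x smaller, minor quality loss.
Binary quantization: 1 bit per dim. ~32x smaller, largest recall loss; usually paired with rescoring against the original vectors.
Product quantization (PQ): replaces chunks of each vector with codebook indexes; for billions of vectors where memory dominates.

Qdrant supports all three.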

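from qdrant_client import models

client.update_collection(
    collection_name="docs",
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(type=models.ScalarType.INT8, always_ram=True),
    ),
)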
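
To see where the 4x saving and the small error come from, here is a simplified symmetric int8 scheme with one scale per vector. It is a sketch for intuition only, not Qdrant's actual implementation, and the function names are made up.

```python
def quantize_int8(vector):
    """Map floats to integers in [-127, 127] using one scale per vector."""
    max_abs = max(abs(x) for x in vector) or 1.0  # avoid dividing by zero
    scale = max_abs / 127.0
    codes = [round(x / scale) for x in vector]
    return codes, scale

def dequantize(codes, scale):
    """Approximate the original floats from the int8 codes."""
    return [c * scale for c in codes]

vec = [0.12, -0.5, 0.33, 0.0]
codes, scale = quantize_int8(vec)
approx = dequantize(codes, scale)
max_err = max(abs(a - b) for a, b in zip(vec, approx))

print(codes)                         # [30, -127, 84, 0]
print(max_err <= scale / 2 + 1e-12)  # True: error is at most half a step
```

Each code fits in 1 byte instead of 4, and the per-dimension error is bounded by half the step size.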

Cost

Self-host: VM + storage. Cheap at low scale.
Pinecone: per-pod hour, varies.
Qdrant Cloud: similar.

For most small projects, self-hosting Qdrant on a roughly $20/mo VM is enough.

Common mistakes

No payload index on filter fields → slow queries.
Mismatched embedding dim across docs.
Inserting without batching.
Storing huge raw text in payload (use ID, fetch from primary DB).
Forgetting backups.
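
The dimension mismatch is cheap to catch before it reaches the database. A minimal guard, assuming points are dicts with "id" and "vector" keys (the constant and function name are hypothetical):

```python
EXPECTED_DIM = 1536  # must match the collection's vector size

def validate_points(points, expected_dim=EXPECTED_DIM):
    """Raise ValueError if any point's vector has the wrong length."""
    for p in points:
        dim = len(p["vector"])
        if dim != expected_dim:
            raise ValueError(
                f"point {p['id']}: expected {expected_dim} dims, got {dim}"
            )
    return points

good = [{"id": 1, "vector": [0.0] * 1536}]
bad = [{"id": 2, "vector": [0.0] * 768}]

validate_points(good)
try:
    validate_points(bad)
except ValueError as err:
    print(err)  # point 2: expected 1536 dims, got 768
```

Running this on every batch before upsert turns a confusing server-side error into a clear local one.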
