Cheatsheet for pgvector via SQLAlchemy.
Setup
CREATE EXTENSION vector;
uv add pgvector
Column
from pgvector.sqlalchemy import Vector
class Doc(Base):
__tablename__ = "documents"
id: Mapped[int] = mapped_column(primary_key=True)
body: Mapped[str]
embedding: Mapped[list[float]] = mapped_column(Vector(1024))
Dimension fixed at column creation.
Insert
doc = Doc(body="...", embedding=embed_function(text)) # list[float] of length 1024
session.add(doc)
await session.commit()
Distance queries
# Cosine distance (most common for normalized embeddings)
stmt = (
select(Doc, Doc.embedding.cosine_distance(q_emb).label("dist"))
.order_by("dist")
.limit(10)
)
# L2 (Euclidean)
stmt = select(Doc).order_by(Doc.embedding.l2_distance(q_emb)).limit(10)
# Inner product (for unnormalized)
stmt = select(Doc).order_by(Doc.embedding.max_inner_product(q_emb)).limit(10)
Operators in raw SQL: <=> cosine, <-> L2, <#> negative inner product.
HNSW index (production default in 2026)
from sqlalchemy import Index
__table_args__ = (
Index(
"ix_docs_embedding_hnsw",
"embedding",
postgresql_using="hnsw",
postgresql_ops={"embedding": "vector_cosine_ops"},
postgresql_with={"m": 16, "ef_construction": 64},
),
)
Operator class must match query distance:
vector_cosine_opsfor<=>vector_l2_opsfor<->vector_ip_opsfor<#>
IVFFlat index
__table_args__ = (
Index(
"ix_docs_embedding_ivfflat",
"embedding",
postgresql_using="ivfflat",
postgresql_ops={"embedding": "vector_cosine_ops"},
postgresql_with={"lists": 100}, # sqrt(N) rule of thumb
),
)
Faster to build; slightly lower recall than HNSW. Build AFTER inserting some data (needs sample).
Tune at query time
# HNSW: increase recall at cost of latency
await session.execute(text("SET LOCAL hnsw.ef_search = 100"))
# IVFFlat
await session.execute(text("SET LOCAL ivfflat.probes = 10"))
Hybrid (filter + ANN)
stmt = (
select(Doc)
.where(Doc.user_id == user_id, Doc.deleted_at == None)
.order_by(Doc.embedding.cosine_distance(q_emb))
.limit(10)
)
Filter then ANN. For HNSW with strict filter: relies on Postgres planner to push filter; sometimes slow at high cardinality.
For best perf with restrictive filter: combine with B-tree on the filter column, or partition the table.
Similarity threshold
threshold = 0.3
stmt = (
select(Doc, Doc.embedding.cosine_distance(q_emb).label("d"))
.where(Doc.embedding.cosine_distance(q_emb) < threshold)
.order_by("d")
.limit(10)
)
Half-precision (smaller storage)
pgvector 0.7+ supports halfvec:
from pgvector.sqlalchemy import HALFVEC
class Doc(Base):
embedding: Mapped[list[float]] = mapped_column(HALFVEC(1024))
50% storage; small accuracy hit. Use when memory matters.
Sparse vectors
from pgvector.sqlalchemy import SPARSEVEC
# pgvector 0.7+ supports sparse vectors
For ColBERT-style or keyword-augmented embeddings.
Storage size
- 1024-dim float32: 4 KB per row.
- 1M rows: 4 GB just for embeddings.
- Half-precision: 2 KB per row.
For really large indexes: HNSW build memory matters.
Re-embedding on text change
@event.listens_for(Doc, "before_update")
def re_embed(mapper, connection, target):
if target.body_changed:
target.embedding = embed_sync(target.body)
Or async via outbox + worker for slow embedding calls.
Multi-vector (multiple embeddings per row)
class Doc(Base):
title_embedding: Mapped[list[float]] = mapped_column(Vector(1024))
body_embedding: Mapped[list[float]] = mapped_column(Vector(1024))
Two indexes. Combine in queries.
Common mistakes
- HNSW index with wrong opclass (e.g.,
vector_cosine_ops) but querying with L2 — index unused. - Filter pushdown failing with high-cardinality WHERE — slow.
- Storing huge vectors (>4096-dim) — IVFFlat / HNSW limits.
- Forgetting to set
ef_searchfor higher recall.
Read this next
If you want my pgvector + RAG starter, it’s at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .