Cheatsheet for pgvector embeddings in SQLAlchemy.
Setup
CREATE EXTENSION vector;
uv add pgvector
Column
from pgvector.sqlalchemy import Vector
class Document(Base):
__tablename__ = "documents"
id: Mapped[int] = mapped_column(primary_key=True)
body: Mapped[str]
embedding: Mapped[list[float]] = mapped_column(Vector(1024))
Choose dimensions to match your embedding model (1024 for bge-m3, 1536 for OpenAI ada-002, 3072 for text-embedding-3-large).
HNSW index (recommended)
__table_args__ = (
Index(
"ix_docs_embedding_hnsw",
"embedding",
postgresql_using="hnsw",
postgresql_ops={"embedding": "vector_cosine_ops"},
),
)
HNSW: high-recall ANN. The 2026 production default.
IVFFlat (smaller / faster build)
__table_args__ = (
Index(
"ix_docs_embedding_ivf",
"embedding",
postgresql_using="ivfflat",
postgresql_ops={"embedding": "vector_cosine_ops"},
postgresql_with={"lists": 100}, # ~sqrt(rows)
),
)
Faster to build; slightly lower recall.
Distance operators
| Operator | Distance | Ops class |
|---|---|---|
<=> | Cosine | vector_cosine_ops |
<-> | L2 (euclidean) | vector_l2_ops |
<#> | Inner product | vector_ip_ops |
Similarity query
q_emb = await embed("query text")
stmt = (
select(Document, Document.embedding.cosine_distance(q_emb).label("dist"))
.order_by("dist")
.limit(10)
)
Returns top-10 by cosine distance.
Or use the raw operator:
from sqlalchemy import literal_column
stmt = (
select(Document)
.order_by(Document.embedding.op("<=>")(q_emb))
.limit(10)
)
Hybrid (filter + ANN)
stmt = (
select(Document)
.where(Document.user_id == user_id)
.order_by(Document.embedding.cosine_distance(q_emb))
.limit(10)
)
For best perf: ensure planner uses HNSW. May need SET hnsw.ef_search = 200.
Tuning HNSW
async with session.begin():
await session.execute(text("SET LOCAL hnsw.ef_search = 200"))
result = await session.execute(stmt)
Higher ef_search → better recall, slower query.
Index build params:
postgresql_with={"m": 16, "ef_construction": 64}
Cosine similarity (1 - distance)
similarity_col = (1 - Document.embedding.cosine_distance(q_emb)).label("similarity")
stmt = select(Document, similarity_col).order_by(similarity_col.desc()).limit(10)
Threshold filtering
stmt = (
select(Document)
.where(Document.embedding.cosine_distance(q_emb) < 0.5)
.order_by(Document.embedding.cosine_distance(q_emb))
.limit(10)
)
Only return matches with similarity > 0.5.
Bulk insert with embeddings
records = [
{"body": doc, "embedding": await embed(doc)}
for doc in docs
]
await session.execute(insert(Document), records)
await session.commit()
For huge loads: batch + COPY-friendly format.
Half-precision (memory savings)
pgvector 0.7+:
from pgvector.sqlalchemy import HALFVEC
class Document(Base):
embedding: Mapped[list[float]] = mapped_column(HALFVEC(1024))
4 bytes → 2 bytes per dimension. Recall ~unchanged.
Sparse vectors (pgvector 0.7+)
For TF-IDF-style sparse embeddings:
from pgvector.sqlalchemy import SPARSEVEC
class Document(Base):
sparse: Mapped[SparseVec] = mapped_column(SPARSEVEC(10000))
Re-embed on change
@event.listens_for(Document, "before_update")
def update_embedding(mapper, conn, target):
if target.body_changed:
target.embedding = compute_embedding_sync(target.body)
For expensive embed: async via outbox + worker.
Hybrid keyword + vector (RRF)
from sqlalchemy.dialects.postgresql import insert
# Two separate queries; combine with RRF in app
fts_results = await session.execute(
select(Document.id).where(Document.search_vec.op("@@")(q_ts)).limit(50)
)
vec_results = await session.execute(
select(Document.id).order_by(Document.embedding.cosine_distance(q_emb)).limit(50)
)
def rrf(rankings, k=60):
scores = defaultdict(float)
for ranking in rankings:
for rank, item_id in enumerate(ranking):
scores[item_id] += 1 / (k + rank)
return sorted(scores.items(), key=lambda x: -x[1])
See Search System Design .
Capacity
| Vectors | Recommended |
|---|---|
| < 10M | pgvector HNSW |
| 10M-100M | pgvector with tuning, or Qdrant/Weaviate |
| > 100M | Dedicated vector DB |
See Embedding Databases 2026 .
Common mistakes
- Wrong dimensions (mismatch with embedding model) — runtime error.
- No HNSW index — every query is seq scan.
- Filter + ANN with low ef_search — poor recall on filtered subsets.
- Forgetting to update embedding on content change.
Read this next
If you want my pgvector + bge-m3 RAG starter, it’s at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .