SQLAlchemy + Postgres Cheatsheet 12 — pgvector (Embeddings)

Cheatsheet for pgvector embeddings in SQLAlchemy.

Setup

CREATE EXTENSION vector;

uv add pgvector

Column

from pgvector.sqlalchemy import Vector

class Document(Base):
    __tablename__ = "documents"
    id: Mapped[int] = mapped_column(primary_key=True)
    body: Mapped[str]
    embedding: Mapped[list[float]] = mapped_column(Vector(1024))

Choose dimensions to match your embedding model (1024 for bge-m3, 1536 for OpenAI ada-002, 3072 for text-embedding-3-large).

HNSW index (recommended)

__table_args__ = (
    Index(
        "ix_docs_embedding_hnsw",
        "embedding",
        postgresql_using="hnsw",
        postgresql_ops={"embedding": "vector_cosine_ops"},
    ),
)

HNSW: high-recall ANN. The 2026 production default.

IVFFlat (smaller / faster build)

__table_args__ = (
    Index(
        "ix_docs_embedding_ivf",
        "embedding",
        postgresql_using="ivfflat",
        postgresql_ops={"embedding": "vector_cosine_ops"},
        postgresql_with={"lists": 100},   # ~sqrt(rows)
    ),
)

Faster to build; slightly lower recall.

Distance operators

Operator	Distance	Ops class
`<=>`	Cosine	`vector_cosine_ops`
`<->`	L2 (euclidean)	`vector_l2_ops`
`<#>`	Inner product	`vector_ip_ops`

Similarity query

q_emb = await embed("query text")

stmt = (
    select(Document, Document.embedding.cosine_distance(q_emb).label("dist"))
    .order_by("dist")
    .limit(10)
)

Returns top-10 by cosine distance.

Or use the raw operator:

from sqlalchemy import literal_column

stmt = (
    select(Document)
    .order_by(Document.embedding.op("<=>")(q_emb))
    .limit(10)
)

Hybrid (filter + ANN)

stmt = (
    select(Document)
    .where(Document.user_id == user_id)
    .order_by(Document.embedding.cosine_distance(q_emb))
    .limit(10)
)

For best perf: ensure planner uses HNSW. May need SET hnsw.ef_search = 200.

Tuning HNSW

async with session.begin():
    await session.execute(text("SET LOCAL hnsw.ef_search = 200"))
    result = await session.execute(stmt)

Higher ef_search → better recall, slower query.

Index build params:

postgresql_with={"m": 16, "ef_construction": 64}

Cosine similarity (1 - distance)

similarity_col = (1 - Document.embedding.cosine_distance(q_emb)).label("similarity")
stmt = select(Document, similarity_col).order_by(similarity_col.desc()).limit(10)

Threshold filtering

stmt = (
    select(Document)
    .where(Document.embedding.cosine_distance(q_emb) < 0.5)
    .order_by(Document.embedding.cosine_distance(q_emb))
    .limit(10)
)

Only return matches with similarity > 0.5.

Bulk insert with embeddings

records = [
    {"body": doc, "embedding": await embed(doc)}
    for doc in docs
]
await session.execute(insert(Document), records)
await session.commit()

For huge loads: batch + COPY-friendly format.

Half-precision (memory savings)

pgvector 0.7+:

from pgvector.sqlalchemy import HALFVEC

class Document(Base):
    embedding: Mapped[list[float]] = mapped_column(HALFVEC(1024))

4 bytes → 2 bytes per dimension. Recall ~unchanged.

Sparse vectors (pgvector 0.7+)

For TF-IDF-style sparse embeddings:

from pgvector.sqlalchemy import SPARSEVEC

class Document(Base):
    sparse: Mapped[SparseVec] = mapped_column(SPARSEVEC(10000))

Re-embed on change

@event.listens_for(Document, "before_update")
def update_embedding(mapper, conn, target):
    if target.body_changed:
        target.embedding = compute_embedding_sync(target.body)

For expensive embed: async via outbox + worker.

Hybrid keyword + vector (RRF)

from sqlalchemy.dialects.postgresql import insert

# Two separate queries; combine with RRF in app
fts_results = await session.execute(
    select(Document.id).where(Document.search_vec.op("@@")(q_ts)).limit(50)
)
vec_results = await session.execute(
    select(Document.id).order_by(Document.embedding.cosine_distance(q_emb)).limit(50)
)

def rrf(rankings, k=60):
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking):
            scores[item_id] += 1 / (k + rank)
    return sorted(scores.items(), key=lambda x: -x[1])

See Search System Design .

Capacity

Vectors	Recommended
< 10M	pgvector HNSW
10M-100M	pgvector with tuning, or Qdrant/Weaviate
> 100M	Dedicated vector DB

See Embedding Databases 2026 .

Common mistakes

Wrong dimensions (mismatch with embedding model) — runtime error.
No HNSW index — every query is seq scan.
Filter + ANN with low ef_search — poor recall on filtered subsets.
Forgetting to update embedding on content change.

Read this next

If you want my pgvector + bge-m3 RAG starter, it’s at rajpoot.dev .

Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .

Setup#

Column#

HNSW index (recommended)#

IVFFlat (smaller / faster build)#

Distance operators#

Similarity query#

Hybrid (filter + ANN)#

Tuning HNSW#

Cosine similarity (1 - distance)#

Threshold filtering#

Bulk insert with embeddings#

Half-precision (memory savings)#

Sparse vectors (pgvector 0.7+)#

Re-embed on change#

Hybrid keyword + vector (RRF)#

Capacity#

Common mistakes#

Read this next#