SQLAlchemy + Postgres Cheatsheet 14 — pgvector for Embeddings

Cheatsheet for pgvector via SQLAlchemy.

Setup

CREATE EXTENSION vector;

uv add pgvector

Column

from pgvector.sqlalchemy import Vector

class Doc(Base):
    __tablename__ = "documents"
    id: Mapped[int] = mapped_column(primary_key=True)
    body: Mapped[str]
    embedding: Mapped[list[float]] = mapped_column(Vector(1024))

Dimension fixed at column creation.

Insert

doc = Doc(body="...", embedding=embed_function(text))   # list[float] of length 1024
session.add(doc)
await session.commit()

Distance queries

# Cosine distance (most common for normalized embeddings)
stmt = (
    select(Doc, Doc.embedding.cosine_distance(q_emb).label("dist"))
    .order_by("dist")
    .limit(10)
)

# L2 (Euclidean)
stmt = select(Doc).order_by(Doc.embedding.l2_distance(q_emb)).limit(10)

# Inner product (for unnormalized)
stmt = select(Doc).order_by(Doc.embedding.max_inner_product(q_emb)).limit(10)

Operators in raw SQL: <=> cosine, <-> L2, <#> negative inner product.

HNSW index (production default in 2026)

from sqlalchemy import Index

__table_args__ = (
    Index(
        "ix_docs_embedding_hnsw",
        "embedding",
        postgresql_using="hnsw",
        postgresql_ops={"embedding": "vector_cosine_ops"},
        postgresql_with={"m": 16, "ef_construction": 64},
    ),
)

Operator class must match query distance:

vector_cosine_ops for <=>
vector_l2_ops for <->
vector_ip_ops for <#>

IVFFlat index

__table_args__ = (
    Index(
        "ix_docs_embedding_ivfflat",
        "embedding",
        postgresql_using="ivfflat",
        postgresql_ops={"embedding": "vector_cosine_ops"},
        postgresql_with={"lists": 100},   # sqrt(N) rule of thumb
    ),
)

Faster to build; slightly lower recall than HNSW. Build AFTER inserting some data (needs sample).

Tune at query time

# HNSW: increase recall at cost of latency
await session.execute(text("SET LOCAL hnsw.ef_search = 100"))

# IVFFlat
await session.execute(text("SET LOCAL ivfflat.probes = 10"))

Hybrid (filter + ANN)

stmt = (
    select(Doc)
    .where(Doc.user_id == user_id, Doc.deleted_at == None)
    .order_by(Doc.embedding.cosine_distance(q_emb))
    .limit(10)
)

Filter then ANN. For HNSW with strict filter: relies on Postgres planner to push filter; sometimes slow at high cardinality.

For best perf with restrictive filter: combine with B-tree on the filter column, or partition the table.

Similarity threshold

threshold = 0.3
stmt = (
    select(Doc, Doc.embedding.cosine_distance(q_emb).label("d"))
    .where(Doc.embedding.cosine_distance(q_emb) < threshold)
    .order_by("d")
    .limit(10)
)

Half-precision (smaller storage)

pgvector 0.7+ supports halfvec:

from pgvector.sqlalchemy import HALFVEC

class Doc(Base):
    embedding: Mapped[list[float]] = mapped_column(HALFVEC(1024))

50% storage; small accuracy hit. Use when memory matters.

Sparse vectors

from pgvector.sqlalchemy import SPARSEVEC

# pgvector 0.7+ supports sparse vectors

For ColBERT-style or keyword-augmented embeddings.

Storage size

1024-dim float32: 4 KB per row.
1M rows: 4 GB just for embeddings.
Half-precision: 2 KB per row.

For really large indexes: HNSW build memory matters.

Re-embedding on text change

@event.listens_for(Doc, "before_update")
def re_embed(mapper, connection, target):
    if target.body_changed:
        target.embedding = embed_sync(target.body)

Or async via outbox + worker for slow embedding calls.

Multi-vector (multiple embeddings per row)

class Doc(Base):
    title_embedding: Mapped[list[float]] = mapped_column(Vector(1024))
    body_embedding: Mapped[list[float]] = mapped_column(Vector(1024))

Two indexes. Combine in queries.

Common mistakes

HNSW index with wrong opclass (e.g., vector_cosine_ops) but querying with L2 — index unused.
Filter pushdown failing with high-cardinality WHERE — slow.
Storing huge vectors (>4096-dim) — IVFFlat / HNSW limits.
Forgetting to set ef_search for higher recall.

Read this next

If you want my pgvector + RAG starter, it’s at rajpoot.dev .

Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .

Setup#

Column#

Insert#

Distance queries#

HNSW index (production default in 2026)#

IVFFlat index#

Tune at query time#

Hybrid (filter + ANN)#

Similarity threshold#

Half-precision (smaller storage)#

Sparse vectors#

Storage size#

Re-embedding on text change#

Multi-vector (multiple embeddings per row)#

Common mistakes#

Read this next#