Chapter 4: Postgres full-text search via SQLAlchemy. tsvector, tsquery, GIN, ranking, websearch.

Setup

from sqlalchemy import Computed, Index
from sqlalchemy.dialects.postgresql import TSVECTOR

class Post(Base):
    __tablename__ = "posts"
    
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str]
    body: Mapped[str]
    search_vec: Mapped[str] = mapped_column(
        TSVECTOR,
        Computed(
            "setweight(to_tsvector('english', coalesce(title, '')), 'A') || "
            "setweight(to_tsvector('english', coalesce(body, '')), 'B')",
            persisted=True,
        ),
    )
    
    __table_args__ = (
        Index("ix_posts_search_vec", "search_vec", postgresql_using="gin"),
    )

Computed(persisted=True) = STORED generated column. setweight lets you boost matches in title vs body during ranking.

Querying

from sqlalchemy import func, text

q = "postgres jsonb"

stmt = (
    select(Post)
    .where(Post.search_vec.op("@@")(func.websearch_to_tsquery("english", q)))
    .order_by(func.ts_rank(Post.search_vec, func.websearch_to_tsquery("english", q)).desc())
    .limit(20)
)

@@ is the match operator. websearch_to_tsquery parses Google-like queries ("phrase", or, -not).

Ranking variants

  • ts_rank: word frequency–based.
  • ts_rank_cd: cover density (proximity-aware).
.order_by(func.ts_rank_cd(Post.search_vec, q_ts).desc())

ts_rank_cd usually feels more relevant.

Highlighting

stmt = select(
    Post,
    func.ts_headline("english", Post.body, q_ts, "MaxWords=30, MinWords=10, ShortWord=3, MaxFragments=2").label("highlight")
)

Returns a snippet with matched terms wrapped in <b>...</b> (configurable).

Configuration / language

to_tsvector('english', text)  # English stemming
to_tsvector('simple', text)   # No stemming

For multi-language: store the language column and dispatch:

to_tsvector(language::regconfig, text)

Synonyms / dictionaries

Postgres FTS supports custom dictionaries / thesaurus / synonyms. For most apps: out-of-the-box english is fine.

For domain-specific (medical, legal): build a custom dictionary or use Elasticsearch.

When to use Postgres FTS

  • Up to ~10M docs at moderate query rates.
  • Single-language or simple multi-language.
  • Don’t need typo tolerance or aggressive ranking.

For more: Meilisearch / Typesense / Elasticsearch. See Search System Design .

Trigram fuzzy match

For typo tolerance:

# CREATE EXTENSION pg_trgm

__table_args__ = (
    Index("ix_posts_title_trgm", "title", postgresql_using="gin", postgresql_ops={"title": "gin_trgm_ops"}),
)

# Query
stmt = select(Post).where(Post.title.op("%")(q))  # similarity > threshold
stmt = select(Post).where(Post.title.ilike(f"%{q}%"))  # uses trigram index

pg_trgm enables fast LIKE '%...%' and similarity matching. Pair with FTS for hybrid.

Hybrid (FTS + trigram)

fts_stmt = select(Post.id).where(Post.search_vec.op("@@")(q_ts))
trgm_stmt = select(Post.id).where(Post.title.op("%")(q))

stmt = select(Post).where(Post.id.in_(fts_stmt.union(trgm_stmt)))

Or use Reciprocal Rank Fusion for combined ranking (see hybrid search ).

.where(Post.search_vec.op("@@")(func.phraseto_tsquery("english", "exact phrase")))

Or in websearch_to_tsquery: "exact phrase" syntax.

Faceted

For “filter by category + search”:

stmt = (
    select(Post)
    .where(Post.search_vec.op("@@")(q_ts))
    .where(Post.category == category)
)

Composite index helps:

__table_args__ = (
    Index("ix_posts_category_search", "category", "search_vec", postgresql_using="gin"),
)

Note: GIN multi-column requires extension btree_gin:

CREATE EXTENSION btree_gin;

Update strategies

For large existing tables: backfill the search_vec column in batches; then add the GIN index concurrently.

# Migration
op.add_column("posts", sa.Column("search_vec", postgresql.TSVECTOR))
op.execute("UPDATE posts SET search_vec = setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(body, '')), 'B')")
op.execute("CREATE INDEX CONCURRENTLY ix_posts_search ON posts USING gin (search_vec)")

For Computed columns added on existing data: PG fills in lazily; might want explicit update.

RUM index (3rd-party)

For ranking-aware GIN: rum extension:

CREATE EXTENSION rum;

Stores ranking metadata in the index. Faster ts_rank queries. Requires explicit config.

Common mistakes

1. tsvector without GIN

Sequential scans every query. Always GIN.

2. Storing tsvector ad-hoc

Inconsistent. Use a generated column.

3. Ignoring coalesce

NULL fields produce NULL tsvector. coalesce(title, '') for safety.

4. websearch with single language

Multi-language needs per-row config.

5. Trying to compete with Elasticsearch

For complex ranking / typo tolerance / faceting at scale: Postgres FTS isn’t the best tool. Match scope to capability.

What’s next

Chapter 5: Locking and advisory locks.

Read this next


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .