Search is its own subsystem. The decisions you make at design time determine whether it scales or gets rewritten in year two. This post is the working playbook.

Engine choice

| Engine | Strengths | Best for |
|---|---|---|
| Postgres FTS | Free, in-DB, no ops | Up to 10M docs, simple needs |
| Elasticsearch | Mature, distributed, aggregations | Logs + search at scale |
| OpenSearch | Apache-2.0 fork, AWS-friendly | Drop-in open-source option |
| Typesense | Fast, simple, dev-friendly | Mid-scale, search-as-a-feature |
| Meilisearch | Typo-tolerant, easy | Mid-scale, B2C |
| Vectorize / pgvector | Vector search | Semantic, AI-powered |

For most SaaS in 2026, Postgres FTS + pgvector covers it. Reach for Elasticsearch when you need full-text search at log scale or rich aggregations.

Postgres FTS — surprising scale

-- Weighted search vector: name matches (A) outrank description matches (B)
ALTER TABLE products ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name,'')), 'A') ||
    setweight(to_tsvector('english', coalesce(description,'')), 'B')
  ) STORED;

-- GIN index that the @@ match below can use
CREATE INDEX products_tsv ON products USING GIN (tsv);

-- $1 is the raw user query string
SELECT id, name, ts_rank(tsv, q) AS rank
FROM products, websearch_to_tsquery('english', $1) q
WHERE tsv @@ q
ORDER BY rank DESC
LIMIT 20;

A generated column, a GIN index, and websearch_to_tsquery (which handles quoted phrases and operators such as OR and -) get you a real search experience with no extra infrastructure. See PostgreSQL Full-Text Search.
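To call this query from application code, a minimal sketch might look like the following. It assumes a DB-API style cursor that uses %s placeholders (as psycopg does); the function name search_products and the limit parameter are illustrative and not part of the original query.

```python
from typing import Any, List, Tuple

# Same query as above, with %s placeholders instead of $1 (driver paramstyle is an assumption)
SEARCH_SQL = """
SELECT id, name, ts_rank(tsv, q) AS rank
FROM products, websearch_to_tsquery('english', %s) q
WHERE tsv @@ q
ORDER BY rank DESC
LIMIT %s
"""


def search_products(cursor: Any, text: str, limit: int = 20) -> List[Tuple]:
    """Run the full-text query with bound parameters (never string-formatted)."""
    text = text.strip()
    if not text:
        # An empty query yields an empty tsquery that matches nothing; skip the round trip
        return []
    cursor.execute(SEARCH_SQL, (text, limit))
    return cursor.fetchall()
```

Passing the user input as a bound parameter keeps the raw string out of the SQL text, and websearch_to_tsquery is designed to accept unvalidated user input without raising syntax errors.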

Indexing pipeline

Source (Postgres) ──CDC──▶ Index queue ──▶ Search engine

The pattern:

  1. App writes to Postgres.
  2. CDC (see Postgres CDC) emits changes.
  3. A worker enriches each change (computes embeddings, derives fields) and indexes it into the search engine.
  4. Search engine handles queries.

The pipeline is decoupled: a search outage doesn’t block writes, and a reindex doesn’t disrupt the app.
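The worker step in this pipeline can be sketched as below. Everything here is a hypothetical stand-in: ChangeEvent models a CDC message, InMemoryIndex replaces a real search engine client, and enrich only builds a search_text field where a real worker would also compute embeddings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class ChangeEvent:
    op: str                      # "upsert" or "delete"
    doc_id: int
    row: Optional[dict] = None   # row payload, present for upserts


class InMemoryIndex:
    """Stand-in for a search engine client (hypothetical interface)."""

    def __init__(self) -> None:
        self.docs: Dict[int, dict] = {}

    def upsert(self, doc_id: int, doc: dict) -> None:
        self.docs[doc_id] = doc

    def delete(self, doc_id: int) -> None:
        self.docs.pop(doc_id, None)


def enrich(row: dict) -> dict:
    """Derive extra fields before indexing; embeddings would be computed here."""
    doc = dict(row)
    name = row.get("name") or ""
    description = row.get("description") or ""
    doc["search_text"] = f"{name} {description}".strip()
    return doc


def run_worker(events: Iterable[ChangeEvent], index: InMemoryIndex,
               enrich_fn: Callable[[dict], dict] = enrich) -> int:
    """Apply change events in order; returns the number of events applied."""
    applied = 0
    for event in events:
        if event.op == "delete":
            index.delete(event.doc_id)
        elif event.op == "upsert" and event.row is not None:
            index.upsert(event.doc_id, enrich_fn(event.row))
        else:
            continue  # unknown or malformed event: skip rather than crash the worker
        applied += 1
    return applied
```

Because upserts and deletes are idempotent here, replaying the same events after a crash leaves the index in the same state, which is what makes a queue-driven worker safe to retry.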

Hybrid search

Query
  ├─ BM25 (Elasticsearch / Postgres FTS) → top 30
  └─ Vector (pgvector / Vectorize) → top 30
       ▼
     RRF fusion → top 30
       ▼
     Reranker → top 10
       ▼
     Display

See Build a RAG App with pgvector and Rerankers in RAG.
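The RRF fusion step can be sketched in a few lines. This is a minimal version of Reciprocal Rank Fusion, assuming each retriever returns an ordered list of document IDs; the function name rrf_fuse is illustrative, and k = 60 is the conventional constant rather than something this post prescribes.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[Hashable]], k: int = 60,
             top_n: int = 30) -> List[Hashable]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), rank from 1."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


bm25_hits = ["a", "b", "c"]     # lexical results, best first
vector_hits = ["c", "a", "d"]   # semantic results, best first
fused = rrf_fuse([bm25_hits, vector_hits])
```

RRF uses only rank positions, so BM25 scores and cosine similarities never need to be normalized onto a common scale before fusing.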

Ranking

Production search ranking has layers:

  1. Lexical relevance (BM25).
  2. Semantic relevance (embedding similarity).
  3. Business signals (recency, popularity, click-through, conversion).
  4. Personalization (user history, location).

A learned ranker (typically gradient-boosted trees) combines these signals into a single score. Train and update it offline; serve it online.
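To make the layering concrete, here is a sketch of how per-document signals could be combined into one score. It deliberately swaps the learned ranker for a hand-weighted linear sum so the example stays dependency-free; the feature names, weights, and the assumption that every feature is scaled to [0, 1] are all illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    doc_id: str
    features: Dict[str, float]  # e.g. bm25, cosine, recency, ctr; assumed scaled to [0, 1]


# Hand-set weights for illustration only; a trained model would replace this table
WEIGHTS = {"bm25": 0.4, "cosine": 0.3, "recency": 0.1, "ctr": 0.2}


def score(c: Candidate, weights: Dict[str, float] = WEIGHTS) -> float:
    # Missing features count as 0 so a sparse candidate is not an error
    return sum(w * c.features.get(name, 0.0) for name, w in weights.items())


def rank(cands: List[Candidate], top_n: int = 10) -> List[str]:
    """Return document IDs ordered by combined score, best first."""
    return [c.doc_id for c in sorted(cands, key=score, reverse=True)[:top_n]]
```

A gradient-boosted model would take the same feature dictionary as input, so the feature assembly can stay unchanged when the linear stand-in is replaced.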

Multi-tenancy

For per-tenant search:

  • Per-tenant index (Elasticsearch) — strong isolation.
  • Shared index + tenant filter — works with tenant_id field; filter on every query.
  • Per-tenant DB + Postgres FTS — natural isolation.

For SaaS at scale, a per-tenant index keeps each tenant’s query cost bounded.
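For the shared-index option, one way to make sure the tenant filter is never forgotten is to build every query through a single helper. The sketch below produces an Elasticsearch-style bool query body as a plain dictionary; the helper name tenant_query and the search_text field are assumptions, while tenant_id is the field named above.

```python
from typing import Any, Dict


def tenant_query(tenant_id: str, text: str, size: int = 20) -> Dict[str, Any]:
    """Build a bool query body with a mandatory tenant filter."""
    if not tenant_id:
        # Fail loudly: a query without a tenant would search across all tenants
        raise ValueError("tenant_id is required for every search")
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"search_text": text}}],
                # filter clauses restrict results without affecting the relevance score
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        },
    }
```

Routing all search traffic through one function like this turns a missing tenant into an exception instead of a data leak.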

Common mistakes

1. Reindexing whole corpus on every change

Use CDC and incremental updates for day-to-day changes. Run a full reindex on a schedule, for example weekly.

2. No relevance tuning

The mistake is to ship search and assume relevance is good. Run evals, then tune.
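An eval can start very small: a set of queries with hand-judged relevant documents, plus a couple of metrics. The sketch below computes recall@k and mean reciprocal rank; the function names and the dictionary shapes for runs and judgments are assumptions for illustration.

```python
from typing import Dict, List, Set


def recall_at_k(results: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(results[:k]) & relevant) / len(relevant)


def reciprocal_rank(results: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0 if none was returned."""
    for rank, doc_id in enumerate(results, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(run: Dict[str, List[str]], judgments: Dict[str, Set[str]],
             k: int = 10) -> Dict[str, float]:
    """Average both metrics over every judged query; unanswered queries score 0."""
    queries = list(judgments)
    if not queries:
        return {f"recall@{k}": 0.0, "mrr": 0.0}
    n = len(queries)
    return {
        f"recall@{k}": sum(recall_at_k(run.get(q, []), judgments[q], k) for q in queries) / n,
        "mrr": sum(reciprocal_rank(run.get(q, []), judgments[q]) for q in queries) / n,
    }
```

Running the same judged query set before and after each analyzer or ranking change gives a number to compare instead of a feeling.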

3. Forgetting analyzers

Stemming, stop words, and language all matter. Don’t use the default analyzer for non-English content.
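With Postgres FTS, the analyzer is the text search configuration passed to to_tsvector and websearch_to_tsquery. A small lookup like the one below is one way to pick it per document; the mapping covers only a few of the built-in configurations, and the function name ts_config_for is an assumption.

```python
# A few of PostgreSQL's built-in text search configurations, keyed by language code
PG_TS_CONFIG = {
    "en": "english",
    "fr": "french",
    "de": "german",
    "es": "spanish",
    "it": "italian",
    "pt": "portuguese",
    "nl": "dutch",
    "ru": "russian",
}


def ts_config_for(lang_code: str) -> str:
    """Map a tag such as 'fr-CA' to a configuration name.

    Falls back to 'simple' (no stemming, no stop-word removal) for languages
    not in the table, which is safer than applying English stemming to them.
    """
    base = lang_code.strip().lower().replace("_", "-").split("-")[0]
    return PG_TS_CONFIG.get(base, "simple")
```

The same configuration has to be used at index time and at query time, otherwise stemmed index terms and unstemmed query terms will not match.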

4. Filtering after fetching

Applying WHERE deleted = false after fetching results is wasteful: the engine ranks and returns documents you then throw away. Filter at the engine.
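A toy comparison shows the practical difference. Both functions below operate on an already ranked list of hypothetical documents with a deleted flag; the first mimics filtering in application code after the page is fetched, the second mimics a filter applied as part of the query.

```python
from itertools import islice
from typing import Dict, List

Doc = Dict[str, object]


def post_filter(ranked: List[Doc], page_size: int) -> List[Doc]:
    """Anti-pattern: take a page first, then drop deleted docs; pages come back short."""
    page = ranked[:page_size]
    return [d for d in page if not d["deleted"]]


def engine_filter(ranked: List[Doc], page_size: int) -> List[Doc]:
    """Filter first, then paginate; the page is full whenever enough live docs exist."""
    live = (d for d in ranked if not d["deleted"])
    return list(islice(live, page_size))


# Eight ranked docs where every even id has been soft-deleted
ranked = [{"id": i, "deleted": i % 2 == 0} for i in range(1, 9)]
short_page = post_filter(ranked, 4)     # ids 1 and 3 only
full_page = engine_filter(ranked, 4)    # ids 1, 3, 5, 7
```

In a real engine the second form corresponds to putting the condition in the query itself, such as the WHERE clause in Postgres or a filter clause in Elasticsearch.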

5. No backup

Search indexes are derivable, but rebuilding from CDC takes time. Snapshot regularly.

Read this next

If you want my Postgres FTS + pgvector hybrid search starter, it’s at rajpoot.dev.

