Search is its own subsystem. The choices you make at design time determine whether it scales or gets rewritten in year two. This post is the working playbook.
Engine choice
| Engine | Strengths | Best for |
|---|---|---|
| Postgres FTS | Free, in-DB, no ops | Up to 10M docs, simple needs |
| Elasticsearch | Mature, distributed, aggregations | Logs + search at scale |
| OpenSearch | Apache 2.0-licensed Elasticsearch fork, AWS-friendly | Drop-in open-source option |
| Typesense | Fast, simple, dev-friendly | Mid-scale, search-as-a-feature |
| Meilisearch | Typo-tolerant, easy | Mid-scale, B2C |
| Vectorize / pgvector | Vector search | Semantic, AI-powered |
For most SaaS in 2026, Postgres FTS + pgvector covers it. Reach for Elasticsearch when you need full-text search at log scale or rich aggregations.
Postgres FTS — surprising scale
-- Weighted tsvector maintained by Postgres: name (A) outranks description (B)
ALTER TABLE products ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name,'')), 'A') ||
    setweight(to_tsvector('english', coalesce(description,'')), 'B')
  ) STORED;

-- GIN index so the @@ match avoids a sequential scan
CREATE INDEX products_tsv ON products USING GIN (tsv);

-- $1 is the raw user query string
SELECT id, name, ts_rank(tsv, q) AS rank
FROM products, websearch_to_tsquery('english', $1) q
WHERE tsv @@ q
ORDER BY rank DESC LIMIT 20;
A generated column, a GIN index, and websearch_to_tsquery (which parses quoted phrases, OR, and - exclusions) get you a real search experience with no extra infrastructure. See PostgreSQL Full-Text Search.
Indexing pipeline
Source (Postgres) ──CDC──▶ Index queue ──▶ Search engine
The pattern:
- App writes to Postgres.
- CDC (see Postgres CDC) emits changes.
- Worker enriches (computes embeddings, derives fields) and indexes to search engine.
- Search engine handles queries.
The pieces are decoupled: a search outage doesn't block writes, and a reindex doesn't disrupt the app.
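The worker step above can be sketched roughly as follows. This is a minimal illustration, not a production consumer: the `ChangeEvent` shape, the `InMemoryIndex` stand-in, and the `enrich` function are all hypothetical names invented for the example, and a real worker would read events from a queue and call an actual search engine client.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class ChangeEvent:
    """Hypothetical CDC event: op is 'insert', 'update', or 'delete'."""
    op: str
    id: int
    row: Optional[dict] = None  # deletes carry no row


class InMemoryIndex:
    """Stand-in for a search engine client (illustrative only)."""

    def __init__(self) -> None:
        self.docs: dict = {}

    def upsert(self, doc_id: int, doc: dict) -> None:
        self.docs[doc_id] = doc

    def delete(self, doc_id: int) -> None:
        self.docs.pop(doc_id, None)  # idempotent: deleting twice is fine


def enrich(row: dict) -> dict:
    """Derive fields before indexing; embeddings would be computed here too."""
    doc = dict(row)
    doc["search_text"] = f"{row.get('name', '')} {row.get('description', '')}".strip()
    return doc


def apply_events(
    events: Iterable[ChangeEvent],
    index: InMemoryIndex,
    enrich_fn: Callable[[dict], dict] = enrich,
) -> int:
    """Apply each change incrementally; returns the number of events applied."""
    applied = 0
    for ev in events:
        if ev.op == "delete":
            index.delete(ev.id)
        elif ev.op in ("insert", "update"):
            # Upsert makes replays safe: the last event for an id wins.
            index.upsert(ev.id, enrich_fn(ev.row or {}))
        else:
            raise ValueError(f"unknown op: {ev.op!r}")
        applied += 1
    return applied
```

Treating inserts and updates as the same upsert keeps the worker safe to replay, which matters when the queue redelivers events after a crash.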
Hybrid search
Query
↓
├─ BM25 (Elasticsearch / Postgres FTS) → top 30
└─ Vector (pgvector / Vectorize) → top 30
↓
RRF fusion → top 30
↓
Reranker → top 10
↓
Display
See Build a RAG App with pgvector and Rerankers in RAG.
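The fusion step is small enough to show in full. Here is a minimal sketch of Reciprocal Rank Fusion, assuming each retriever returns an ordered list of document ids. The constant `k=60` is the commonly used default rather than something this post prescribes, and the tie-break on id is only there to make the output deterministic.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def rrf_fuse(
    rankings: Sequence[Sequence[Hashable]],
    k: int = 60,
    top_n: int = 30,
) -> list:
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for stable output.
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return [doc_id for doc_id, _ in fused[:top_n]]


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]
    vector_hits = ["c", "a", "d"]
    print(rrf_fuse([bm25_hits, vector_hits]))
```

RRF only uses ranks, so BM25 scores and cosine similarities never need to be put on a common scale.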
Ranking
Production search ranking has layers:
- Lexical relevance (BM25).
- Semantic relevance (embedding similarity).
- Business signals (recency, popularity, click-through, conversion).
- Personalization (user history, location).
A learned ranker (commonly gradient-boosted trees) combines these signals into a single score. Retrain it offline; serve it online.
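To make the "combine the layers" idea concrete, here is a deliberately simplified sketch. It uses a hand-weighted linear sum as a stand-in, not gradient-boosted trees. The feature names, the assumption that every signal is already normalized to [0, 1], and the weights are all hypothetical; a trained model would replace `score` while the surrounding shape stays the same.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping


@dataclass(frozen=True)
class Candidate:
    doc_id: str
    bm25: float        # lexical relevance, assumed normalized to [0, 1]
    semantic: float    # embedding similarity, assumed normalized to [0, 1]
    recency: float     # business signal
    popularity: float  # business signal


# Hypothetical weights; a learned ranker would fit these offline instead.
WEIGHTS = {"bm25": 0.4, "semantic": 0.3, "recency": 0.1, "popularity": 0.2}


def score(c: Candidate, weights: Mapping[str, float] = WEIGHTS) -> float:
    """Linear stand-in for the learned model: weighted sum of features."""
    return sum(w * getattr(c, name) for name, w in weights.items())


def rank(cands: Iterable[Candidate], weights: Mapping[str, float] = WEIGHTS) -> list:
    """Order candidates by combined score, best first."""
    return sorted(cands, key=lambda c: score(c, weights), reverse=True)
```

The useful part is the interface: features are computed per candidate, the scoring function is swappable, and the weights (or model file) ship as an artifact produced offline.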
Multi-tenancy
For per-tenant search:
- Per-tenant index (Elasticsearch) — strong isolation.
- Shared index + tenant filter — works with tenant_id field; filter on every query.
- Per-tenant DB + Postgres FTS — natural isolation.
For SaaS at scale, a per-tenant index keeps each tenant's query cost bounded.
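For the shared-index option, the risk is a query that forgets the filter. One way to reduce that risk is to build every query through a single helper, sketched below. The body follows the Elasticsearch/OpenSearch bool query shape; the `content` field name is an assumption, and the `term` filter assumes `tenant_id` is mapped as a keyword field.

```python
def tenant_query(tenant_id: str, text: str, size: int = 20) -> dict:
    """Build a search body that always carries the tenant filter."""
    if not tenant_id:
        # Fail closed: never run an unscoped query against a shared index.
        raise ValueError("tenant_id is required on every query")
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored clause: full-text match on an assumed 'content' field.
                "must": [{"match": {"content": text}}],
                # Non-scoring clause: restricts hits to this tenant.
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        },
    }
```

Putting the tenant clause under `filter` rather than `must` keeps it out of relevance scoring.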
Common mistakes
1. Reindexing whole corpus on every change
Use CDC and incremental updates. Reserve full reindexes for a schedule, such as weekly.
2. No relevance tuning
The mistake is shipping search and assuming relevance is good. Run evals, then tune.
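An eval does not need to be elaborate to be useful. The sketch below assumes you have a small set of judged queries (query mapped to the ids a human marked relevant) and a `search_fn` that returns ranked ids; both are hypothetical inputs. It reports mean recall@k and mean reciprocal rank, where reciprocal rank is computed over the full result list.

```python
from typing import Callable, Mapping, Sequence, Set


def recall_at_k(results: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant ids that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(results[:k]) & relevant) / len(relevant)


def reciprocal_rank(results: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(results, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    search_fn: Callable[[str], Sequence[str]],
    judged: Mapping[str, Set[str]],
    k: int = 10,
) -> dict:
    """Average both metrics over every judged query."""
    if not judged:
        raise ValueError("need at least one judged query")
    recalls, rrs = [], []
    for query, relevant in judged.items():
        results = search_fn(query)
        recalls.append(recall_at_k(results, relevant, k))
        rrs.append(reciprocal_rank(results, relevant))
    n = len(judged)
    return {"recall@k": sum(recalls) / n, "mrr": sum(rrs) / n}
```

Run it before and after each analyzer or weight change, so that tuning becomes a comparison of numbers instead of a feeling.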
3. Forgetting analyzers
Stemming, stop words, and language all matter. Don't use the default analyzer for non-English content.
4. Filtering after fetching
Applying a condition like deleted = false in application code after fetching results is wasteful. Push the filter into the engine query.
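Beyond the wasted work, post-filtering also returns short pages. The toy comparison below shows the effect on an in-memory list; the document shape (`id`, `score`, `deleted`) is made up for the illustration, and in practice the "filter first" path is the engine applying the filter for you.

```python
from typing import Callable, Sequence


def top_k_then_filter(docs: Sequence[dict], k: int, keep: Callable[[dict], bool]) -> list:
    """Anti-pattern: take the top k, then drop unwanted hits in the app."""
    top = sorted(docs, key=lambda d: d["score"], reverse=True)[:k]
    return [d["id"] for d in top if keep(d)]


def filter_then_top_k(docs: Sequence[dict], k: int, keep: Callable[[dict], bool]) -> list:
    """Preferred: filter first (as the engine would), then take the top k."""
    kept = [d for d in docs if keep(d)]
    return [d["id"] for d in sorted(kept, key=lambda d: d["score"], reverse=True)[:k]]


if __name__ == "__main__":
    docs = [
        {"id": 1, "score": 0.9, "deleted": True},
        {"id": 2, "score": 0.8, "deleted": False},
        {"id": 3, "score": 0.7, "deleted": True},
        {"id": 4, "score": 0.6, "deleted": False},
    ]
    not_deleted = lambda d: not d["deleted"]
    print(top_k_then_filter(docs, 2, not_deleted))  # short page
    print(filter_then_top_k(docs, 2, not_deleted))  # full page
```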
5. No backup
Search indexes are derivable, but rebuilding from CDC takes time. Snapshot regularly.
Read this next
- PostgreSQL Full-Text Search
- Build a Production RAG App with pgvector
- Rerankers in RAG
- Postgres CDC, Logical Replication, Debezium
If you want my Postgres FTS + pgvector hybrid search starter, it’s at rajpoot.dev.