Most RAG systems in 2026 still ship vector-only. They lose to hybrid search by 10–25% on retrieval quality. Postgres can do hybrid in one query. This post is the working pattern.
The ingredients
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE chunks (
id BIGSERIAL PRIMARY KEY,
document_id BIGINT,
content TEXT NOT NULL,
embedding vector(1536) NOT NULL,
tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);
CREATE INDEX chunks_embedding_hnsw
ON chunks USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
CREATE INDEX chunks_tsv_gin ON chunks USING GIN (tsv);
Two indexes. One for semantic. One for lexical.
For pgvector tuning .
The hybrid query
WITH vec AS (
SELECT id, 1 - (embedding <=> $1) AS score
FROM chunks
ORDER BY embedding <=> $1
LIMIT 30
),
lex AS (
SELECT id, ts_rank(tsv, websearch_to_tsquery('english', $2)) AS score
FROM chunks
WHERE tsv @@ websearch_to_tsquery('english', $2)
ORDER BY score DESC
LIMIT 30
),
ranked AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY score DESC) AS rank FROM vec
UNION ALL
SELECT id, ROW_NUMBER() OVER (ORDER BY score DESC) AS rank FROM lex
),
fused AS (
SELECT id, SUM(1.0 / (60 + rank)) AS rrf_score
FROM ranked
GROUP BY id
)
SELECT c.id, c.content, f.rrf_score
FROM chunks c
JOIN fused f USING (id)
ORDER BY f.rrf_score DESC
LIMIT 8;
Three CTEs:
- Vector top-30.
- Lexical top-30.
- RRF fusion.
- Final order.
Runs in tens of ms on 1M chunks with the right indexes.
For the build-a-RAG-app pattern.
Why RRF wins
RRF (Reciprocal Rank Fusion) is sum(1 / (k + rank)) for each item across ranked lists. The k=60 constant smooths it.
- Item ranked 1 in vector list, 5 in lexical:
1/61 + 1/65 = 0.0317. - Item ranked 1 only in vector:
1/61 = 0.0164. - The first wins; co-occurrence in both lists matters.
No tuning needed. Robust across domains.
Adding a reranker
Top-30 fused → reranker → top-8:
fused = await db.fetch(HYBRID_QUERY, q_emb, q_text)
candidates = [r['content'] for r in fused]
# Cohere or BGE reranker
reranked = co.rerank(query=q_text, documents=candidates, top_n=8)
return [fused[r.index] for r in reranked.results]
Reranker improves quality 10–25% over fused alone. See Rerankers in RAG .
Filtering
WHERE tenant_id = $3 AND status = 'active' AND ...
Goes in both subqueries. With proper indexes (composite on (tenant_id, status) + the GIN), filtered hybrid stays fast.
For deep pgvector filtering (partial indexes, iterative scan).
Performance numbers
For 10M chunks:
- Vector top-30: ~10ms (HNSW with
ef_search=100). - Lexical top-30: ~10ms (GIN index).
- RRF fusion in CTE: ~1ms.
- Reranker (Cohere top-8): ~80ms.
- Total: ~100ms.
Fast enough for interactive RAG.
Tuning
ef_searchper session for vector recall.default_statistics_targethigher on the chunks table for better plans.work_memhigher (32–64 MB) for the GIN scan.
For Postgres tuning .
When this stops scaling
- >50M chunks: maybe move to a dedicated vector DB.
- Very high QPS (>1k searches/sec): dedicated cluster.
- Multi-region with low latency required: replication latency might bite.
For most apps, Postgres handles billions of chunks with proper partitioning and indexing.
What I’d ship today
- Postgres + pgvector + tsvector for hybrid.
- HNSW with
m=16, ef_construction=64. - GIN with default ops class.
- RRF fusion in a single SQL query.
- Cohere or BGE reranker for the top-8.
- Per-tenant scoping in WHERE.
Production-grade RAG search. No new infra.
Read this next
If you want a complete hybrid-search FastAPI service, it’s at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .