Most RAG systems in 2026 still ship vector-only. They lose to hybrid search by 10–25% on retrieval quality. Postgres can do hybrid in one query. This post is the working pattern.

The ingredients

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id BIGSERIAL PRIMARY KEY,
    document_id BIGINT,
    content TEXT NOT NULL,
    embedding vector(1536) NOT NULL,
    tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

CREATE INDEX chunks_embedding_hnsw
  ON chunks USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

CREATE INDEX chunks_tsv_gin ON chunks USING GIN (tsv);

Two indexes. One for semantic. One for lexical.

For pgvector tuning .

The hybrid query

WITH vec AS (
    SELECT id, 1 - (embedding <=> $1) AS score
    FROM chunks
    ORDER BY embedding <=> $1
    LIMIT 30
),
lex AS (
    SELECT id, ts_rank(tsv, websearch_to_tsquery('english', $2)) AS score
    FROM chunks
    WHERE tsv @@ websearch_to_tsquery('english', $2)
    ORDER BY score DESC
    LIMIT 30
),
ranked AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY score DESC) AS rank FROM vec
    UNION ALL
    SELECT id, ROW_NUMBER() OVER (ORDER BY score DESC) AS rank FROM lex
),
fused AS (
    SELECT id, SUM(1.0 / (60 + rank)) AS rrf_score
    FROM ranked
    GROUP BY id
)
SELECT c.id, c.content, f.rrf_score
FROM chunks c
JOIN fused f USING (id)
ORDER BY f.rrf_score DESC
LIMIT 8;

Three CTEs:

  1. Vector top-30.
  2. Lexical top-30.
  3. RRF fusion.
  4. Final order.

Runs in tens of ms on 1M chunks with the right indexes.

For the build-a-RAG-app pattern.

Why RRF wins

RRF (Reciprocal Rank Fusion) is sum(1 / (k + rank)) for each item across ranked lists. The k=60 constant smooths it.

  • Item ranked 1 in vector list, 5 in lexical: 1/61 + 1/65 = 0.0317.
  • Item ranked 1 only in vector: 1/61 = 0.0164.
  • The first wins; co-occurrence in both lists matters.

No tuning needed. Robust across domains.

Adding a reranker

Top-30 fused → reranker → top-8:

fused = await db.fetch(HYBRID_QUERY, q_emb, q_text)
candidates = [r['content'] for r in fused]

# Cohere or BGE reranker
reranked = co.rerank(query=q_text, documents=candidates, top_n=8)

return [fused[r.index] for r in reranked.results]

Reranker improves quality 10–25% over fused alone. See Rerankers in RAG .

Filtering

WHERE tenant_id = $3 AND status = 'active' AND ...

Goes in both subqueries. With proper indexes (composite on (tenant_id, status) + the GIN), filtered hybrid stays fast.

For deep pgvector filtering (partial indexes, iterative scan).

Performance numbers

For 10M chunks:

  • Vector top-30: ~10ms (HNSW with ef_search=100).
  • Lexical top-30: ~10ms (GIN index).
  • RRF fusion in CTE: ~1ms.
  • Reranker (Cohere top-8): ~80ms.
  • Total: ~100ms.

Fast enough for interactive RAG.

Tuning

  • ef_search per session for vector recall.
  • default_statistics_target higher on the chunks table for better plans.
  • work_mem higher (32–64 MB) for the GIN scan.

For Postgres tuning .

When this stops scaling

  • >50M chunks: maybe move to a dedicated vector DB.
  • Very high QPS (>1k searches/sec): dedicated cluster.
  • Multi-region with low latency required: replication latency might bite.

For most apps, Postgres handles billions of chunks with proper partitioning and indexing.

What I’d ship today

  • Postgres + pgvector + tsvector for hybrid.
  • HNSW with m=16, ef_construction=64.
  • GIN with default ops class.
  • RRF fusion in a single SQL query.
  • Cohere or BGE reranker for the top-8.
  • Per-tenant scoping in WHERE.

Production-grade RAG search. No new infra.

Read this next

If you want a complete hybrid-search FastAPI service, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .