AI/LLM Cheatsheet 05 — RAG Patterns

Cheatsheet: RAG pipeline, chunking, retrieval, reranking, citations.

May 26, 2026 · 4 min · 704 words · Manvendra Rajpoot

Evaluating RAG Systems in 2026 — Retrieval Quality, Faithfulness, and the Metrics That Matter

How to actually evaluate RAG: retrieval recall and MRR, answer faithfulness and relevance, golden datasets, automated eval pipelines, and Ragas.

May 2, 2026 · 4 min · 811 words · Manvendra Rajpoot

Hybrid Search with Postgres in 2026 — pgvector + FTS + Reranker

Production hybrid search with Postgres alone: pgvector for semantic search, tsvector for lexical search, Reciprocal Rank Fusion (RRF) to combine the two, and an optional reranker. Performance, tuning, and patterns.

May 1, 2026 · 3 min · 601 words · Manvendra Rajpoot

Giving AI Agents Memory in 2026 — Mem0, Zep, and the Patterns That Work

Why agents need memory beyond the context window, the 2026 tools (Mem0, Zep, custom layers), summary vs episodic memory, retrieval, and the patterns from production agents.

April 30, 2026 · 5 min · 1005 words · Manvendra Rajpoot

1M-Token Context Windows in 2026 — When They Help, When They Hurt

How to actually use 1M-token context windows. The ‘just put it all in context’ temptation, when it works, when RAG still wins, prompt caching, and cost.

April 30, 2026 · 3 min · 541 words · Manvendra Rajpoot

Agentic RAG in 2026 — When Retrieval Becomes a Tool, Not a Pipeline

Why agentic RAG often beats one-shot RAG. Tool-based retrieval, decomposition, query rewriting, self-reflection, citations, and the production patterns that ship in 2026.

April 30, 2026 · 3 min · 524 words · Manvendra Rajpoot

Rerankers in RAG — The Underrated Quality Multiplier in 2026

Rerankers turn ‘pretty good RAG’ into ‘great RAG’ for one extra API call. Cross-encoders explained, Cohere Rerank vs BGE-Reranker vs Jina, two-stage retrieval architecture, latency, cost, and implementation.

April 30, 2026 · 5 min · 914 words · Manvendra Rajpoot

Embedding Models in 2026 — OpenAI, Voyage, Cohere, BGE, and How to Pick

A practical 2026 guide to picking an embedding model. OpenAI text-embedding-3 vs Voyage vs Cohere vs open BGE / Nomic. Quality on MTEB, cost, dimensions, multilingual support, and how to evaluate on your own data.

April 30, 2026 · 4 min · 829 words · Manvendra Rajpoot

Fine-Tuning vs RAG vs Prompting in 2026 — How to Pick the Right Approach

A practical 2026 decision guide for LLM teams. When fine-tuning earns its cost, when RAG is right, when prompting is enough, the hybrid patterns, and the ops realities that change which one fits.

April 29, 2026 · 7 min · 1482 words · Manvendra Rajpoot

Build a Production RAG App with pgvector and FastAPI in 2026

A complete, end-to-end RAG backend built on PostgreSQL + pgvector and FastAPI. Real chunking, real embeddings, hybrid (vector + BM25) retrieval, prompt assembly, citations, and production gotchas.

April 28, 2026 · 8 min · 1679 words · Manvendra Rajpoot