RAG | Manvendra Rajpoot

AI/LLM Cheatsheet 05 — RAG Patterns

Cheatsheet: RAG pipeline, chunking, retrieval, reranking, citations.

Evaluating RAG Systems in 2026 — Retrieval Quality, Faithfulness, and the Metrics That Matter

How to actually evaluate RAG: retrieval recall and MRR, answer faithfulness and relevance, golden datasets, automated eval pipelines, and Ragas.
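For readers who want the two retrieval metrics named above in concrete form, here is a minimal sketch in Python. It is not taken from the linked article. The function names and the golden-dataset shape (a list of retrieved IDs paired with a set of relevant IDs per query) are assumptions made for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)
```

Recall@k asks whether the right chunks were retrieved at all, while MRR rewards placing the first relevant chunk near the top, so the two are usually reported together.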

Hybrid Search with Postgres in 2026 — pgvector + FTS + Reranker

Production hybrid search with Postgres alone: pgvector for semantic search, tsvector for lexical search, reciprocal rank fusion (RRF) to combine the two result lists, and an optional reranker. Performance, tuning, and patterns.
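The RRF step mentioned above can be sketched in a few lines of Python. This is an illustrative sketch, not the linked article's implementation. It assumes each retriever returns an ordered list of document IDs. The function name `rrf_fuse` is hypothetical, and the constant k=60 is a commonly used default rather than a value from the source.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Documents missing from a list add nothing.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector query and a full-text query.
vector_hits = ["a", "b", "c"]
lexical_hits = ["b", "d", "a"]
fused = rrf_fuse([vector_hits, lexical_hits])
```

Because RRF uses only ranks, it needs no score normalization between cosine distance and full-text rank values, which is part of why it is a popular choice for hybrid search.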

Giving AI Agents Memory in 2026 — Mem0, Zep, and the Patterns That Work

Why agents need memory beyond the context window, the 2026 tools (Mem0, Zep, custom layers), summary vs episodic memory, retrieval, and the patterns from production agents.

1M-Token Context Windows in 2026 — When They Help, When They Hurt

How to actually use 1M-token context windows. The ‘just put it all in context’ temptation, when it works, when RAG still wins, prompt caching, and cost.

Agentic RAG in 2026 — When Retrieval Becomes a Tool, Not a Pipeline

Why agentic RAG often beats one-shot RAG. Tool-based retrieval, decomposition, query rewriting, self-reflection, citations, and the production patterns that ship in 2026.

Rerankers in RAG — The Underrated Quality Multiplier in 2026

Rerankers turn ‘pretty good RAG’ into ‘great RAG’ for one extra API call. Cross-encoders explained, Cohere Rerank vs BGE-Reranker vs Jina, two-stage retrieval architecture, latency, cost, and implementation.

Embedding Models in 2026 — OpenAI, Voyage, Cohere, BGE, and How to Pick

A practical 2026 guide to picking an embedding model. OpenAI text-embedding-3 vs Voyage vs Cohere vs open BGE / Nomic. Quality on MTEB, cost, dimensions, multilingual, and how to evaluate on your own data.

Fine-Tuning vs RAG vs Prompting in 2026 — How to Pick the Right Approach

A practical 2026 decision guide for LLM teams. When fine-tuning earns its cost, when RAG is right, when prompting is enough, the hybrid patterns, and the ops realities that change which one fits.

Build a Production RAG App with pgvector and FastAPI in 2026

A complete, end-to-end RAG backend built on PostgreSQL + pgvector and FastAPI. Real chunking, real embeddings, hybrid (vector + BM25) retrieval, prompt assembly, citations, and production gotchas.