AI Engineering on Manvendra Rajpoot

https://blog.rajpoot.dev/posts/ai/
Recent content in AI Engineering on Manvendra Rajpoot

Self-Hosting LLMs in 2026 — When the Math Actually Works
https://blog.rajpoot.dev/posts/ai/llm-self-host-economics-2026/
Tue, 05 May 2026 08:30:00 +0530
Self-hosting LLMs in 2026 — vLLM, GPU economics, break-even, and when self-host beats API.

Anthropic API Best Practices in 2026 — Caching, Tool Use, Streaming, and Production Patterns
https://blog.rajpoot.dev/posts/ai/anthropic-api-best-practices-2026/
Tue, 05 May 2026 07:50:00 +0530
Anthropic API best practices in 2026 — prompt caching, tool use, streaming, batch API, and production patterns from real Claude apps.

Evaluating AI Coding Tools in 2026 — Benchmarks That Matter and Ones That Don't
https://blog.rajpoot.dev/posts/ai/ai-coding-evals-2026/
Tue, 05 May 2026 06:20:00 +0530
Evaluating AI coding tools in 2026 — SWE-bench, real-world tasks, and what's actually predictive of productivity gains.

Synthetic Data with LLMs in 2026 — Use Cases, Risks, and the Patterns That Work
https://blog.rajpoot.dev/posts/ai/synthetic-data-2026/
Tue, 05 May 2026 06:10:00 +0530
Synthetic data generation with LLMs in 2026 — when it helps, model collapse risk, eval set generation, and production patterns.

Voice Agents in 2026 — STT, LLM, TTS, and Latency That Doesn't Hurt
https://blog.rajpoot.dev/posts/ai/voice-agents-2026/
Tue, 05 May 2026 06:00:00 +0530
Building voice AI agents in 2026 — streaming STT, LLM, TTS pipelines, latency budgets, interruption, and real-world architectures.

Model Context Protocol (MCP) in 2026 — What It Solved, What It Didn't
https://blog.rajpoot.dev/posts/ai/llm-mcp-protocol-2026/
Mon, 04 May 2026 06:10:00 +0530
MCP in 2026 — protocol overview, server / client patterns, ecosystem, and an honest take on where MCP fits in agent infrastructure.

LLM Tool Use Patterns in 2026 — Schemas, Validation, and the Loop
https://blog.rajpoot.dev/posts/ai/llm-tool-use-patterns-2026/
Mon, 04 May 2026 06:00:00 +0530
LLM tool use in 2026 — designing tool schemas, parallel calls, error handling, and the patterns from production agents.

Agentic Coding in 2026 — Claude Code, Cursor, and the Real Workflow
https://blog.rajpoot.dev/posts/ai/agentic-coding-2026/
Sun, 03 May 2026 08:50:00 +0530
Agentic coding in 2026 — Claude Code, Cursor, Aider, and how AI coding agents actually fit into senior engineers' workflows.

LLM Batch Processing in 2026 — Anthropic / OpenAI Batch API for 50% Off
https://blog.rajpoot.dev/posts/ai/llm-batch-processing-2026/
Sun, 03 May 2026 06:20:00 +0530
LLM batch APIs in 2026 — Anthropic, OpenAI, Bedrock batch processing for 50% discount, when to use them, and the patterns that work.

LLM Deployment Patterns in 2026 — Inference Servers, Routing, and Production Architectures
https://blog.rajpoot.dev/posts/ai/llm-deployment-patterns-2026/
Sun, 03 May 2026 06:10:00 +0530
LLM deployment patterns in 2026 — vLLM, TGI, Ollama, hybrid API+self-hosted, routing layers, and the production architectures that actually work.

Prompt Engineering in 2026 — What Still Works, What Doesn't, and What Changed
https://blog.rajpoot.dev/posts/ai/llm-prompt-engineering-2026/
Sun, 03 May 2026 06:00:00 +0530
Prompt engineering in 2026 — patterns that still work, what's been obsoleted by better models, structured prompts, and production discipline.

LLM Agent Frameworks in 2026 — LangGraph, CrewAI, and the Bare-Metal Alternative
https://blog.rajpoot.dev/posts/ai/llm-agent-frameworks-2026/
Sat, 02 May 2026 12:50:00 +0530
LLM agent frameworks in 2026 — LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, and when bare-metal is better.

Agent Memory Systems in 2026 — Episodic, Semantic, and the Patterns That Stick
https://blog.rajpoot.dev/posts/ai/agent-memory-systems-2026/
Sat, 02 May 2026 11:10:00 +0530
Agent memory systems in 2026 — episodic vs semantic memory, vector stores, working memory, and patterns from production agents.

LLM Context Windows in 2026 — Long Context, Cache, and the Limits of 'Just Add More'
https://blog.rajpoot.dev/posts/ai/llm-context-windows-2026/
Sat, 02 May 2026 11:00:00 +0530
LLM context windows in 2026 — what 200k / 1M context can and can't do, prompt caching, retrieval, and patterns from production.

Multimodal LLMs in 2026 — Vision, Audio, and What's Actually Useful
https://blog.rajpoot.dev/posts/ai/multimodal-llms-2026/
Sat, 02 May 2026 09:30:00 +0530
Multimodal LLMs in 2026 — vision input, audio input, generation, real-world use cases, and the patterns that work in production.

Evaluating RAG Systems in 2026 — Retrieval Quality, Faithfulness, and the Metrics That Matter
https://blog.rajpoot.dev/posts/ai/llm-rag-evaluation-2026/
Sat, 02 May 2026 09:20:00 +0530
RAG evaluation in 2026 — retrieval metrics (recall, MRR), generation metrics (faithfulness, relevance), Ragas, and the patterns from production RAG.

LLM Observability in 2026 — Tracing, Evals, and the Things You Can't Skip
https://blog.rajpoot.dev/posts/ai/llm-observability-2026/
Sat, 02 May 2026 07:30:00 +0530
Production LLM observability in 2026 — distributed tracing, eval pipelines, Langfuse, Arize, and the patterns that turn black-box LLMs into operable systems.

LLM Cost Optimization in 2026 — From Bills That Hurt to Bills That Don't
https://blog.rajpoot.dev/posts/ai/llm-cost-optimization-2026/
Sat, 02 May 2026 07:20:00 +0530
Cutting LLM costs in 2026 — prompt caching, routing, batching, fine-tunes, and the patterns that drop bills 5-20× without quality loss.

LLM Guardrails in 2026 — Input Filtering, Output Validation, and Safety Nets
https://blog.rajpoot.dev/posts/ai/llm-guardrails-content-safety-2026/
Fri, 01 May 2026 07:10:00 +0530
Practical LLM guardrails in 2026 — input filtering, output validation, NVIDIA NeMo, Guardrails AI, and the patterns that prevent embarrassments.

Embedding Databases in 2026 — pgvector, Qdrant, Weaviate, Milvus, Pinecone
https://blog.rajpoot.dev/posts/ai/embedding-databases-2026/
Fri, 01 May 2026 07:00:00 +0530
Embedding databases compared in 2026 — pgvector, Qdrant, Weaviate, Milvus, Pinecone, Vectorize. When each fits.

Fine-Tuning LLMs in 2026 — LoRA, QLoRA, and the Cheap Path to Specialized Models
https://blog.rajpoot.dev/posts/ai/llm-fine-tuning-lora-qlora-2026/
Fri, 01 May 2026 06:00:00 +0530
Practical LLM fine-tuning in 2026 — LoRA, QLoRA, training data prep, evaluation, and the patterns from teams shipping fine-tuned models.

LLM Agent Error Recovery in 2026 — Patterns That Don't Loop Forever
https://blog.rajpoot.dev/posts/ai/llm-agent-error-recovery-2026/
Fri, 01 May 2026 04:30:00 +0530
How to build LLM agents that recover from errors gracefully — retry policies, fallback paths, max-step caps, and the patterns that prevent runaway loops.

OpenAI vs Anthropic vs Google for Production AI in 2026
https://blog.rajpoot.dev/posts/ai/openai-vs-anthropic-vs-google-2026/
Fri, 01 May 2026 03:00:00 +0530
Honest comparison of OpenAI vs Anthropic vs Google for production LLM apps in 2026 — model quality, pricing, latency, ecosystem, and how to pick.

Document AI in 2026 — Extracting Structured Data from PDFs and Images
https://blog.rajpoot.dev/posts/ai/document-ai-pdf-extraction-2026/
Fri, 01 May 2026 01:50:00 +0530
How to extract structured data from PDFs and images in 2026 — vision LLMs, OCR pipelines, layout-aware models, and the patterns that ship.

LLM Prompt Caching Deep Dive — Anthropic, OpenAI, and the Patterns That Save 90%
https://blog.rajpoot.dev/posts/ai/llm-prompt-caching-deep-dive-2026/
Fri, 01 May 2026 00:30:00 +0530
Prompt caching mechanics in 2026 — Anthropic's ephemeral cache, OpenAI's automatic caching, breakpoint placement, hit-rate measurement, and the patterns that save real money.

LLM Evaluation Frameworks in 2026 — Braintrust, LangSmith, Ragas, DeepEval
https://blog.rajpoot.dev/posts/ai/llm-evaluation-frameworks-2026/
Thu, 30 Apr 2026 23:59:00 +0530
Picking an LLM evaluation framework in 2026 — Braintrust vs LangSmith vs Ragas vs DeepEval. What each does, when each fits.

Designing Tools for AI Agents in 2026 — The Patterns That Work
https://blog.rajpoot.dev/posts/ai/agent-tool-design-patterns-2026/
Thu, 30 Apr 2026 22:40:00 +0530
How to design tools that AI agents use correctly — naming, descriptions, schemas, error returns, and the patterns from production agent systems.

Context Engineering for LLMs in 2026 — The Discipline Beyond Prompting
https://blog.rajpoot.dev/posts/ai/llm-context-engineering-patterns-2026/
Thu, 30 Apr 2026 21:00:00 +0530
Context engineering for LLMs in 2026 — what to put in context, what to leave out, ordering, compression, and the patterns that make agents work.

LLM Streaming with Cancellation — Patterns That Don't Waste Tokens
https://blog.rajpoot.dev/posts/ai/llm-streaming-cancellation-patterns-2026/
Thu, 30 Apr 2026 20:40:00 +0530
How to implement LLM streaming with proper cancellation in 2026 — SSE patterns, abort signals, server-side cancel, and not paying for tokens the user doesn't want.

LLM Routing in 2026 — Use Haiku to Save 80% on Sonnet/Opus Bills
https://blog.rajpoot.dev/posts/ai/llm-routing-classification-haiku-2026/
Thu, 30 Apr 2026 19:40:00 +0530
How LLM routing with a small classifier (Haiku) saves 80% on Sonnet / Opus / GPT-5 bills in 2026 — patterns, accuracy, and how to wire it in.

Giving AI Agents Memory in 2026 — Mem0, Zep, and the Patterns That Work
https://blog.rajpoot.dev/posts/ai/agents-with-memory-mem0-zep-2026/
Thu, 30 Apr 2026 13:40:00 +0530
How to give AI agents long-term memory in 2026 — Mem0, Zep, hand-rolled memory layers, summary memory, and the architecture that scales.

Sandboxed Code Execution for AI Agents — E2B, Modal, Daytona, and the 2026 Stack
https://blog.rajpoot.dev/posts/ai/sandboxed-code-execution-agents-e2b-modal-2026/
Thu, 30 Apr 2026 13:30:00 +0530
How AI agents run code safely in 2026 — E2B, Modal, Daytona, microVMs, and the patterns for sandboxed execution with internet access.

AI Coding Assistants ROI in 2026 — The Honest Numbers
https://blog.rajpoot.dev/posts/ai/ai-coding-assistants-cost-roi-2026/
Thu, 30 Apr 2026 13:20:00 +0530
Honest ROI numbers for AI coding assistants in 2026 — productivity gains, where they actually help, where they hurt, and the patterns of high-leverage adoption.

1M-Token Context Windows in 2026 — When They Help, When They Hurt
https://blog.rajpoot.dev/posts/ai/long-context-1m-tokens-2026/
Thu, 30 Apr 2026 13:10:00 +0530
Practical guide to 1M-token context windows in 2026 — when long context replaces RAG, when it doesn't, prompt caching, and the cost reality.

Agentic RAG in 2026 — When Retrieval Becomes a Tool, Not a Pipeline
https://blog.rajpoot.dev/posts/ai/agentic-rag-2026/
Thu, 30 Apr 2026 13:00:00 +0530
Agentic RAG explained — when the agent decides what and when to retrieve, multi-step reasoning, query rewriting, self-reflection, and the patterns that beat naive RAG.

LLM Security in 2026 — Prompt Injection, Data Exfiltration, and Defense in Depth
https://blog.rajpoot.dev/posts/ai/llm-security-prompt-injection-2026/
Thu, 30 Apr 2026 12:50:00 +0530
How to defend against LLM-specific attacks in 2026 — prompt injection, indirect injection, data exfiltration, jailbreaks, and the layered defenses that work.

LLM Observability in 2026 — LangSmith, Langfuse, Helicone, and OpenTelemetry
https://blog.rajpoot.dev/posts/ai/llm-observability-tracing-langsmith-2026/
Thu, 30 Apr 2026 12:40:00 +0530
How to observe production LLM apps in 2026 — LangSmith, Langfuse, Helicone, OpenTelemetry GenAI semantic conventions, and the metrics that matter.

Rerankers in RAG — The Underrated Quality Multiplier in 2026
https://blog.rajpoot.dev/posts/ai/rerankers-rag-quality-2026/
Thu, 30 Apr 2026 12:30:00 +0530
Why rerankers are the highest-ROI upgrade to a RAG system in 2026 — Cohere Rerank, BGE-Reranker, JinaAI, cross-encoders, and how to wire one into a production pipeline.

Embedding Models in 2026 — OpenAI, Voyage, Cohere, BGE, and How to Pick
https://blog.rajpoot.dev/posts/ai/embeddings-models-comparison-2026/
Thu, 30 Apr 2026 12:20:00 +0530
How to pick an embedding model in 2026 — OpenAI text-embedding-3, Voyage, Cohere, BGE, and the open-source landscape. Quality, cost, dimensions, multilingual support.

Voice Agents and Realtime LLM APIs in 2026 — How They Actually Work
https://blog.rajpoot.dev/posts/ai/voice-agents-realtime-llm-2026/
Thu, 30 Apr 2026 12:10:00 +0530
How voice agents work in 2026 — Realtime APIs from OpenAI / Anthropic / Google, latency budgets, ASR, TTS, interruption handling, and production architecture.

LLM Cost Optimization in 2026 — Tactics That Cut Bills 50–90%
https://blog.rajpoot.dev/posts/ai/llm-cost-optimization-tactics-2026/
Thu, 30 Apr 2026 12:00:00 +0530
Concrete LLM cost optimization tactics that cut your Anthropic / OpenAI / Gemini bill by 50–90% — caching, model routing, batching, fine-tuning, and the patterns that compound.

Building an MCP Server for Your SaaS — A 2026 Distribution Strategy
https://blog.rajpoot.dev/posts/ai/build-mcp-server-saas-2026/
Thu, 30 Apr 2026 09:20:00 +0530
Why every SaaS needs an MCP server in 2026 — the distribution play, what to expose as tools, OAuth patterns, and a working TypeScript example.

Structured Output for LLMs in 2026 — Pydantic AI, Instructor, and the End of JSON Parsing
https://blog.rajpoot.dev/posts/ai/structured-output-pydantic-ai-instructor-2026/
Thu, 30 Apr 2026 09:10:00 +0530
How to get structured, validated output from LLMs in 2026 — Pydantic AI, Instructor, native tool-calling, OpenAI's structured outputs API, and the patterns that make extraction reliable.

AI Gateways in 2026 — LiteLLM, Portkey, Helicone, and the OpenAI Façade
https://blog.rajpoot.dev/posts/ai/ai-gateways-litellm-portkey-helicone-2026/
Thu, 30 Apr 2026 09:00:00 +0530
AI gateways explained — why every serious LLM app needs one in 2026, comparison of LiteLLM, Portkey, Helicone, and OpenRouter, and how to add one without rewriting your code.

Multi-Agent Systems in 2026 — Production Patterns That Work
https://blog.rajpoot.dev/posts/ai/multi-agent-systems-production-patterns-2026/
Thu, 30 Apr 2026 08:50:00 +0530
Multi-agent systems explained — supervisor / worker, writer / reviewer, hierarchical and swarm patterns, and the production gotchas in 2026.

Cursor vs Windsurf vs Claude Code in 2026 — An Honest Comparison
https://blog.rajpoot.dev/posts/ai/cursor-vs-windsurf-vs-claude-code-2026/
Thu, 30 Apr 2026 08:00:00 +0530
Cursor vs Windsurf vs Claude Code in 2026 — pricing, agentic features, context windows, multi-file editing, and which tool fits which workflow.

Fine-Tuning vs RAG vs Prompting in 2026 — How to Pick the Right Approach
https://blog.rajpoot.dev/posts/ai/fine-tuning-vs-rag-vs-prompting-2026/
Wed, 29 Apr 2026 10:00:00 +0530
When to fine-tune, when to RAG, and when to just prompt — a practical 2026 decision guide for LLM applications, with cost, quality, and ops tradeoffs.

Claude Code Skills and Agentic Coding Patterns in 2026
https://blog.rajpoot.dev/posts/ai/claude-code-skills-agentic-coding-2026/
Wed, 29 Apr 2026 09:50:00 +0530
Claude Code Skills explained — what they are, when to use them, how to write a SKILL.md, the multi-session and writer/reviewer patterns that reshape coding workflows in 2026.

Model Context Protocol (MCP) Explained — The USB-C of AI Tools
https://blog.rajpoot.dev/posts/ai/model-context-protocol-mcp-explained/
Tue, 28 Apr 2026 21:00:00 +0530
Model Context Protocol (MCP) explained from first principles — what it is, how it works, why it matters, and how to build an MCP server for your own tools and data.

Self-Hosted LLMs in 2026 — Ollama, vLLM, and When to Skip the API
https://blog.rajpoot.dev/posts/ai/self-hosted-llms-vllm-ollama-2026/
Tue, 28 Apr 2026 20:50:00 +0530
When to self-host LLMs in 2026 — Ollama for dev, vLLM and SGLang for production, model choice, hardware sizing, and the latency/cost tradeoffs vs hosted APIs.

LLM Evaluations — How to Test Prompts and Agents Like a Pro
https://blog.rajpoot.dev/posts/ai/llm-evaluations-test-prompts-agents/
Tue, 28 Apr 2026 16:40:00 +0530
A practical, no-fluff guide to evaluating LLM applications — what to measure, how to build a starter eval set, LLM-as-judge done right, and how to wire evals into CI.

Prompt Engineering Patterns That Survive Production
https://blog.rajpoot.dev/posts/ai/prompt-engineering-production-patterns/
Tue, 28 Apr 2026 16:30:00 +0530
Prompt engineering patterns that hold up in production — system prompts, structured outputs, few-shot, reasoning steps, role separation, and the anti-patterns that look clever but quietly fail.

Anthropic Claude API + Tool Use — A Practical Guide for 2026
https://blog.rajpoot.dev/posts/ai/anthropic-claude-api-tool-use-guide/
Tue, 28 Apr 2026 16:20:00 +0530
A no-fluff guide to the Anthropic Claude API in 2026 — messages, tool use, prompt caching, structured outputs, streaming, and the patterns that ship.

AI Agents with LangGraph in 2026 — A Practical Tutorial
https://blog.rajpoot.dev/posts/ai/ai-agents-with-langgraph-tutorial/
Tue, 28 Apr 2026 16:10:00 +0530
Build a real AI agent with LangGraph — tools, state, memory, conditional routing, and the production patterns that separate working agents from demoware.

Build a Production RAG App with pgvector and FastAPI in 2026
https://blog.rajpoot.dev/posts/ai/build-rag-app-pgvector-fastapi/
Tue, 28 Apr 2026 16:00:00 +0530
A complete, copy-paste guide to building a Retrieval-Augmented Generation (RAG) backend with PostgreSQL + pgvector and FastAPI — chunking, embeddings, hybrid search, and the parts most tutorials skip.