LLM on Manvendra Rajpoot

Recent content in LLM on Manvendra Rajpoot
  https://blog.rajpoot.dev/tags/llm/
  Sun, 17 May 2026 17:50:46 +0530
AI/LLM Cheatsheet 01 — LLM Basics
  https://blog.rajpoot.dev/cheatsheets/ai-llm/01-basics-cheatsheet/
  Tue, 26 May 2026 06:00:00 +0530
  LLM basics — OpenAI, Anthropic, prompts, tokens, costs.
AI/LLM Cheatsheet 02 — Prompt Engineering
  https://blog.rajpoot.dev/cheatsheets/ai-llm/02-prompts-cheatsheet/
  Tue, 26 May 2026 06:10:00 +0530
  Prompt engineering — patterns, structure, few-shot, CoT.
AI/LLM Cheatsheet 03 — Tool Use / Function Calling
  https://blog.rajpoot.dev/cheatsheets/ai-llm/03-tools-cheatsheet/
  Tue, 26 May 2026 06:20:00 +0530
  Tool use — OpenAI, Anthropic, parallel calls.
AI/LLM Cheatsheet 04 — Embeddings
  https://blog.rajpoot.dev/cheatsheets/ai-llm/04-embeddings-cheatsheet/
  Tue, 26 May 2026 06:30:00 +0530
  Embeddings — OpenAI, Cohere, BGE, local.
AI/LLM Cheatsheet 05 — RAG Patterns
  https://blog.rajpoot.dev/cheatsheets/ai-llm/05-rag-cheatsheet/
  Tue, 26 May 2026 06:40:00 +0530
  RAG — retrieve + generate, hybrid, rerank, citations.
AI/LLM Cheatsheet 08 — Streaming Patterns
  https://blog.rajpoot.dev/cheatsheets/ai-llm/08-streaming-cheatsheet/
  Tue, 26 May 2026 07:10:00 +0530
  LLM streaming — SSE, token-by-token, frontend handling.
AI/LLM Cheatsheet 09 — Function Calling Patterns
  https://blog.rajpoot.dev/cheatsheets/ai-llm/09-function-calling-cheatsheet/
  Tue, 26 May 2026 07:20:00 +0530
  LLM function calling patterns — schemas, validation, parallel.
AI/LLM Cheatsheet 10 — Evaluation
  https://blog.rajpoot.dev/cheatsheets/ai-llm/10-eval-cheatsheet/
  Tue, 26 May 2026 07:30:00 +0530
  LLM evaluation — datasets, LLM-as-judge, regression tests.
AI/LLM Cheatsheet 11 — Cost Optimization
  https://blog.rajpoot.dev/cheatsheets/ai-llm/11-costs-cheatsheet/
  Tue, 26 May 2026 07:40:00 +0530
  LLM cost optimization — caching, batching, smaller models.
AI/LLM Cheatsheet 12 — Local LLMs (Ollama, vLLM)
  https://blog.rajpoot.dev/cheatsheets/ai-llm/12-local-llms-cheatsheet/
  Tue, 26 May 2026 07:50:00 +0530
  Local LLMs — Ollama, vLLM, llama.cpp.
AI/LLM Cheatsheet 13 — Fine-tuning
  https://blog.rajpoot.dev/cheatsheets/ai-llm/13-finetuning-cheatsheet/
  Tue, 26 May 2026 08:00:00 +0530
  Fine-tuning — LoRA, QLoRA, OpenAI fine-tune.
AI/LLM Cheatsheet 14 — Multimodal LLMs
  https://blog.rajpoot.dev/cheatsheets/ai-llm/14-multimodal-cheatsheet/
  Tue, 26 May 2026 08:10:00 +0530
  Multimodal — image, audio, video LLMs.
AI/LLM Cheatsheet 15 — Security and Prompt Injection
  https://blog.rajpoot.dev/cheatsheets/ai-llm/15-security-cheatsheet/
  Tue, 26 May 2026 08:20:00 +0530
  LLM security — prompt injection, data leaks, jailbreaks.
AI/LLM Cheatsheet 17 — Observability for LLMs
  https://blog.rajpoot.dev/cheatsheets/ai-llm/17-observability-cheatsheet/
  Tue, 26 May 2026 08:40:00 +0530
  LLM observability — LangSmith, Helicone, traces, metrics.
AI/LLM Cheatsheet 18 — LLM Application Patterns
  https://blog.rajpoot.dev/cheatsheets/ai-llm/18-patterns-cheatsheet/
  Tue, 26 May 2026 08:50:00 +0530
  LLM app patterns — classification, extraction, summarization, chat.
FastAPI Cheatsheet 18 — Streaming and LLM Integration
  https://blog.rajpoot.dev/cheatsheets/fastapi/18-streaming-llm-cheatsheet/
  Mon, 11 May 2026 08:50:00 +0530
  FastAPI LLM streaming cheatsheet — Anthropic / OpenAI / vLLM streaming, tool calls, cancellation.
AI/LLM Cheatsheet 19 — Building Chat UI
  https://blog.rajpoot.dev/cheatsheets/ai-llm/19-chat-ui-cheatsheet/
  Tue, 26 May 2026 09:00:00 +0530
  Chat UI — streaming, markdown, tool indicators.
AI/LLM Cheatsheet 20 — Production LLM App
  https://blog.rajpoot.dev/cheatsheets/ai-llm/20-production-cheatsheet/
  Tue, 26 May 2026 09:10:00 +0530
  Production LLM app — architecture, ops, security.
Self-Hosting LLMs in 2026 — When the Math Actually Works
  https://blog.rajpoot.dev/posts/ai/llm-self-host-economics-2026/
  Tue, 05 May 2026 08:30:00 +0530
  Self-hosting LLMs in 2026 — vLLM, GPU economics, break-even, and when self-host beats API.
Synthetic Data with LLMs in 2026 — Use Cases, Risks, and the Patterns That Work
  https://blog.rajpoot.dev/posts/ai/synthetic-data-2026/
  Tue, 05 May 2026 06:10:00 +0530
  Synthetic data generation with LLMs in 2026 — when it helps, model collapse risk, eval set generation, and production patterns.
LLM Tool Use Patterns in 2026 — Schemas, Validation, and the Loop
  https://blog.rajpoot.dev/posts/ai/llm-tool-use-patterns-2026/
  Mon, 04 May 2026 06:00:00 +0530
  LLM tool use in 2026 — designing tool schemas, parallel calls, error handling, and the patterns from production agents.
LLM Batch Processing in 2026 — Anthropic / OpenAI Batch API for 50% Off
  https://blog.rajpoot.dev/posts/ai/llm-batch-processing-2026/
  Sun, 03 May 2026 06:20:00 +0530
  LLM batch APIs in 2026 — Anthropic, OpenAI, Bedrock batch processing for 50% discount, when to use them, and the patterns that work.
LLM Deployment Patterns in 2026 — Inference Servers, Routing, and Production Architectures
  https://blog.rajpoot.dev/posts/ai/llm-deployment-patterns-2026/
  Sun, 03 May 2026 06:10:00 +0530
  LLM deployment patterns in 2026 — vLLM, TGI, Ollama, hybrid API+self-hosted, routing layers, and the production architectures that actually work.
Prompt Engineering in 2026 — What Still Works, What Doesn't, and What Changed
  https://blog.rajpoot.dev/posts/ai/llm-prompt-engineering-2026/
  Sun, 03 May 2026 06:00:00 +0530
  Prompt engineering in 2026 — patterns that still work, what's been obsoleted by better models, structured prompts, and production discipline.
LLM Context Windows in 2026 — Long Context, Cache, and the Limits of 'Just Add More'
  https://blog.rajpoot.dev/posts/ai/llm-context-windows-2026/
  Sat, 02 May 2026 11:00:00 +0530
  LLM context windows in 2026 — what 200k / 1M context can and can't do, prompt caching, retrieval, and patterns from production.
Multimodal LLMs in 2026 — Vision, Audio, and What's Actually Useful
  https://blog.rajpoot.dev/posts/ai/multimodal-llms-2026/
  Sat, 02 May 2026 09:30:00 +0530
  Multimodal LLMs in 2026 — vision input, audio input, generation, real-world use cases, and the patterns that work in production.
LLM Observability in 2026 — Tracing, Evals, and the Things You Can't Skip
  https://blog.rajpoot.dev/posts/ai/llm-observability-2026/
  Sat, 02 May 2026 07:30:00 +0530
  Production LLM observability in 2026 — distributed tracing, eval pipelines, Langfuse, Arize, and the patterns that turn black-box LLMs into operable systems.
LLM Cost Optimization in 2026 — From Bills That Hurt to Bills That Don't
  https://blog.rajpoot.dev/posts/ai/llm-cost-optimization-2026/
  Sat, 02 May 2026 07:20:00 +0530
  Cutting LLM costs in 2026 — prompt caching, routing, batching, fine-tunes, and the patterns that drop bills 5-20× without quality loss.
LLM Guardrails in 2026 — Input Filtering, Output Validation, and Safety Nets
  https://blog.rajpoot.dev/posts/ai/llm-guardrails-content-safety-2026/
  Fri, 01 May 2026 07:10:00 +0530
  Practical LLM guardrails in 2026 — input filtering, output validation, NVIDIA NeMo, Guardrails AI, and the patterns that prevent embarrassments.
Fine-Tuning LLMs in 2026 — LoRA, QLoRA, and the Cheap Path to Specialized Models
  https://blog.rajpoot.dev/posts/ai/llm-fine-tuning-lora-qlora-2026/
  Fri, 01 May 2026 06:00:00 +0530
  Practical LLM fine-tuning in 2026 — LoRA, QLoRA, training data prep, evaluation, and the patterns from teams shipping fine-tuned models.
OpenAI vs Anthropic vs Google for Production AI in 2026
  https://blog.rajpoot.dev/posts/ai/openai-vs-anthropic-vs-google-2026/
  Fri, 01 May 2026 03:00:00 +0530
  Honest comparison of OpenAI vs Anthropic vs Google for production LLM apps in 2026 — model quality, pricing, latency, ecosystem, and how to pick.
LLM Prompt Caching Deep Dive — Anthropic, OpenAI, and the Patterns That Save 90%
  https://blog.rajpoot.dev/posts/ai/llm-prompt-caching-deep-dive-2026/
  Fri, 01 May 2026 00:30:00 +0530
  Prompt caching mechanics in 2026 — Anthropic's ephemeral cache, OpenAI's automatic caching, breakpoint placement, hit-rate measurement, and the patterns that save real money.
LLM Evaluation Frameworks in 2026 — Braintrust, LangSmith, Ragas, DeepEval
  https://blog.rajpoot.dev/posts/ai/llm-evaluation-frameworks-2026/
  Thu, 30 Apr 2026 23:59:00 +0530
  Picking an LLM evaluation framework in 2026 — Braintrust vs LangSmith vs Ragas vs DeepEval. What each does, when each fits.
Context Engineering for LLMs in 2026 — The Discipline Beyond Prompting
  https://blog.rajpoot.dev/posts/ai/llm-context-engineering-patterns-2026/
  Thu, 30 Apr 2026 21:00:00 +0530
  Context engineering for LLMs in 2026 — what to put in context, what to leave out, ordering, compression, and the patterns that make agents work.
LLM Streaming with Cancellation — Patterns That Don't Waste Tokens
  https://blog.rajpoot.dev/posts/ai/llm-streaming-cancellation-patterns-2026/
  Thu, 30 Apr 2026 20:40:00 +0530
  How to implement LLM streaming with proper cancellation in 2026 — SSE patterns, abort signals, server-side cancel, and not paying for tokens the user doesn't want.
LLM Routing in 2026 — Use Haiku to Save 80% on Sonnet/Opus Bills
  https://blog.rajpoot.dev/posts/ai/llm-routing-classification-haiku-2026/
  Thu, 30 Apr 2026 19:40:00 +0530
  How LLM routing with a small classifier (Haiku) saves 80% on Sonnet / Opus / GPT-5 bills in 2026 — patterns, accuracy, and how to wire it in.
1M-Token Context Windows in 2026 — When They Help, When They Hurt
  https://blog.rajpoot.dev/posts/ai/long-context-1m-tokens-2026/
  Thu, 30 Apr 2026 13:10:00 +0530
  Practical guide to 1M-token context windows in 2026 — when long context replaces RAG, when it doesn't, prompt caching, and the cost reality.
Agentic RAG in 2026 — When Retrieval Becomes a Tool, Not a Pipeline
  https://blog.rajpoot.dev/posts/ai/agentic-rag-2026/
  Thu, 30 Apr 2026 13:00:00 +0530
  Agentic RAG explained — when the agent decides what and when to retrieve, multi-step reasoning, query rewriting, self-reflection, and the patterns that beat naive RAG.
LLM Security in 2026 — Prompt Injection, Data Exfiltration, and Defense in Depth
  https://blog.rajpoot.dev/posts/ai/llm-security-prompt-injection-2026/
  Thu, 30 Apr 2026 12:50:00 +0530
  How to defend against LLM-specific attacks in 2026 — prompt injection, indirect injection, data exfiltration, jailbreaks, and the layered defenses that work.
Voice Agents and Realtime LLM APIs in 2026 — How They Actually Work
  https://blog.rajpoot.dev/posts/ai/voice-agents-realtime-llm-2026/
  Thu, 30 Apr 2026 12:10:00 +0530
  How voice agents work in 2026 — Realtime APIs from OpenAI / Anthropic / Google, latency budgets, ASR, TTS, interruption handling, and production architecture.
LLM Cost Optimization in 2026 — Tactics That Cut Bills 50–90%
  https://blog.rajpoot.dev/posts/ai/llm-cost-optimization-tactics-2026/
  Thu, 30 Apr 2026 12:00:00 +0530
  Concrete LLM cost optimization tactics that cut your Anthropic / OpenAI / Gemini bill by 50–90% — caching, model routing, batching, fine-tuning, and the patterns that compound.
Structured Output for LLMs in 2026 — Pydantic AI, Instructor, and the End of JSON Parsing
  https://blog.rajpoot.dev/posts/ai/structured-output-pydantic-ai-instructor-2026/
  Thu, 30 Apr 2026 09:10:00 +0530
  How to get structured, validated output from LLMs in 2026 — Pydantic AI, Instructor, native tool-calling, OpenAI's structured outputs API, and the patterns that make extraction reliable.
AI Gateways in 2026 — LiteLLM, Portkey, Helicone, and the OpenAI Façade
  https://blog.rajpoot.dev/posts/ai/ai-gateways-litellm-portkey-helicone-2026/
  Thu, 30 Apr 2026 09:00:00 +0530
  AI gateways explained — why every serious LLM app needs one in 2026, comparison of LiteLLM, Portkey, Helicone, and OpenRouter, and how to add one without rewriting your code.
Multi-Agent Systems in 2026 — Production Patterns That Work
  https://blog.rajpoot.dev/posts/ai/multi-agent-systems-production-patterns-2026/
  Thu, 30 Apr 2026 08:50:00 +0530
  Multi-agent systems explained — supervisor / worker, writer / reviewer, hierarchical and swarm patterns, and the production gotchas in 2026.
Fine-Tuning vs RAG vs Prompting in 2026 — How to Pick the Right Approach
  https://blog.rajpoot.dev/posts/ai/fine-tuning-vs-rag-vs-prompting-2026/
  Wed, 29 Apr 2026 10:00:00 +0530
  When to fine-tune, when to RAG, and when to just prompt — a practical 2026 decision guide for LLM applications, with cost, quality, and ops tradeoffs.
Model Context Protocol (MCP) Explained — The USB-C of AI Tools
  https://blog.rajpoot.dev/posts/ai/model-context-protocol-mcp-explained/
  Tue, 28 Apr 2026 21:00:00 +0530
  Model Context Protocol (MCP) explained from first principles — what it is, how it works, why it matters, and how to build an MCP server for your own tools and data.
Self-Hosted LLMs in 2026 — Ollama, vLLM, and When to Skip the API
  https://blog.rajpoot.dev/posts/ai/self-hosted-llms-vllm-ollama-2026/
  Tue, 28 Apr 2026 20:50:00 +0530
  When to self-host LLMs in 2026 — Ollama for dev, vLLM and SGLang for production, model choice, hardware sizing, and the latency/cost tradeoffs vs hosted APIs.
LLM Evaluations — How to Test Prompts and Agents Like a Pro
  https://blog.rajpoot.dev/posts/ai/llm-evaluations-test-prompts-agents/
  Tue, 28 Apr 2026 16:40:00 +0530
  A practical, no-fluff guide to evaluating LLM applications — what to measure, how to build a starter eval set, LLM-as-judge done right, and how to wire evals into CI.
Prompt Engineering Patterns That Survive Production
  https://blog.rajpoot.dev/posts/ai/prompt-engineering-production-patterns/
  Tue, 28 Apr 2026 16:30:00 +0530
  Prompt engineering patterns that hold up in production — system prompts, structured outputs, few-shot, reasoning steps, role separation, and the anti-patterns that look clever but quietly fail.
Anthropic Claude API + Tool Use — A Practical Guide for 2026
  https://blog.rajpoot.dev/posts/ai/anthropic-claude-api-tool-use-guide/
  Tue, 28 Apr 2026 16:20:00 +0530
  A no-fluff guide to the Anthropic Claude API in 2026 — messages, tool use, prompt caching, structured outputs, streaming, and the patterns that ship.
AI Agents with LangGraph in 2026 — A Practical Tutorial
  https://blog.rajpoot.dev/posts/ai/ai-agents-with-langgraph-tutorial/
  Tue, 28 Apr 2026 16:10:00 +0530
  Build a real AI agent with LangGraph — tools, state, memory, conditional routing, and the production patterns that separate working agents from demoware.
Build a Production RAG App with pgvector and FastAPI in 2026
  https://blog.rajpoot.dev/posts/ai/build-rag-app-pgvector-fastapi/
  Tue, 28 Apr 2026 16:00:00 +0530
  A complete, copy-paste guide to building a Retrieval-Augmented Generation (RAG) backend with PostgreSQL + pgvector and FastAPI — chunking, embeddings, hybrid search, and the parts most tutorials skip.