Fine-Tuning LLMs in 2026 — LoRA, QLoRA, and the Cheap Path to Specialized Models

How to actually fine-tune LLMs in 2026 — LoRA / QLoRA mechanics, training data discipline, evaluation, and the patterns that make fine-tunes ship.

May 1, 2026 · 4 min · 725 words · Manvendra Rajpoot

LLM Agent Error Recovery in 2026 — Patterns That Don't Loop Forever

Production agent error handling. Per-tool retries vs whole-agent retries, fallback paths, step caps, escalation, human-in-the-loop, and the patterns from real agent deployments.

May 1, 2026 · 4 min · 738 words · Manvendra Rajpoot

OpenAI vs Anthropic vs Google for Production AI in 2026

Picking a frontier LLM provider in 2026. Model quality across reasoning / coding / extraction, pricing, latency, ecosystem maturity, and which fits which workload.

May 1, 2026 · 3 min · 612 words · Manvendra Rajpoot

Document AI in 2026 — Extracting Structured Data from PDFs and Images

Document AI in 2026: vision LLMs (Claude, GPT-4o, Gemini), classical OCR (Tesseract / Textract), layout models, and the production patterns for invoices, receipts, and contracts.

May 1, 2026 · 4 min · 667 words · Manvendra Rajpoot

LLM Prompt Caching Deep Dive — Anthropic, OpenAI, and the Patterns That Save 90%

How prompt caching actually works at Anthropic and OpenAI, where to place breakpoints for maximum hit rate, how to measure cache effectiveness, and the patterns that compound across calls.

May 1, 2026 · 4 min · 728 words · Manvendra Rajpoot

LLM Evaluation Frameworks in 2026 — Braintrust, LangSmith, Ragas, DeepEval

Comparison of LLM eval frameworks: Braintrust (ship-eval-with-code), LangSmith (LangChain-native), Ragas (RAG-specific), DeepEval (Pytest-style). Which one to pick for which team.

April 30, 2026 · 3 min · 491 words · Manvendra Rajpoot

Designing Tools for AI Agents in 2026 — The Patterns That Work

Tool design for agents — names, descriptions as prompts, input schemas, error handling, idempotency, and the patterns that make agents call them correctly.

April 30, 2026 · 4 min · 767 words · Manvendra Rajpoot

Context Engineering for LLMs in 2026 — The Discipline Beyond Prompting

Context engineering — what goes in the context window, in what order, and why. The patterns that separate working agents from confused ones.

April 30, 2026 · 3 min · 560 words · Manvendra Rajpoot

LLM Streaming with Cancellation — Patterns That Don't Waste Tokens

Production LLM streaming with cancellation. SSE plus abort, client cancel propagating to provider, partial-response handling, and the patterns that save real tokens.

April 30, 2026 · 3 min · 560 words · Manvendra Rajpoot

LLM Routing in 2026 — Use Haiku to Save 80% on Sonnet/Opus Bills

Most LLM apps run every query on the most expensive model. Routing with a small classifier sends easy queries to Haiku and reserves Opus for hard ones. The pattern, the math, and the implementation.

April 30, 2026 · 3 min · 551 words · Manvendra Rajpoot