AI | Manvendra Rajpoot

Fine-Tuning LLMs in 2026 — LoRA, QLoRA, and the Cheap Path to Specialized Models

How to actually fine-tune LLMs in 2026 — LoRA / QLoRA mechanics, training data discipline, evaluation, and the patterns that make fine-tunes ship.

LLM Agent Error Recovery in 2026 — Patterns That Don't Loop Forever

Production agent error handling. Per-tool retries vs whole-agent retries, fallback paths, step caps, escalation, human-in-the-loop, and the patterns from real agent deployments.

OpenAI vs Anthropic vs Google for Production AI in 2026

Picking a frontier LLM provider in 2026. Model quality across reasoning / coding / extraction, pricing, latency, ecosystem maturity, and which fits which workload.

Document AI in 2026 — Extracting Structured Data from PDFs and Images

Document AI in 2026: vision LLMs (Claude, GPT-4o, Gemini), classical OCR (Tesseract / Textract), layout models, and the production patterns for invoices, receipts, contracts.

LLM Prompt Caching Deep Dive — Anthropic, OpenAI, and the Patterns That Save 90%

How prompt caching actually works at Anthropic and OpenAI, where to place cache breakpoints for maximum hit rate, how to measure cache effectiveness, and the patterns that compound across calls.

LLM Evaluation Frameworks in 2026 — Braintrust, LangSmith, Ragas, DeepEval

Comparison of LLM eval frameworks: Braintrust (ship-eval-with-code), LangSmith (LangChain-native), Ragas (RAG-specific), DeepEval (Pytest-style). Which one to pick for which kind of team.

Designing Tools for AI Agents in 2026 — The Patterns That Work

Tool design for agents — names, descriptions as prompts, input schemas, error handling, idempotency, and the patterns that make agents call them correctly.

Context Engineering for LLMs in 2026 — The Discipline Beyond Prompting

Context engineering — what goes in the context window, in what order, and why. The patterns that separate working agents from confused ones.

LLM Streaming with Cancellation — Patterns That Don't Waste Tokens

Production LLM streaming with cancellation. SSE plus abort, client cancel propagating to provider, partial-response handling, and the patterns that save real tokens.

LLM Routing in 2026 — Use Haiku to Save 80% on Sonnet/Opus Bills

Most LLM apps run every query on the most expensive model. Routing with a small classifier sends easy queries to Haiku and reserves Opus for hard ones. The pattern, the math, and the implementation.