LLM | Manvendra Rajpoot

OpenAI vs Anthropic vs Google for Production AI in 2026

Picking a frontier LLM provider in 2026. Model quality across reasoning, coding, and extraction; pricing; latency; ecosystem maturity; and which provider fits which workload.

LLM Prompt Caching Deep Dive — Anthropic, OpenAI, and the Patterns That Save 90%

How prompt caching actually works at Anthropic and OpenAI, where to place breakpoints for maximum hit rate, how to measure cache effectiveness, and the patterns that compound across calls.

LLM Evaluation Frameworks in 2026 — Braintrust, LangSmith, Ragas, DeepEval

Comparison of LLM eval frameworks: Braintrust (ship-eval-with-code), LangSmith (LangChain-native), Ragas (RAG-specific), DeepEval (Pytest-style). Which to pick for which kind of team.

Context Engineering for LLMs in 2026 — The Discipline Beyond Prompting

Context engineering: what goes in the context window, in what order, and why. The patterns that separate working agents from confused ones.

LLM Streaming with Cancellation — Patterns That Don't Waste Tokens

Production LLM streaming with cancellation. SSE plus abort, client cancel propagating to provider, partial-response handling, and the patterns that save real tokens.

LLM Routing in 2026 — Use Haiku to Save 80% on Sonnet/Opus Bills

Most LLM apps run every query on the most expensive model. Routing with a small classifier sends easy queries to Haiku and reserves Opus for hard ones. The pattern, the math, and the implementation.

1M-Token Context Windows in 2026 — When They Help, When They Hurt

How to actually use 1M-token context windows. The ‘just put it all in context’ temptation, when it works, when RAG still wins, prompt caching, and cost.

Agentic RAG in 2026 — When Retrieval Becomes a Tool, Not a Pipeline

Why agentic RAG often beats one-shot RAG. Tool-based retrieval, decomposition, query rewriting, self-reflection, citations, and the production patterns that ship in 2026.

LLM Security in 2026 — Prompt Injection, Data Exfiltration, and Defense in Depth

LLM security threats and defenses in 2026. Direct and indirect prompt injection, exfiltration via tool calls or markdown, jailbreaks, and the layered defenses (input tagging, output filtering, allow-lists, OPA, sandboxing).

Voice Agents and Realtime LLM APIs in 2026 — How They Actually Work

A practical look at building voice agents in 2026. Realtime LLM APIs (OpenAI Realtime, Anthropic, Gemini Live), end-to-end latency, ASR and TTS, interruption handling, and the production patterns from real deployments.