Cost | Manvendra Rajpoot

LLM Cost Optimization in 2026 — From Bills That Hurt to Bills That Don't

Practical LLM cost cuts: prompt caching, model routing, batch APIs, structured output, fine-tunes for high-volume narrow tasks, and cache hierarchies.

LLM Prompt Caching Deep Dive — Anthropic, OpenAI, and the Patterns That Save 90%

How prompt caching actually works at Anthropic and OpenAI, where to place breakpoints for maximum hit rate, how to measure cache effectiveness, and the patterns that compound across calls.

Kubernetes Cost Engineering in 2026 — Where the Money Actually Goes

Practical Kubernetes cost reduction in 2026: right-sizing requests and limits, Karpenter for node management, VPA and HPA, spot pools, and the dashboards that find waste.

Cloud Cost Optimization in 2026 — The Tactics That Actually Work

Tactics that produce real cloud savings: right-sizing, spot instances, savings plans, egress reduction, idle-resource cleanup, multi-cloud arbitrage, FinOps culture, and the diagnostic tools that find waste.

LLM Routing in 2026 — Use Haiku to Save 80% on Sonnet/Opus Bills

Most LLM apps run every query on the most expensive model. Routing with a small classifier sends easy queries to Haiku and reserves Opus for hard ones. The pattern, the math, and the implementation.