Posts on AI engineering — the discipline of building real products on top of LLMs. Practical writing on RAG, agents, prompt engineering, vector databases, evaluations, and the production realities of shipping AI features that don’t fall apart in week three.
LLM Cost Optimization in 2026 — Tactics That Cut Bills 50–90%
Production-tested tactics for cutting LLM costs: prompt caching, model routing, semantic caching, batching, fine-tuning small models, capping output length, and the architecture decisions that keep the cost line item bearable.