AI Engineering

Posts on AI engineering — the discipline of building real products on top of LLMs. Practical writing on RAG, agents, prompt engineering, vector databases, evaluations, and the production realities of shipping AI features that don’t fall apart in week three.

LLM Cost Optimization in 2026 — Tactics That Cut Bills 50–90%

Production-tested LLM cost optimization tactics. Prompt caching, model routing, semantic caching, batching, fine-tuning small models, output bounds, and the architecture decisions that make the cost line item bearable.

Building an MCP Server for Your SaaS — A 2026 Distribution Strategy

Why an MCP server is now table stakes for SaaS distribution, what tools and resources to expose, OAuth patterns for hosted MCP, observability, and a complete working server in TypeScript.

Structured Output for LLMs in 2026 — Pydantic AI, Instructor, and the End of JSON Parsing

Reliable structured output from LLMs in 2026. Pydantic AI, Instructor, OpenAI’s Structured Outputs and Anthropic’s tool calling, retry-on-validation patterns, and the end of homemade JSON parsing.

AI Gateways in 2026 — LiteLLM, Portkey, Helicone, and the OpenAI Façade

Why AI gateways became standard infrastructure in 2026. The OpenAI-compatible façade pattern, LiteLLM vs Portkey vs Helicone vs OpenRouter, fallbacks, caching, observability, cost control, and how to drop one in front of an existing app.

Multi-Agent Systems in 2026 — Production Patterns That Work

When multi-agent systems beat a single agent, the supervisor / worker / reviewer / critic patterns, hierarchical vs swarm, communication models, evaluation, and the production patterns that survive contact with users.

Cursor vs Windsurf vs Claude Code in 2026 — An Honest Comparison

An honest 2026 comparison of Cursor, Windsurf, and Claude Code. Strengths, pricing, context windows, agent modes, multi-file edits, and a clear by-workflow recommendation.

Fine-Tuning vs RAG vs Prompting in 2026 — How to Pick the Right Approach

A practical 2026 decision guide for LLM teams. When fine-tuning earns its cost, when RAG is right, when prompting is enough, the hybrid patterns, and the ops realities that change which one fits.

Claude Code Skills and Agentic Coding Patterns in 2026

How Claude Code Skills work, the SKILL.md format, when skills beat tools, multi-session workflows (writer/reviewer, fresh-context review), and the agentic coding patterns that ship real production code in 2026.

Model Context Protocol (MCP) Explained — The USB-C of AI Tools

MCP from first principles — the protocol, transports (stdio, HTTP), tools/resources/prompts, why it works, and a concrete walkthrough of building an MCP server in Python and TypeScript.

Self-Hosted LLMs in 2026 — Ollama, vLLM, and When to Skip the API

A practical guide to self-hosting LLMs in 2026. Ollama for development, vLLM and SGLang for production, model selection (Llama 3.3, Qwen 2.5, DeepSeek V3), hardware sizing, batching, and when self-hosting is genuinely cheaper than the API.