AI Engineering

Posts on AI engineering — the discipline of building real products on top of LLMs. Practical writing on RAG, agents, prompt engineering, vector databases, evaluations, and the production realities of shipping AI features that don’t fall apart in week three.

Prompt Engineering in 2026 — What Still Works, What Doesn't, and What Changed

Modern prompt engineering: instruction clarity, structured prompts, few-shot vs zero-shot, role tags, and the patterns that survive model upgrades.

LLM Agent Frameworks in 2026 — LangGraph, CrewAI, and the Bare-Metal Alternative

Honest agent framework comparison: LangGraph for stateful workflows, CrewAI for multi-agent setups, OpenAI Agents SDK, and where 200 lines of Python beats them all.

Agent Memory Systems in 2026 — Episodic, Semantic, and the Patterns That Stick

Practical agent memory: working memory in the prompt, episodic memory in append-only stores, semantic memory in vector DBs, and how to compose them.

LLM Context Windows in 2026 — Long Context, Cache, and the Limits of 'Just Add More'

Practical long-context: when more context helps vs hurts, the lost-in-the-middle problem, caching strategies, retrieval as the better default, and 1M-context economics.

Multimodal LLMs in 2026 — Vision, Audio, and What's Actually Useful

Practical multimodal: vision-aware document understanding, audio transcription + reasoning, text-to-image generation, video understanding, and where multimodal pays off.

Evaluating RAG Systems in 2026 — Retrieval Quality, Faithfulness, and the Metrics That Matter

How to actually evaluate RAG: retrieval recall and MRR, answer faithfulness and relevance, golden datasets, automated eval pipelines, and Ragas.

LLM Observability in 2026 — Tracing, Evals, and the Things You Can't Skip

Practical LLM observability: tracing every call, eval harnesses, regression detection, prompt versioning, and how to debug the model in production.

LLM Cost Optimization in 2026 — From Bills That Hurt to Bills That Don't

Practical LLM cost cuts: prompt caching, model routing, batch APIs, structured output, fine-tunes for high-volume narrow tasks, and cache hierarchies.

LLM Guardrails in 2026 — Input Filtering, Output Validation, and Safety Nets

Production guardrail patterns: input filters, output validators, prompt injection defenses, PII redaction, and how to compose guardrails without killing latency.

Embedding Databases in 2026 — pgvector, Qdrant, Weaviate, Milvus, Pinecone

Picking a vector store: pgvector for most apps, Qdrant for self-host at scale, Pinecone for managed simplicity, Milvus for billion-row workloads, Vectorize for edge.