Posts on AI engineering — the discipline of building real products on top of LLMs. Practical writing on RAG, agents, prompt engineering, vector databases, evaluations, and the production realities of shipping AI features that don’t fall apart in week three.
Modern prompt engineering: instruction clarity, structured prompts, few-shot vs zero-shot, role tags, and the patterns that survive model upgrades.
Honest agent framework comparison: LangGraph for stateful workflows, CrewAI for multi-agent systems, OpenAI Agents SDK, and where 200 lines of Python beats them all.
Practical agent memory: working memory in the prompt, episodic memory in append-only stores, semantic memory in vector DBs, and how to compose them.
Practical long-context: when more context helps vs hurts, the lost-in-the-middle problem, caching strategies, retrieval as the better default, and 1M-context economics.
Practical multimodal: vision-aware document understanding, audio transcription + reasoning, text-to-image generation, video understanding, and where multimodal pays off.
How to actually evaluate RAG: retrieval recall and MRR, answer faithfulness and relevance, golden datasets, automated eval pipelines, and Ragas.
Practical LLM observability: tracing every call, eval harnesses, regression detection, prompt versioning, and how to debug the model in production.
Practical LLM cost cuts: prompt caching, model routing, batch APIs, structured output, fine-tunes for high-volume narrow tasks, and cache hierarchies.
Production guardrail patterns: input filters, output validators, prompt injection defenses, PII redaction, and how to compose guardrails without killing latency.
Picking a vector store: pgvector for most apps, Qdrant for self-host at scale, Pinecone for managed simplicity, Milvus for billion-row workloads, Vectorize for edge.