AI/LLM Cheatsheet 19 — Building Chat UI
Cheatsheet: chat UI, streaming, markdown rendering, code blocks.
Cheatsheet: full prod LLM app stack.
Practical LLM self-hosting math: GPU pricing, throughput per GPU, sustained load break-even, vLLM tuning, and when API still wins.
Practical Anthropic API: prompt caching tactics, tool use loops, streaming, batch API, retries, and pitfalls from real production deployments.
Practical AI coding eval: SWE-bench / live benchmarks, internal benchmarks on your codebase, productivity metrics, and what to ignore.
Practical synthetic data: fine-tune training data, eval set generation, edge case enumeration, and the model-collapse / quality risks to watch.
Practical voice agent architecture: streaming Deepgram/AssemblyAI → LLM → ElevenLabs/OpenAI TTS, latency budgeting, barge-in, and patterns from production calls.
Practical MCP: building an MCP server, integrating with Claude / Cursor, when MCP wins, and the security pitfalls of remote tool access.
Practical LLM tool use: schema design, parallel tool calls, error/retry on bad inputs, tool result formatting, and patterns that scale beyond 5 tools.
Honest take on AI coding agents: where Claude Code / Cursor shine, when they hurt, the discipline of using them well, and what stays human.