AI | Manvendra Rajpoot

AI/LLM Cheatsheet 19 — Building Chat UI

Cheatsheet: chat UI, streaming, markdown rendering, code blocks.

AI/LLM Cheatsheet 20 — Production LLM App

Cheatsheet: the full production LLM app stack.

Self-Hosting LLMs in 2026 — When the Math Actually Works

Practical LLM self-hosting math: GPU pricing, throughput per GPU, sustained load break-even, vLLM tuning, and when a hosted API still wins.

Anthropic API Best Practices in 2026 — Caching, Tool Use, Streaming, and Production Patterns

Practical Anthropic API: prompt caching tactics, tool use loops, streaming, batch API, retries, and pitfalls from real production deployments.

Evaluating AI Coding Tools in 2026 — Benchmarks That Matter and Ones That Don't

Practical AI coding evaluation: SWE-bench and live benchmarks, internal benchmarks on your codebase, productivity metrics, and what to ignore.

Synthetic Data with LLMs in 2026 — Use Cases, Risks, and the Patterns That Work

Practical synthetic data: fine-tune training data, eval set generation, edge case enumeration, and the model-collapse / quality risks to watch.

Voice Agents in 2026 — STT, LLM, TTS, and Latency That Doesn't Hurt

Practical voice agent architecture: streaming Deepgram/AssemblyAI → LLM → ElevenLabs/OpenAI TTS, latency budgeting, barge-in, and patterns from production calls.

Model Context Protocol (MCP) in 2026 — What It Solved, What It Didn't

Practical MCP: building an MCP server, integrating with Claude / Cursor, when MCP wins, and the security pitfalls of remote tool access.

LLM Tool Use Patterns in 2026 — Schemas, Validation, and the Loop

Practical LLM tool use: schema design, parallel tool calls, error/retry on bad inputs, tool result formatting, and patterns that scale beyond 5 tools.

Agentic Coding in 2026 — Claude Code, Cursor, and the Real Workflow

Honest take on AI coding agents: where Claude Code / Cursor shine, when they hurt, the discipline of using them well, and what stays human.