AI/LLM Cheatsheet 11 — Cost Optimization
Cheatsheet: prompt caching, batching, model selection, output limits.
Cheatsheet: Ollama, vLLM, llama.cpp, when to self-host.
Cheatsheet: when to fine-tune, LoRA, QLoRA, OpenAI fine-tune.
Cheatsheet: vision LLMs, image inputs, audio, video.
Cheatsheet: Vector column, cosine_distance / l2 / inner_product, HNSW index, hybrid filter + ANN.
Cheatsheet: prompt injection, defenses, PII, jailbreaks.
Cheatsheet: vector DBs, HNSW, hybrid search, sharding.
Cheatsheet: logging, traces, metrics, evals in prod.
Cheatsheet: classification, extraction, summarization, routing, decomposition.
Cheatsheet: streaming Claude / GPT / vLLM tokens via SSE, tool-call loops, cancellation, prompt caching.