AI | Manvendra Rajpoot

AI/LLM Cheatsheet 11 — Cost Optimization

Cheatsheet: prompt caching, batching, model selection, output limits.
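As a rough illustration of how these four levers interact, the sketch below estimates the cost of a single request. Everything in it is an assumption made for illustration: the `Pricing` class, the `estimate_cost` helper, the prices, and the discount multipliers are hypothetical and do not reflect any vendor's actual rates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (not real vendor rates)."""
    input_per_mtok: float
    output_per_mtok: float
    cached_input_multiplier: float = 0.1  # assumed discount for cached prompt tokens
    batch_multiplier: float = 0.5         # assumed discount for batch processing


def estimate_cost(
    pricing: Pricing,
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    batch: bool = False,
    max_output_tokens: Optional[int] = None,
) -> float:
    """Estimate the USD cost of one request under the assumed pricing."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    # An output limit caps how many output tokens can be billed.
    if max_output_tokens is not None:
        output_tokens = min(output_tokens, max_output_tokens)
    fresh_tokens = input_tokens - cached_tokens
    cost = (
        fresh_tokens * pricing.input_per_mtok
        + cached_tokens * pricing.input_per_mtok * pricing.cached_input_multiplier
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000
    # Batch jobs are assumed to be discounted as a whole.
    if batch:
        cost *= pricing.batch_multiplier
    return cost


# Model selection: compare two hypothetical price tiers on the same workload.
small = Pricing(input_per_mtok=0.5, output_per_mtok=2.0)
large = Pricing(input_per_mtok=5.0, output_per_mtok=20.0)
for name, tier in (("small", small), ("large", large)):
    usd = estimate_cost(tier, input_tokens=8_000, output_tokens=1_500,
                        cached_tokens=6_000, max_output_tokens=1_000)
    print(f"{name}: ${usd:.6f}")
```

Running the same workload through both tiers makes the trade-off concrete: caching and output limits shrink the token counts that are billed, while model selection and batching change the rate applied to them.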

Cheatsheet: Ollama, vLLM, llama.cpp, when to self-host.

Cheatsheet: when to fine-tune, LoRA, QLoRA, OpenAI fine-tuning.

Cheatsheet: vision LLMs, image inputs, audio, video.

Cheatsheet: Vector column, cosine_distance / l2 / inner_product, HNSW index, hybrid filter + ANN.

Cheatsheet: prompt injection, defenses, PII, jailbreaks.

Cheatsheet: vector DBs, HNSW, hybrid search, sharding.

Cheatsheet: logging, traces, metrics, evals in prod.

Cheatsheet: classification, extraction, summarization, routing, decomposition.

Cheatsheet: streaming Claude / GPT / vLLM tokens via SSE, tool-call loops, cancellation, prompt caching.