LLM patterns cheatsheet.
Classification
class Label(BaseModel):
    category: Literal["bug", "feature", "question", "other"]
    confidence: float
prompt = f"""
Classify the ticket:
{ticket}
Categories: bug, feature, question, other.
"""
Use small model + structured output.
Extraction
class Person(BaseModel):
    name: str
    age: int | None
    email: str | None
prompt = f"Extract person info from: {text}"
Validate with Pydantic. Re-prompt on failure.
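A minimal sketch of the validate-then-re-prompt loop. To stay runnable with the standard library alone, it swaps Pydantic for a hand-written validator (`parse_person`); the helper names, the retry count, and the wording of the retry prompt are all assumptions for illustration, and `llm` is any callable that takes a prompt and returns a string.

```python
import json
from typing import Callable


def parse_person(raw: str) -> dict:
    """Stand-in validator (hypothetical): raise ValueError if the reply does not fit Person."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("name"), str):
        raise ValueError("'name' must be a string")
    age = data.get("age")
    if age is not None and (not isinstance(age, int) or isinstance(age, bool)):
        raise ValueError("'age' must be an integer or null")
    email = data.get("email")
    if email is not None and not isinstance(email, str):
        raise ValueError("'email' must be a string or null")
    return {"name": data["name"], "age": age, "email": email}


def extract_with_retry(llm: Callable[[str], str], text: str, max_attempts: int = 3) -> dict:
    base = f"Extract person info as JSON (name, age, email) from: {text}"
    prompt = base
    for _ in range(max_attempts):
        raw = llm(prompt)
        try:
            return parse_person(raw)
        except ValueError as err:
            # Feed the validation error back so the model can correct itself.
            prompt = (
                f"{base}\n"
                f"Your previous reply was invalid: {err}\n"
                f"Previous reply: {raw}\n"
                "Return only valid JSON."
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts")
```

With Pydantic v2 installed, `Person.model_validate_json(raw)` could take the place of `parse_person`, catching `pydantic.ValidationError` instead of `ValueError`.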
Summarization
prompt = f"""
Summarize in 3 bullets:
{article}
Each bullet: one sentence, under 20 words.
"""
For long docs: map-reduce.
chunks = chunk(article, size=2000)
summaries = [llm(f"Summarize: {c}") for c in chunks]
final = llm("Combine these summaries:\n" + "\n".join(summaries))  # join the text; don't interpolate the list repr
Translation
prompt = f"Translate from {src_lang} to {tgt_lang}:\n{text}"
For quality: include cultural context, style notes.
Q&A over docs
See RAG cheatsheet.
Routing
class Route(BaseModel):
    intent: Literal["billing", "tech", "sales", "other"]
route = llm(f"Route this: {user_msg}", schema=Route)
if route.intent == "billing":
    answer = billing_agent(...)
elif route.intent == "tech":
    answer = tech_agent(...)
Smaller model for routing; specialist for response.
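One way to keep the routing step from growing into a long if/elif chain is a dispatch table with a fallback. This is only a sketch: `dispatch`, `classify`, `agents`, and `fallback` are hypothetical names, and in practice `classify` would wrap the small-model call that returns the intent.

```python
from typing import Callable, Dict


def dispatch(
    user_msg: str,
    classify: Callable[[str], str],
    agents: Dict[str, Callable[[str], str]],
    fallback: Callable[[str], str],
) -> str:
    intent = classify(user_msg)  # cheap model call in practice
    # Intents without a dedicated agent (e.g. "other") go to the fallback.
    handler = agents.get(intent, fallback)
    return handler(user_msg)
```

The fallback matters: the if/elif version silently leaves `answer` undefined for "sales" and "other".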
Decomposition
Break complex task into steps:
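import json
# Parse the JSON list; looping over the raw string would iterate characters.
plan = json.loads(llm(f"Plan steps to: {task}\nOutput JSON list."))
results = []
for step in plan:
    result = llm(f"Do step: {step}\nPrev results: {results}")
    results.append(result)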
Chain-of-models
draft = cheap_llm("Generate draft")
critique = mid_llm(f"Critique: {draft}")
# The final model needs the draft as well as the critique.
final = premium_llm(f"Revise the draft using the critique.\nDraft: {draft}\nCritique: {critique}")
Self-consistency
answers = [llm(prompt, temp=0.7) for _ in range(5)]
# Majority vote: keep the most common answer
Better accuracy on hard tasks; 5x cost.
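The vote itself can be sketched with `collections.Counter`. Here `sample` is a hypothetical zero-argument callable that returns one sampled answer per call, and normalizing answers with `strip().lower()` is an assumption that only suits short, exact-match answers.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistent_answer(sample: Callable[[], str], n: int = 5) -> Tuple[str, float]:
    # Normalize so trivial formatting differences don't split the vote.
    answers = [sample().strip().lower() for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n  # the vote share doubles as a rough agreement score
```

A low vote share is a useful signal on its own: it suggests the question is hard or ambiguous for the model.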
Voting / ensemble
results = [llm(prompt, model=m) for m in [gpt5, claude, gemini]]
final = judge(results)
Critic / fix loop
output = llm(prompt)
for i in range(3):
    critique = llm(f"Critique: {output}")
    if "looks good" in critique.lower():
        break
    output = llm(f"Improve based on: {critique}\nOriginal: {output}")
Reflexion
Try → fail → reflect → try again. Like fix-loop but with explicit failure detection.
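A rough sketch of that loop, assuming the failure check is something external to the model (tests, a schema, a compiler). All four callables are hypothetical: `attempt` produces an output given the task and the lessons so far, `check` returns `(ok, feedback)`, and `reflect` turns a failure into a short lesson.

```python
def reflexion(task, attempt, check, reflect, max_tries=3):
    lessons = []  # reflections carried into every later attempt
    for _ in range(max_tries):
        output = attempt(task, lessons)
        ok, feedback = check(output)  # explicit failure detection
        if ok:
            return output
        lessons.append(reflect(task, output, feedback))
    raise RuntimeError(f"all {max_tries} attempts failed")
```

The difference from the critic loop is that the stop condition comes from `check`, not from the model saying its own output looks good.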
Few-shot bank
Maintain library of (input, expected_output) examples. Retrieve top-k similar via embeddings; include in prompt dynamically.
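A small sketch of the retrieval step, using plain cosine similarity over an in-memory list. The bank layout `(embedding, input, expected_output)`, the function names, and the prompt format are assumptions; a real setup would get the vectors from an embedding model and likely use a vector store.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_examples(query_vec, bank, k=2):
    """bank: list of (embedding, input, expected_output) tuples."""
    ranked = sorted(bank, key=lambda ex: cosine(query_vec, ex[0]), reverse=True)
    return [(inp, out) for _, inp, out in ranked[:k]]


def build_prompt(query, examples):
    # Retrieved examples come first, then the new input with an open output slot.
    shots = "\n\n".join(f"Input: {i}\nOutput: {o}" for i, o in examples)
    return f"{shots}\n\nInput: {query}\nOutput:"
```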
Skeleton-of-thought
Generate outline first, then expand each section. Faster generation (parallelize sections).
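The parallel expansion could look roughly like this, using a thread pool since LLM calls are I/O-bound. `skeleton_of_thought`, the prompt wording, and the one-title-per-line outline format are assumptions; `llm` is any thread-safe callable from prompt to text.

```python
from concurrent.futures import ThreadPoolExecutor


def skeleton_of_thought(topic, llm, max_workers=4):
    outline = llm(f"List section titles for: {topic}. One per line.")
    sections = [line.strip() for line in outline.splitlines() if line.strip()]

    def expand(title):
        return llm(f"Write the section '{title}' of a piece on: {topic}")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so sections stay aligned.
        bodies = list(pool.map(expand, sections))
    return "\n\n".join(f"{title}\n{body}" for title, body in zip(sections, bodies))
```

Because sections are written independently, they may repeat or contradict each other; a final smoothing pass is a common addition.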
Caching at semantic level
def cached_llm(query):
    query_vec = embed(query)
    similar = vector_db.search(query_vec, threshold=0.95)
    if similar:
        return similar[0].answer
    answer = llm(query)
    vector_db.add(query_vec, answer)  # write to the same index that search() reads
    return answer
Saves money. Use for FAQs / common queries.
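For a self-contained picture of the threshold logic, here is an in-memory sketch. `SemanticCache` is a hypothetical class, not a library API; `embed` is any callable from text to a vector, and the 0.95 cosine threshold is carried over from the snippet above and should be tuned per use case.

```python
import math


class SemanticCache:
    def __init__(self, embed, threshold=0.95):
        self.embed = embed
        self.threshold = threshold
        self.entries = []  # list of (vector, answer)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def get(self, query):
        vec = self.embed(query)
        best = max(self.entries, key=lambda e: self._cosine(vec, e[0]), default=None)
        if best is not None and self._cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None  # miss: nothing similar enough

    def put(self, query, answer):
        self.entries.append((self.embed(query), answer))


def cached_llm(query, llm, cache):
    hit = cache.get(query)
    if hit is not None:
        return hit
    answer = llm(query)
    cache.put(query, answer)
    return answer
```

A threshold that is too low returns answers to questions the user did not ask, so it is worth spot-checking hits before trusting the cache.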
Constrained generation
Force valid JSON/SQL:
- OpenAI strict mode / Anthropic tools.
- guidance, lmql, outlines libraries.
- Logit bias for token-level control.
Long context strategies
For 1M+ token contexts (Claude Opus 4.7):
- Place key info at top + bottom.
- Use clear section headers.
- Test middle-of-context recall.
For documents larger than context:
- RAG.
- Map-reduce.
- Iterative refinement.
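Iterative refinement can be sketched as a rolling summary that is updated one chunk at a time. The names `chunk_text` and `refine_summary`, the character-based chunking, and the prompt wording are assumptions; token-aware chunking on paragraph boundaries usually works better.

```python
def chunk_text(text, size):
    # Naive fixed-width split by characters.
    return [text[i:i + size] for i in range(0, len(text), size)]


def refine_summary(text, llm, size=2000):
    summary = ""
    for piece in chunk_text(text, size):
        # Each call sees the running summary plus one new chunk.
        summary = llm(
            f"Summary so far:\n{summary}\n\n"
            f"New text:\n{piece}\n\n"
            "Update the summary to include the new text."
        )
    return summary
```

Unlike map-reduce, the calls are sequential, so this trades latency for a summary that keeps earlier context in view.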
Mixture of agents
Multiple LLMs draft answers; meta-LLM combines.
drafts = [llm_a(p), llm_b(p), llm_c(p)]
final = synthesizer_llm(f"Combine best parts of: {drafts}")
Streaming UX
Stream tokens as they come. Show “thinking” indicator for reasoning models.
Confidence calibration
LLMs don’t naturally know what they don’t know. Strategies:
- Ask: “Rate confidence 1-10.”
- Cross-check with retrieval.
- Use logprobs (OpenAI).
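If per-token log probabilities are available, one simple heuristic is the geometric mean of the token probabilities. This is a sketch under that assumption: `answer_confidence` is a hypothetical helper, the input is a plain list of floats however your API exposes them, and the resulting number is a rough signal, not a calibrated probability.

```python
import math


def answer_confidence(token_logprobs):
    """Geometric-mean token probability, in the range (0, 1]; 0.0 for empty input."""
    if not token_logprobs:
        return 0.0
    # The mean of log probabilities, exponentiated, is the geometric mean.
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```

Averaging in log space keeps long answers from being penalized just for having more tokens.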
Common patterns to avoid
- Single huge prompt doing 5 things.
- “Be helpful” with no guidance.
- Trusting LLM math without a tool.
- Relying on the LLM alone for high-stakes decisions.
- Pulling in LangChain for a “Hello world” use case.
Read this next
If you want my pattern recipes, they’re at rajpoot.dev.