LLM patterns cheatsheet.

Classification

class Label(BaseModel):
    category: Literal["bug", "feature", "question", "other"]
    confidence: float

prompt = f"""
Classify the ticket:
{ticket}

Categories: bug, feature, question, other.
"""

Use a small model with structured output so the reply parses straight into Label.

Extraction

class Person(BaseModel):
    name: str
    # Default to None so fields missing from the text do not fail validation.
    age: int | None = None
    email: str | None = None

prompt = f"Extract person info from: {text}"

Validate with Pydantic. Re-prompt on failure.
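
A minimal sketch of the validate-then-re-prompt loop. It assumes `llm` is any callable that takes a prompt string and returns raw text; the prompt wording, the `max_attempts` parameter, and the `extract_person` name are illustrative, not taken from a specific library.

```python
import json
from typing import Callable, Optional

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: Optional[int] = None
    email: Optional[str] = None


def extract_person(text: str, llm: Callable[[str], str], max_attempts: int = 3) -> Person:
    """Ask for JSON, validate it, and feed validation errors back on failure."""
    base_prompt = (
        "Extract person info as a JSON object with keys name, age, email "
        f"from: {text}"
    )
    prompt = base_prompt
    for _ in range(max_attempts):
        raw = llm(prompt)
        try:
            # json.loads rejects malformed JSON; Person(...) rejects wrong fields/types.
            return Person(**json.loads(raw))
        except (json.JSONDecodeError, TypeError, ValidationError) as err:
            # Re-prompt with the bad reply and the error so the model can correct it.
            prompt = (
                f"{base_prompt}\n\nYour previous reply was invalid:\n{raw}\n"
                f"Error: {err}\nReply with corrected JSON only."
            )
    raise ValueError(f"No valid Person after {max_attempts} attempts")
```

Capping the attempts keeps a stubborn input from looping forever; what to do after the final failure (raise, log, fall back to a bigger model) depends on the application.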

Summarization

prompt = f"""
Summarize in 3 bullets:
{article}

Each bullet: one sentence, under 20 words.
"""

For long docs: map-reduce.

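# Map: summarize each chunk on its own.
chunks = chunk(article, size=2000)
summaries = [llm(f"Summarize: {c}") for c in chunks]
# Reduce: join the partial summaries as text instead of interpolating a Python list.
final = llm("Combine these partial summaries into one:\n" + "\n".join(summaries))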
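
The `chunk` helper is not defined in this cheatsheet. One possible sketch, assuming plain character-based splitting; the `overlap` parameter is an addition for illustration and is not part of the call shown above.

```python
from typing import List


def chunk(text: str, size: int = 2000, overlap: int = 0) -> List[str]:
    """Split text into pieces of at most `size` characters.

    Consecutive pieces share `overlap` characters so a sentence cut at a
    boundary still appears whole in one of the two neighbours.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the last piece is not just a repeat of the overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]
```

Splitting on characters is the simplest option; splitting on paragraphs or tokens usually gives cleaner chunks but needs a tokenizer or some structure in the input.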

Translation

prompt = f"Translate from {src_lang} to {tgt_lang}:\n{text}"

For better quality, include cultural context and style notes in the prompt.

Q&A over docs

See RAG cheatsheet.

Routing

class Route(BaseModel):
    intent: Literal["billing", "tech", "sales", "other"]

route = llm(f"Route this: {user_msg}", schema=Route)

if route.intent == "billing":
    answer = billing_agent(...)
elif route.intent == "tech":
    answer = tech_agent(...)

Smaller model for routing; specialist for response.
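
The if/elif chain above leaves "sales" and "other" unhandled. One way to cover every intent is a dispatch table with a fallback. This is a sketch with stub handlers: every handler body here is a placeholder, and `sales_agent`, `fallback_agent`, and `dispatch` are hypothetical names.

```python
from typing import Callable, Dict


# Stub specialists; in practice each would call its own model or tool chain.
def billing_agent(msg: str) -> str:
    return f"[billing] {msg}"


def tech_agent(msg: str) -> str:
    return f"[tech] {msg}"


def sales_agent(msg: str) -> str:
    return f"[sales] {msg}"


def fallback_agent(msg: str) -> str:
    return f"[fallback] {msg}"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "tech": tech_agent,
    "sales": sales_agent,
}


def dispatch(intent: str, user_msg: str) -> str:
    """Send the message to the specialist for this intent, or to the fallback."""
    handler = HANDLERS.get(intent, fallback_agent)
    return handler(user_msg)
```

Adding a new intent then means one new entry in `HANDLERS` and one new value in the `Literal` on `Route`.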

Decomposition

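Break a complex task into steps:

import json

# Parse the reply; iterating over the raw string would loop over characters.
plan = json.loads(llm(f"Plan steps to: {task}\nOutput a JSON list of strings."))
results = []
for step in plan:
    result = llm(f"Do step: {step}\nPrev results: {results}")
    results.append(result)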

Chain-of-models

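draft = cheap_llm("Generate draft")
critique = mid_llm(f"Critique: {draft}")
# Pass the draft along with the critique; the critique alone has nothing to revise.
final = premium_llm(f"Rewrite the draft using the critique.\nDraft: {draft}\nCritique: {critique}")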

Self-consistency

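from collections import Counter

answers = [llm(prompt, temp=0.7) for _ in range(5)]
# Majority vote: take the most common answer (normalize answers first if formats vary).
final = Counter(answers).most_common(1)[0][0]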

Better accuracy on hard tasks, at roughly 5x the cost (one call per sample).

Voting / ensemble

results = [llm(prompt, model=m) for m in [gpt5, claude, gemini]]
final = judge(results)

Critic / fix loop

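output = llm(prompt)
for _ in range(3):
    # Tell the critic which phrase signals approval, since the loop checks for it.
    critique = llm(f"Critique this output. Reply 'looks good' if nothing needs fixing.\n{output}")
    if "looks good" in critique.lower():
        break
    output = llm(f"Improve based on: {critique}\nOriginal: {output}")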

Reflexion

Try → fail → reflect → try again. Like the critic / fix loop, but failure is detected explicitly (e.g. by a failing test or validator) instead of being left to the critic's opinion.
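
A minimal sketch of that loop, assuming `llm` is any prompt-to-text callable and `check` is an external validator that returns None on success or an error message on failure. The function name, prompt wording, and `max_tries` are illustrative.

```python
from typing import Callable, List, Optional


def reflexion(
    task: str,
    llm: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    max_tries: int = 3,
) -> Optional[str]:
    """Attempt the task, and on each detected failure store a short lesson."""
    reflections: List[str] = []
    for _ in range(max_tries):
        notes = "\n".join(f"- {r}" for r in reflections) or "- none yet"
        attempt = llm(f"Task: {task}\nLessons from earlier failures:\n{notes}")
        error = check(attempt)  # explicit failure detection, not model opinion
        if error is None:
            return attempt
        # Ask the model to reflect on the concrete error before retrying.
        reflections.append(
            llm(
                f"The attempt failed with: {error}\nAttempt: {attempt}\n"
                "In one sentence, what should change next time?"
            )
        )
    return None  # every attempt failed the check
```

Typical choices for `check` are running unit tests on generated code or validating output against a schema.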

Few-shot bank

Maintain library of (input, expected_output) examples. Retrieve top-k similar via embeddings; include in prompt dynamically.
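
A small in-memory sketch of such a bank. It assumes `embed` is any callable that maps text to a vector of floats (a real embedding model in practice); the `FewShotBank` class and its method names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FewShotBank:
    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self.embed = embed
        self.examples: List[Tuple[Vector, str, str]] = []

    def add(self, inp: str, expected_output: str) -> None:
        # Embed once at insert time so lookups only embed the query.
        self.examples.append((self.embed(inp), inp, expected_output))

    def top_k(self, query: str, k: int = 3) -> List[Tuple[str, str]]:
        q = self.embed(query)
        ranked = sorted(self.examples, key=lambda ex: cosine(q, ex[0]), reverse=True)
        return [(inp, out) for _, inp, out in ranked[:k]]

    def build_prompt(self, query: str, k: int = 3) -> str:
        """Put the k most similar examples in front of the new input."""
        shots = "\n\n".join(f"Input: {i}\nOutput: {o}" for i, o in self.top_k(query, k))
        return f"{shots}\n\nInput: {query}\nOutput:"
```

A linear scan is fine for a few thousand examples; beyond that a vector index is the usual replacement for `top_k`.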

Skeleton-of-thought

Generate an outline first, then expand each section. Generation is faster because the sections can be expanded in parallel.
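
A sketch of the outline-then-expand flow using a thread pool, assuming `llm` is a thread-safe prompt-to-text callable and that the outline comes back as one section title per line. The function name and prompt wording are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def skeleton_of_thought(topic: str, llm: Callable[[str], str], max_workers: int = 4) -> str:
    """Get an outline in one call, then expand every section concurrently."""
    outline = llm(f"Write a short outline for: {topic}\nOne section title per line.")
    # Drop bullet characters and blank lines from the outline.
    cleaned = (line.strip("-• \t") for line in outline.splitlines())
    titles = [t for t in cleaned if t]

    def expand(title: str) -> str:
        body = llm(f"Write the section '{title}' for a piece about: {topic}")
        return f"{title}\n{body}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so sections stay in outline order.
        sections = list(pool.map(expand, titles))
    return "\n\n".join(sections)
```

Because each section is written without seeing the others, the result can repeat itself or lack transitions; a final smoothing pass is a common follow-up.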

Caching at semantic level

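def cached_llm(query):
    query_vec = embed(query)  # embed once, reuse for lookup and insert
    similar = vector_db.search(query_vec, threshold=0.95)
    if similar:
        return similar[0].answer
    answer = llm(query)
    # Store in the same index that search() reads, or the cache never hits.
    vector_db.add(query_vec, answer)
    return answer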

Saves money. Use for FAQs / common queries.

Constrained generation

Force valid JSON/SQL:

  • OpenAI strict mode / Anthropic tools.
  • guidance, lmql, outlines libraries.
  • Logit bias for token-level control.

Long context strategies

For 1M+ token contexts (Claude Opus 4.7):

  • Place key info at top + bottom.
  • Use clear section headers.
  • Test middle-of-context recall.
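
Middle-of-context recall can be checked with a simple "needle in a haystack" probe: plant one fact at different depths of a long filler context and see whether the model can still answer a question about it. The sketch below assumes `llm` is a prompt-to-text callable; the function name, parameters, and substring-based scoring are illustrative.

```python
from typing import Callable, Dict, List, Sequence


def recall_by_position(
    llm: Callable[[str], str],
    filler: List[str],
    needle: str,
    question: str,
    expected: str,
    positions: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> Dict[float, bool]:
    """Insert `needle` at each relative depth and record whether recall succeeds."""
    results: Dict[float, bool] = {}
    for pos in positions:
        cut = int(len(filler) * pos)  # 0.0 = very top, 1.0 = very bottom
        context = "\n".join(filler[:cut] + [needle] + filler[cut:])
        answer = llm(f"{context}\n\nQuestion: {question}")
        # Crude scoring: the expected string must appear in the answer.
        results[pos] = expected.lower() in answer.lower()
    return results
```

Running this with filler close to the real context size, and with several different needles, gives a rough map of where recall drops off.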

For documents larger than context:

  • RAG.
  • Map-reduce.
  • Iterative refinement.

Mixture of agents

Multiple LLMs draft answers; meta-LLM combines.

drafts = [llm_a(p), llm_b(p), llm_c(p)]
final = synthesizer_llm(f"Combine best parts of: {drafts}")

Streaming UX

Stream tokens as they come. Show “thinking” indicator for reasoning models.

Confidence calibration

LLMs don’t naturally know what they don’t know. Strategies:

  • Ask: “Rate confidence 1-10.”
  • Cross-check with retrieval.
  • Use logprobs (OpenAI).
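
For the logprobs route, a sketch of turning per-token log-probabilities into rough confidence scores. It assumes the log-probabilities have already been pulled out of the API response into a plain list of floats; the function name and the three summary statistics are illustrative choices, not a standard.

```python
import math
from typing import Dict, Sequence


def token_confidence(logprobs: Sequence[float]) -> Dict[str, float]:
    """Summarize token log-probabilities as probabilities in [0, 1]."""
    if not logprobs:
        raise ValueError("need at least one token logprob")
    probs = [math.exp(lp) for lp in logprobs]
    return {
        # The weakest token often marks the spot where the model was guessing.
        "min_prob": min(probs),
        "mean_prob": sum(probs) / len(probs),
        # Probability of the whole sequence; shrinks quickly with length.
        "sequence_prob": math.exp(sum(logprobs)),
    }
```

These numbers reflect how likely the tokens were, not whether the answer is true, so they work best on short, constrained outputs such as a single label.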

Common patterns to avoid

  • A single huge prompt doing 5 things.
  • “Be helpful” with no further guidance.
  • Trusting LLM math without a tool.
  • LLM-only for high-stakes decisions.
  • Pulling in LangChain for “Hello world” usage.

Read this next

If you want my pattern recipes, they’re at rajpoot.dev.

