LLM agents cheatsheet.

What is an agent

An LLM in a loop with tools. It plans, acts, observes, and repeats until done.

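# Pseudocode: llm, run_tool, and is_tool_call are placeholders.
while True:
    decision = llm(prompt, tools)
    if decision.is_tool_call:
        result = run_tool(decision)
        prompt += result  # feed the observation back to the model
    else:
        return decision  # no tool requested: this is the final answer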

Minimal loop

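# llm() is a thin wrapper around your provider's chat API (Anthropic-style response shape).
def run_agent(user_input, tools, max_iter=10):
    messages = [{"role": "user", "content": user_input}]

    for _ in range(max_iter):
        response = llm(messages, tools=tools)
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            # Final answer: join the text blocks of the response.
            return "".join(b.text for b in response.content if b.type == "text")

        # Run every requested tool and send all results back in one user message.
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = run_tool(block.name, block.input)
                tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": result})

        messages.append({"role": "user", "content": tool_results})

    raise RuntimeError("Max iterations reached")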

Tools as plain functions

def search_web(query: str) -> str:
    """Search the web for information."""
    ...

def read_file(path: str) -> str:
    """Read file contents."""
    ...

tools = [
    {"name": "search_web", "description": search_web.__doc__, "input_schema": {...}},
    {"name": "read_file", "description": read_file.__doc__, "input_schema": {...}},
]
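
The loop calls run_tool(block.name, block.input) but never defines it. A minimal sketch of one way to write it, assuming a registry dict that maps tool names to functions. TOOL_REGISTRY is a hypothetical name, and the search_web body here is a stub, not a real search:

```python
from typing import Any, Callable, Dict

def search_web(query: str) -> str:
    """Search the web for information."""
    return f"(stub) results for {query!r}"  # placeholder body, not a real search

def read_file(path: str) -> str:
    """Read file contents."""
    with open(path, encoding="utf-8") as f:
        return f.read()

# Hypothetical registry: tool name -> Python callable.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {
    "search_web": search_web,
    "read_file": read_file,
}

def run_tool(name: str, tool_input: Dict[str, Any]) -> str:
    """Dispatch a tool call; return errors as text so the model can react."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return f"Error: unknown tool {name!r}"
    try:
        return str(func(**tool_input))
    except Exception as exc:  # report the failure to the model instead of crashing the loop
        return f"Error: {type(exc).__name__}: {exc}"
```

Returning errors as strings is a design choice: the model sees the failure as an observation and can retry with different arguments.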

ReAct (thought-action-observation)

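Older pattern: the model writes its reasoning and actions as plain text, and your code parses each action and appends the observation. Models with native tool use largely make it unnecessary. A transcript looks like: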

Thought: I need to search for X.
Action: search_web("X")
Observation: result is Y.
Thought: Now I know Y.
Action: ...

With a tool-use API, the model emits structured tool calls and the loop feeds results back, so this cycle is handled implicitly.
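
For contrast, here is roughly what the text-parsing side of ReAct involves. This is a sketch that assumes actions always look like Action: tool_name("argument") on their own line. The regex and the parse_action name are illustrative, not taken from any library:

```python
import re

# Matches lines such as: Action: search_web("X")
ACTION_RE = re.compile(r'^Action:\s*(\w+)\("(.*)"\)\s*$', re.MULTILINE)

def parse_action(text):
    """Return (tool_name, argument) for the first Action line, or None."""
    match = ACTION_RE.search(text)
    return (match.group(1), match.group(2)) if match else None
```

Any deviation from the expected format breaks the parse, which is one reason structured tool calls are easier to work with.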

Multi-step planning

system = """
You are an agent that completes tasks step by step.
Plan first, then execute.

Output:
1. Brief plan
2. First step (tool call)
"""

For complex tasks, have the agent output a plan first, then execute the steps one at a time.

Memory

Short-term (context window)

Keep the conversation in messages. Truncate or summarize older turns when the context gets long.
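
A minimal truncation sketch, assuming the first message carries the task and should always survive. truncate_history and keep_last are hypothetical names:

```python
from typing import Any, Dict, List

Message = Dict[str, Any]

def truncate_history(messages: List[Message], keep_last: int = 6) -> List[Message]:
    """Keep the first message (the task) plus the most recent `keep_last` messages."""
    if len(messages) <= keep_last + 1:
        return list(messages)
    return [messages[0]] + messages[len(messages) - keep_last:]
```

This cuts purely by count. With tool-use APIs, a real version should cut at a boundary that keeps each tool result together with its tool call, since some APIs reject a result whose call was dropped.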

Long-term

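Store memories in a vector DB and retrieve them by similarity:

import time

# vector_db and embed are placeholders for your vector store client and embedding model.
def remember(text):
    vector_db.insert(embedding=embed(text), payload={"text": text, "ts": time.time()})

def recall(query, k=5):
    return vector_db.search(embed(query), k=k)

Expose remember and recall as tools the LLM can call.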
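
The remember and recall functions rely on a vector_db and an embed function that are not defined here. To try the idea without any infrastructure, here is a toy in-memory stand-in. It assumes word counts as the "embedding" and cosine similarity for ranking; a real setup would use an embedding model and a vector store:

```python
import math
import time
from collections import Counter
from typing import Dict, List, Tuple

def embed(text: str) -> Counter:
    """Toy embedding: word counts. Stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

_store: List[Tuple[Counter, Dict]] = []  # (embedding, payload) pairs, in memory only

def remember(text: str) -> None:
    _store.append((embed(text), {"text": text, "ts": time.time()}))

def recall(query: str, k: int = 5) -> List[str]:
    """Return the k stored texts most similar to the query."""
    q = embed(query)
    scored = sorted(_store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [payload["text"] for _, payload in scored[:k]]
```

Word counts only match exact words, so this shows the interface rather than real semantic recall.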

Working memory (scratchpad)

A file the agent reads/writes:

tools = [
    {"name": "read_scratchpad", ...},
    {"name": "write_scratchpad", ...},
]
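
The functions behind those two tools can be very small. A sketch, assuming a single notes file in the system temp directory; the SCRATCHPAD path and the append flag are illustrative choices:

```python
import tempfile
from pathlib import Path

# Hypothetical location; a real agent would likely use a per-run directory.
SCRATCHPAD = Path(tempfile.gettempdir()) / "agent_scratchpad.md"

def read_scratchpad() -> str:
    """Return the current notes, or an empty string if none exist yet."""
    return SCRATCHPAD.read_text(encoding="utf-8") if SCRATCHPAD.exists() else ""

def write_scratchpad(content: str, append: bool = True) -> str:
    """Append to (or overwrite) the notes and report the new size."""
    mode = "a" if append else "w"
    with SCRATCHPAD.open(mode, encoding="utf-8") as f:
        f.write(content + "\n")
    return f"Scratchpad now has {SCRATCHPAD.stat().st_size} bytes"
```

Returning a short confirmation instead of the full file keeps the tool result cheap in tokens.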

Reflection

After the agent finishes, ask the LLM to critique its own work, then revise:

critique = llm(f"Original task: {task}\nResponse: {response}\n\nCritique the response and suggest improvements.")
revised = llm(f"Original: {response}\nCritique: {critique}\n\nRevised response:")

Sub-agents (delegation)

The main agent has a delegate_task(description, agent_type) tool. Sub-agents have a narrower scope and a separate context.

Good for long-horizon tasks where the main agent's context would otherwise fill up.
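
A sketch of what delegate_task might look like, assuming you already have a run_agent(system, task, tool_names) function that runs a fresh loop and returns its final text. AGENT_TYPES, its entries, and make_delegate_tool are hypothetical:

```python
from typing import Callable, Dict, List

# Hypothetical per-type configuration: system prompt and allowed tool names.
AGENT_TYPES: Dict[str, Dict] = {
    "researcher": {"system": "You research and report findings.", "tools": ["search_web"]},
    "coder": {"system": "You write and explain code.", "tools": ["read_file"]},
}

def make_delegate_tool(run_agent: Callable[[str, str, List[str]], str]) -> Callable[[str, str], str]:
    """Build delegate_task around a run_agent(system, task, tool_names) -> str function."""
    def delegate_task(description: str, agent_type: str) -> str:
        config = AGENT_TYPES.get(agent_type)
        if config is None:
            return f"Error: unknown agent_type {agent_type!r}"
        # The sub-agent starts with a fresh context; only its final answer comes back.
        return run_agent(config["system"], description, config["tools"])
    return delegate_task
```

Because only the final answer returns, the sub-agent's intermediate tool calls never enter the main agent's context.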

Multi-agent (CrewAI / AutoGen / LangGraph)

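# CrewAI example
from crewai import Agent, Task, Crew

# search is a tool object defined elsewhere.
researcher = Agent(role="Researcher", goal="Find info", backstory="...", tools=[search])
writer = Agent(role="Writer", goal="Write article", backstory="...", tools=[])

# Recent CrewAI versions expect an expected_output on each Task; the strings here are examples.
t1 = Task(description="Research X", expected_output="Key findings about X", agent=researcher)
t2 = Task(description="Write article from research", expected_output="A finished article", agent=writer, context=[t1])

crew = Crew(agents=[researcher, writer], tasks=[t1, t2])
result = crew.kickoff()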

Often more complexity than needed. Start with one agent + tools.

LangGraph (stateful workflows)

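from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):  # example fields; define whatever your workflow shares
    task: str
    result: str

# plan_node, execute_node, review_node: functions that take State and return a partial update.
graph = StateGraph(State)
graph.add_node("plan", plan_node)
graph.add_node("execute", execute_node)
graph.add_node("review", review_node)

graph.add_edge("plan", "execute")
# router_fn(state) returns the name of the next node, e.g. "review" or "plan".
graph.add_conditional_edges("execute", router_fn)
graph.add_edge("review", END)
graph.set_entry_point("plan")

app = graph.compile()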

For complex workflows with explicit state.

Cost control

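total_tokens = 0
max_tokens_per_run = 100_000

for i in range(max_iter):
    response = llm(...)
    # Anthropic-style usage fields; other SDKs name these differently.
    total_tokens += response.usage.input_tokens + response.usage.output_tokens
    if total_tokens > max_tokens_per_run:
        raise RuntimeError("Budget exceeded")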

Time limits

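import time

# 300-second budget; the monotonic clock is unaffected by system clock changes.
deadline = time.monotonic() + 300
while time.monotonic() < deadline:
    ...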

Determinism / testing

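Agents are hard to test because outputs vary between runs. Strategies:

  • Mock LLM responses.
  • Run with temperature=0 and a fixed seed where the API supports one (best-effort; outputs can still vary).
  • Snapshot tests on tool call sequences.
  • Eval suite: input → expected outcome.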
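
Mocking and snapshotting combine naturally: replay canned LLM responses, then assert on the exact sequence of tool calls. A sketch, assuming the loop takes llm and run_tool as parameters so they can be swapped out. scripted_llm is a hypothetical helper, and SimpleNamespace objects stand in for SDK response types:

```python
from types import SimpleNamespace as NS

def scripted_llm(responses):
    """Return a fake llm() that replays canned responses in order."""
    it = iter(responses)
    def llm(messages, tools=None):
        return next(it)
    return llm

def run_agent(user_input, llm, run_tool, max_iter=10):
    """Same shape as the minimal loop, with llm and run_tool injected; also returns the tool trace."""
    messages = [{"role": "user", "content": user_input}]
    trace = []
    for _ in range(max_iter):
        response = llm(messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            text = "".join(b.text for b in response.content if b.type == "text")
            return text, trace
        results = []
        for block in response.content:
            if block.type == "tool_use":
                trace.append((block.name, block.input))
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": run_tool(block.name, block.input)})
        messages.append({"role": "user", "content": results})
    raise RuntimeError("Max iterations")

def test_agent_searches_then_answers():
    llm = scripted_llm([
        NS(stop_reason="tool_use",
           content=[NS(type="tool_use", id="t1", name="search_web", input={"query": "X"})]),
        NS(stop_reason="end_turn", content=[NS(type="text", text="X is Y.")]),
    ])
    answer, trace = run_agent("What is X?", llm, run_tool=lambda name, args: "Y")
    assert answer == "X is Y."
    assert trace == [("search_web", {"query": "X"})]  # snapshot of the tool-call sequence
```

This tests the loop and tool wiring deterministically. It says nothing about model quality, which is what the eval suite is for.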

Safety

Agents that run code, browse the web, or send messages need:

  • Sandboxing (Docker, gVisor).
  • Permission checks per tool.
  • Human-in-the-loop for risky actions.
  • Audit log of all actions.
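
Three of these (permission checks, human approval, audit log) can live in one wrapper around tool execution. A sketch, assuming hypothetical tool names and an approve callback that asks a human. Sandboxing is a separate, infrastructure-level concern and is not shown:

```python
import json
import time
from typing import Any, Callable, Dict, List

RISKY_TOOLS = {"send_email", "delete_file"}  # hypothetical tool names
ALLOWED_TOOLS = {"search_web", "read_file", "send_email"}

audit_log: List[str] = []  # in practice, an append-only file or log service

def guarded_run_tool(name: str, tool_input: Dict[str, Any],
                     run_tool: Callable[[str, Dict[str, Any]], str],
                     approve: Callable[[str, Dict[str, Any]], bool]) -> str:
    """Check permissions, ask a human for risky actions, and log every decision."""
    if name not in ALLOWED_TOOLS:
        outcome, result = "denied", f"Error: tool {name!r} is not permitted"
    elif name in RISKY_TOOLS and not approve(name, tool_input):
        outcome, result = "rejected", "Error: a human reviewer rejected this action"
    else:
        outcome, result = "executed", run_tool(name, tool_input)
    audit_log.append(json.dumps({"ts": time.time(), "tool": name,
                                 "input": tool_input, "outcome": outcome}))
    return result
```

Denied and rejected calls are logged too, so the audit trail shows what the agent attempted, not only what ran.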

Frameworks (use sparingly)

  • LangChain / LangGraph: most popular, but large.
  • LlamaIndex: RAG-focused.
  • CrewAI: multi-agent.
  • AutoGen: multi-agent.
  • Pydantic AI: type-safe.
  • Anthropic SDK + custom loop: simplest, most control.

For most use cases, build your own thin loop.

Streaming

Stream the agent’s tool calls and final answer to the UI:

# Tool call announcement: "I'm searching for X..."
# Tool result: streamed back
# Final answer: streamed text

Common mistakes

  • No iteration cap → infinite loop on bad task.
  • No cost cap → runaway bill.
  • Mixing chat history with tool messages incorrectly (e.g. a tool result that does not directly follow its tool call).
  • Tools that block for a long time → request timeout.
  • No human approval for destructive actions.
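
For the blocking-tool problem, one option is to run each tool in a worker thread and stop waiting after a deadline. A sketch using the standard library; run_with_timeout is a hypothetical helper:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)

def run_with_timeout(func, timeout_s, *args, **kwargs):
    """Run func in a worker thread; give up waiting after timeout_s seconds."""
    future = _executor.submit(func, *args, **kwargs)
    try:
        return str(future.result(timeout=timeout_s))
    except FutureTimeout:
        # The worker thread keeps running; Python cannot kill it. We only stop waiting.
        return f"Error: tool timed out after {timeout_s}s"
```

Because the thread is not cancelled, tools with side effects may still complete after the timeout is reported. For hard cancellation, a subprocess is the safer route.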

Read this next

If you want my agent template (loop + tools + memory), it’s at rajpoot.dev.

