LLM agents cheatsheet.
What is an agent
An LLM in a loop with tools. It plans, acts, observes, repeats until done.
while not done:
    decision = llm(prompt, tools)
    if tool_call:
        result = run_tool(decision)
        prompt += result
    else:
        return decision
Minimal loop
def run_agent(user_input, tools, max_iter=10):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_iter):
        response = llm(messages, tools=tools)
        # Echo the full assistant turn (text + tool_use blocks) back into history.
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            # No tool requested: join the text blocks into the final answer.
            return "".join(b.text for b in response.content if b.type == "text")
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = run_tool(block.name, block.input)
                tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": result})
        # All tool results go back in a single user message.
        messages.append({"role": "user", "content": tool_results})
    raise RuntimeError("Max iterations")
Tools as plain functions
def search_web(query: str) -> str:
    """Search the web for information."""
    ...

def read_file(path: str) -> str:
    """Read file contents."""
    ...

tools = [
    {"name": "search_web", "description": search_web.__doc__, "input_schema": {...}},
    {"name": "read_file", "description": read_file.__doc__, "input_schema": {...}},
]
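The minimal loop calls run_tool(block.name, block.input) without defining it, and the input_schema values are left elided. Here is a rough sketch of one way to fill both gaps. It assumes a hypothetical TOOL_FUNCTIONS registry, a hypothetical schema_for helper that derives a simple JSON Schema from type hints, and stub tool bodies, none of which come from the original.

```python
import inspect
from typing import Any, Callable, Dict

def search_web(query: str) -> str:
    """Search the web for information."""
    return f"(stub) results for {query!r}"  # placeholder body, not a real search

def read_file(path: str) -> str:
    """Read file contents."""
    with open(path, encoding="utf-8") as f:
        return f.read()

# Hypothetical registry: tool name -> Python callable.
TOOL_FUNCTIONS: Dict[str, Callable[..., str]] = {
    "search_web": search_web,
    "read_file": read_file,
}

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def schema_for(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a simple JSON Schema from type hints (unknown types fall back to string)."""
    params = inspect.signature(fn).parameters
    return {
        "type": "object",
        "properties": {
            name: {"type": _JSON_TYPES.get(p.annotation, "string")}
            for name, p in params.items()
        },
        "required": [
            name for name, p in params.items()
            if p.default is inspect.Parameter.empty
        ],
    }

def run_tool(name: str, tool_input: Dict[str, Any]) -> str:
    """Dispatch a tool call; return errors as text so the model can recover."""
    fn = TOOL_FUNCTIONS.get(name)
    if fn is None:
        return f"Error: unknown tool {name!r}"
    try:
        return fn(**tool_input)
    except Exception as exc:  # report the failure instead of crashing the loop
        return f"Error: {type(exc).__name__}: {exc}"
```

Returning errors as strings rather than raising is a design choice: the model sees the failure as a tool result and can retry with different arguments.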
ReAct (thought-action-observation)
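Older pattern in which the model emits thoughts and actions as plain text that your code parses. Reasoning models with native tool use largely obviate it. A transcript looks like: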
Thought: I need to search for X.
Action: search_web("X")
Observation: result is Y.
Thought: Now I know Y.
Action: ...
With a tool-use API this is handled implicitly: the model returns structured tool calls instead of text you have to parse.
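For comparison, this is roughly the parsing a text-based ReAct loop has to do itself. It is a minimal sketch that assumes actions always look like name("argument") on their own line; the regex and the parse_action name are illustrative, not from any library.

```python
import re
from typing import Optional, Tuple

# Matches lines such as: Action: search_web("X")
ACTION_RE = re.compile(r'^Action:\s*(\w+)\("(.*)"\)\s*$', re.MULTILINE)

def parse_action(model_text: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, argument) for the last Action line, or None."""
    matches = ACTION_RE.findall(model_text)
    return matches[-1] if matches else None
```

Any deviation in the model's formatting breaks the parse, which is the main reason structured tool calls are easier to work with.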
Multi-step planning
system = """
You are an agent that completes tasks step by step.
Plan first, then execute.
Output:
1. Brief plan
2. First step (tool call)
"""
For complex tasks, have the agent output a plan first, then execute the steps one at a time.
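One possible shape for plan-then-execute, as a sketch only. It assumes the plan comes back as a numbered list, that llm is any callable taking a prompt string, and that execute_step is a hypothetical callback that runs one step (for example, via the agent loop).

```python
import re
from typing import Callable, List, Tuple

# Matches "1. do something" or "2) do something".
STEP_RE = re.compile(r"^\s*\d+[.)]\s*(.+?)\s*$")

def parse_plan(plan_text: str) -> List[str]:
    """Extract numbered steps; lines that are not numbered are ignored."""
    steps = []
    for line in plan_text.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append(match.group(1))
    return steps

def plan_then_execute(
    task: str,
    llm: Callable[[str], str],
    execute_step: Callable[[str, List[str]], str],
) -> Tuple[List[str], List[str]]:
    """Ask for a plan once, then run each step with the earlier results available."""
    plan_text = llm(f"Task: {task}\nWrite a brief numbered plan, one step per line.")
    steps = parse_plan(plan_text)
    results: List[str] = []
    for step in steps:
        results.append(execute_step(step, results))
    return steps, results
```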
Memory
Short-term (context window)
Just include it in messages. Truncate or summarize when the history gets long.
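A minimal truncation sketch, assuming character count is an acceptable stand-in for tokens and that the first message holds the task and must survive. The truncate_history name and its parameters are hypothetical.

```python
from typing import Any, Dict, List

def truncate_history(
    messages: List[Dict[str, Any]],
    max_chars: int = 20_000,
    keep_first: int = 1,
) -> List[Dict[str, Any]]:
    """Keep the first `keep_first` messages plus the newest ones that fit in max_chars."""
    head = messages[:keep_first]
    tail = messages[keep_first:]
    budget = max_chars - sum(len(str(m["content"])) for m in head)
    kept: List[Dict[str, Any]] = []
    for message in reversed(tail):  # walk from newest to oldest
        size = len(str(message["content"]))
        if size > budget:
            break
        kept.append(message)
        budget -= size
    return head + kept[::-1]  # restore chronological order
```

With tool-use APIs, take care not to cut between a tool call and its result; this sketch does not check for that.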
Long-term
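Vector DB:
# vector_db, embed, and now are placeholders for your store client, embedding function, and clock.
def remember(text):
    vector_db.insert(embedding=embed(text), payload={"text": text, "ts": now()})

def recall(query, k=5):
    return vector_db.search(embed(query), k=k)
Expose remember and recall as tools the LLM can call.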
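To try the remember/recall idea without a vector database, here is a toy in-memory version. The bag-of-words embed function and the list-based store are stand-ins I am assuming for illustration; a real setup would use an embedding model and a proper vector store.

```python
import math
import time
from collections import Counter
from typing import Dict, List, Tuple

_store: List[Tuple[Counter, Dict]] = []  # (embedding, payload) pairs

def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; swap in a real embedding model in practice."""
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def remember(text: str) -> None:
    _store.append((embed(text), {"text": text, "ts": time.time()}))

def recall(query: str, k: int = 5) -> List[str]:
    """Return the k stored texts most similar to the query."""
    q = embed(query)
    ranked = sorted(_store, key=lambda item: _cosine(q, item[0]), reverse=True)
    return [payload["text"] for _, payload in ranked[:k]]
```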
Working memory (scratchpad)
A file the agent reads/writes:
tools = [
    {"name": "read_scratchpad", ...},
    {"name": "write_scratchpad", ...},
]
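The two scratchpad tools could be backed by functions like these. This is a sketch assuming a single text file in the system temp directory; the SCRATCHPAD path and the append flag are my own choices, not part of the original.

```python
import tempfile
from pathlib import Path

# Hypothetical location; pick a per-run path in a real agent.
SCRATCHPAD = Path(tempfile.gettempdir()) / "agent_scratchpad.md"

def read_scratchpad() -> str:
    """Return the scratchpad contents, or an empty string if nothing was written yet."""
    return SCRATCHPAD.read_text(encoding="utf-8") if SCRATCHPAD.exists() else ""

def write_scratchpad(content: str, append: bool = True) -> str:
    """Append to (or overwrite) the scratchpad and report its new size."""
    mode = "a" if append else "w"
    with SCRATCHPAD.open(mode, encoding="utf-8") as f:
        f.write(content.rstrip("\n") + "\n")
    return f"Scratchpad now has {len(read_scratchpad())} characters."
```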
Reflection
After the agent finishes, ask the LLM to critique its own work, then revise:
critique = llm(f"Original task: {task}\nResponse: {response}\n\nCritique the response and suggest improvements.")
revised = llm(f"Original: {response}\nCritique: {critique}\n\nRevised response:")
Sub-agents (delegation)
The main agent has a delegate_task(description, agent_type) tool. Sub-agents have a narrower scope and a separate context.
Good for: long-horizon tasks where the main context would fill up.
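A sketch of how delegate_task might be wired up. The AGENT_TYPES table, its prompts and tool lists, and the run_agent(system, task, tool_names) callback signature are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical agent types: each gets its own system prompt and a narrower tool list.
AGENT_TYPES: Dict[str, Dict[str, Any]] = {
    "researcher": {"system": "You research and report findings.", "tools": ["search_web"]},
    "coder": {"system": "You read and explain code.", "tools": ["read_file"]},
}

def make_delegate_tool(
    run_agent: Callable[[str, str, List[str]], str],
) -> Callable[[str, str], str]:
    """Build a delegate_task tool around a function that runs one agent loop."""

    def delegate_task(description: str, agent_type: str) -> str:
        spec = AGENT_TYPES.get(agent_type)
        if spec is None:
            return f"Error: unknown agent_type {agent_type!r}"
        # Fresh run: the sub-agent sees only the description, not the parent's history,
        # and only its final answer flows back into the parent's context.
        return run_agent(spec["system"], description, spec["tools"])

    return delegate_task
```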
Multi-agent (CrewAI / AutoGen / LangGraph)
# CrewAI example
from crewai import Agent, Task, Crew

# `search` is a tool object you define elsewhere.
researcher = Agent(role="Researcher", goal="Find info", backstory="...", tools=[search])
writer = Agent(role="Writer", goal="Write article", backstory="...", tools=[])
# Recent CrewAI versions require expected_output on every Task.
t1 = Task(description="Research X", expected_output="Bullet-point findings", agent=researcher)
t2 = Task(description="Write article from research", expected_output="A finished article", agent=writer, context=[t1])
crew = Crew(agents=[researcher, writer], tasks=[t1, t2])
result = crew.kickoff()
Often more complexity than needed. Start with one agent + tools.
LangGraph (stateful workflows)
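from langgraph.graph import StateGraph, END

# State is a TypedDict you define; each *_node takes the state and returns an update.
graph = StateGraph(State)
graph.add_node("plan", plan_node)
graph.add_node("execute", execute_node)
graph.add_node("review", review_node)
graph.add_edge("plan", "execute")
# router_fn(state) returns the name of the next node, e.g. "execute" or "review".
graph.add_conditional_edges("execute", router_fn)
graph.add_edge("review", END)
graph.set_entry_point("plan")
app = graph.compile()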
For complex workflows with explicit state.
Cost control
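total_tokens = 0
max_tokens_per_run = 100_000
for i in range(max_iter):
    response = llm(...)
    # Anthropic-style usage reports input and output tokens separately.
    total_tokens += response.usage.input_tokens + response.usage.output_tokens
    if total_tokens > max_tokens_per_run:
        raise RuntimeError("Budget exceeded")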
Time limits
import time

deadline = time.monotonic() + 300  # 5 minutes; monotonic ignores wall-clock changes
while time.monotonic() < deadline:
    ...
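The token cap and the deadline can live in one small guard object that the loop consults after every LLM call. A sketch, with the RunBudget class and its injectable clock being my own illustrative design:

```python
import time
from typing import Callable

class RunBudget:
    """Tracks tokens and elapsed time for one agent run."""

    def __init__(
        self,
        max_tokens: int = 100_000,
        max_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.max_tokens = max_tokens
        self.total_tokens = 0
        self._clock = clock
        self._deadline = clock() + max_seconds

    def charge(self, tokens: int) -> None:
        """Add the tokens used by one LLM call, then enforce both limits."""
        self.total_tokens += tokens
        self.check()

    def check(self) -> None:
        if self.total_tokens > self.max_tokens:
            raise RuntimeError("Budget exceeded")
        if self._clock() >= self._deadline:
            raise RuntimeError("Deadline exceeded")
```

Inside the loop this would be called as budget.charge(n) after each response, where n is whatever token count your client reports. The clock parameter exists so tests can advance time without sleeping.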
Determinism / testing
Hard to test. Strategies:
- Mock LLM responses.
- Run with temperature=0 and a seed (best-effort).
- Snapshot tests on tool call sequences.
- Eval suite: input → expected outcome.
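The first and third strategies combine naturally: script the LLM, then assert on the tool calls the loop made. This is a sketch using a simplified, hypothetical dict-based response shape rather than any real SDK's objects.

```python
from typing import Any, Callable, Dict, List, Tuple

class ScriptedLLM:
    """Test double that replays canned responses in order."""

    def __init__(self, script: List[Dict[str, Any]]):
        self._script = list(script)

    def __call__(self, messages: List[Dict[str, Any]]) -> Dict[str, Any]:
        return self._script.pop(0)

def run_agent(
    user_input: str,
    llm: Callable[[List[Dict[str, Any]]], Dict[str, Any]],
    run_tool: Callable[[str, Dict[str, Any]], str],
    max_iter: int = 10,
) -> Tuple[str, List[Tuple[str, Dict[str, Any]]]]:
    """Tiny loop that also records every tool call for later assertions."""
    messages = [{"role": "user", "content": user_input}]
    tool_calls = []
    for _ in range(max_iter):
        response = llm(messages)
        if response["type"] == "final":
            return response["text"], tool_calls
        tool_calls.append((response["name"], response["input"]))
        result = run_tool(response["name"], response["input"])
        messages.append({"role": "user", "content": result})
    raise RuntimeError("Max iterations")

def test_tool_call_sequence() -> None:
    llm = ScriptedLLM([
        {"type": "tool_use", "name": "search_web", "input": {"query": "X"}},
        {"type": "final", "text": "Y"},
    ])
    answer, calls = run_agent("What is X?", llm, lambda name, tool_input: "result is Y")
    assert answer == "Y"
    # Snapshot of the expected tool call sequence.
    assert calls == [("search_web", {"query": "X"})]
```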
Safety
Agents that run code / browse web / send messages need:
- Sandboxing (Docker, gVisor).
- Permission checks per tool.
- Human-in-the-loop for risky actions.
- Audit log of all actions.
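Permission checks, human approval, and the audit log can share one wrapper around tool execution. A sketch under assumptions: the RISKY_TOOLS names, the approve callback, and the in-memory AUDIT_LOG list are hypothetical, and sandboxing is out of scope here.

```python
import time
from typing import Any, Callable, Dict, List, Set

RISKY_TOOLS: Set[str] = {"send_email", "delete_file"}  # hypothetical tool names
AUDIT_LOG: List[Dict[str, Any]] = []  # in practice, write to durable storage

def guarded_run_tool(
    name: str,
    tool_input: Dict[str, Any],
    run_tool: Callable[[str, Dict[str, Any]], str],
    allowed: Set[str],
    approve: Callable[[str, Dict[str, Any]], bool],
) -> str:
    """Check permissions, ask a human for risky tools, and log every attempt."""
    entry = {"ts": time.time(), "tool": name, "input": tool_input}
    if name not in allowed:
        entry["outcome"] = "denied"
        result = "Error: tool not permitted"
    elif name in RISKY_TOOLS and not approve(name, tool_input):
        entry["outcome"] = "rejected"
        result = "Error: rejected by human reviewer"
    else:
        result = run_tool(name, tool_input)
        entry["outcome"] = "ok"
    AUDIT_LOG.append(entry)  # denied and rejected attempts are logged too
    return result
```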
Frameworks (use sparingly)
- LangChain / LangGraph: most popular, big.
- LlamaIndex: RAG-focused.
- CrewAI: multi-agent.
- AutoGen: multi-agent.
- Pydantic AI: type-safe.
- Anthropic SDK + custom loop: simplest, most control.
For most projects, build your own thin loop.
Streaming
Stream the agent's tool calls and final answer to the UI:
# Tool call announcement: "I'm searching for X..."
# Tool result: streamed back
# Final answer: streamed text
Common mistakes
- No iteration cap → infinite loop on bad task.
- No cost cap → runaway bill.
- Mixing chat history with tool messages incorrectly (e.g. a tool_result without its matching tool_use turn).
- Tools that block for a long time → request timeout.
- No human approval for destructive actions.
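For the blocking-tool mistake, one option is to run each tool in a worker thread and stop waiting after a timeout. A minimal sketch; run_tool_with_timeout and the pool size are my own choices, and note that a Python thread cannot be forcibly killed, so the tool itself keeps running in the background.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Dict

_executor = ThreadPoolExecutor(max_workers=4)

def run_tool_with_timeout(
    fn: Callable[..., Any],
    tool_input: Dict[str, Any],
    timeout_s: float = 30.0,
) -> Any:
    """Run a tool in a worker thread; give up waiting after timeout_s seconds."""
    future = _executor.submit(fn, **tool_input)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only helps if the call has not started; a running thread continues.
        future.cancel()
        return f"Error: tool timed out after {timeout_s}s"
```

For tools that must actually be stopped, a subprocess with its own kill path is the safer route.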
Read this next
If you want my agent template (loop + tools + memory), it’s at rajpoot.dev.