Tool use is how LLMs reach beyond text. Done right, agents become useful — they fetch real data, call your APIs, write files. Done wrong, you get hallucinated tool calls, validation hell, and infinite loops. This post is the working set.
Tool definition basics
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"}
        },
        "required": ["city"]
    }
}]
The description is the prompt. Be precise about behavior, edge cases, what NOT to use it for.
Schema design
{
    "name": "search_products",
    "description": "Search the product catalog. Use this when the user mentions a product type or name. NOT for category browsing; use list_categories for that.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query, max 100 chars"},
            "limit": {"type": "integer", "minimum": 1, "maximum": 50, "default": 10},
            "category": {"type": "string", "enum": ["electronics", "books", "clothing"]}
        },
        "required": ["query"]
    }
}
Constraints in the schema (enum, minimum, etc.) reduce LLM mistakes. The description tells the model when to choose this tool over others.
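To make the point about constraints concrete, here is a minimal sketch of checking model-supplied arguments against a schema like the one above. The `check_args` helper is hypothetical and covers only `required`, `enum`, `minimum`, and `maximum`; a real project would more likely use a full JSON Schema validator or Pydantic.

```python
# Hypothetical helper: checks a small subset of JSON Schema keywords.
SEARCH_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer", "minimum": 1, "maximum": 50},
        "category": {"type": "string", "enum": ["electronics", "books", "clothing"]},
    },
    "required": ["query"],
}

def check_args(schema: dict, args: dict) -> list:
    """Return a list of human-readable problems; empty means the args pass."""
    problems = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required field: {name}")
    for name, value in args.items():
        if name not in props:
            problems.append(f"unknown field: {name}")
            continue
        rules = props[name]
        if "enum" in rules and value not in rules["enum"]:
            problems.append(f"{name} must be one of {rules['enum']}")
        if "minimum" in rules or "maximum" in rules:
            # bool is a subclass of int in Python, so exclude it explicitly
            if not isinstance(value, (int, float)) or isinstance(value, bool):
                problems.append(f"{name} must be a number")
                continue
            if "minimum" in rules and value < rules["minimum"]:
                problems.append(f"{name} must be >= {rules['minimum']}")
            if "maximum" in rules and value > rules["maximum"]:
                problems.append(f"{name} must be <= {rules['maximum']}")
    return problems

print(check_args(SEARCH_SCHEMA, {"query": "laptop", "limit": 10}))
print(check_args(SEARCH_SCHEMA, {"limit": 500, "category": "toys"}))
```

Each problem string is specific enough to hand back to the model as a tool error, which gives it something actionable to correct on the next turn.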
The loop
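import json
from anthropic import AsyncAnthropic

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

class MaxItersReached(Exception):
    """Raised when the loop hits max_iters without a final answer."""

async def run(messages, tools, max_iters=15):
    for _ in range(max_iters):
        resp = await client.messages.create(
            model="claude-sonnet-4-6",
            messages=messages,
            tools=tools,
            max_tokens=4096,
        )
        messages.append({"role": "assistant", "content": resp.content})
        # Any stop reason other than "tool_use" (end_turn, max_tokens, ...) ends
        # the loop; continuing would send an empty tool-result message.
        if resp.stop_reason != "tool_use":
            return resp
        # Process tool calls (dispatch is defined under Validation below)
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                try:
                    result = await dispatch(block.name, block.input)
                    results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result),
                    })
                except Exception as e:
                    results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": f"Error: {e}",
                        "is_error": True,
                    })
        messages.append({"role": "user", "content": results})
    raise MaxItersReached()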
See LLM Agent Frameworks.
Parallel tool calls
Modern models call multiple tools in one turn:
# resp.content has multiple tool_use blocks
[
    {"type": "tool_use", "id": "1", "name": "get_user", "input": {"id": 42}},
    {"type": "tool_use", "id": "2", "name": "get_orders", "input": {"user_id": 42}},
]
Run them concurrently:
# gather returns results in the same order as the calls were passed in
results = await asyncio.gather(*[
    dispatch(block.name, block.input)
    for block in resp.content if block.type == "tool_use"
])
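For independent reads, this is a large latency win: total wait time approaches that of the slowest call rather than the sum of all calls. For dependent ops, the model usually serializes naturally by spreading the calls across turns.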
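One detail the gather snippet leaves open is pairing each result with its `tool_use_id` and keeping one failing call from raising out of the whole batch. A sketch, assuming blocks are plain dicts and using a stand-in `dispatch` (both are illustrative, not the SDK's types):

```python
import asyncio
import json

async def dispatch(name: str, args: dict) -> dict:
    """Stand-in dispatcher; a real one would look up and call the tool."""
    await asyncio.sleep(0.01)  # simulate I/O
    if name == "get_orders":
        raise RuntimeError("orders service unavailable")
    return {"tool": name, "args": args}

async def run_tool_calls(blocks: list) -> list:
    """Run all tool_use blocks concurrently and build tool_result blocks."""
    calls = [b for b in blocks if b["type"] == "tool_use"]
    # return_exceptions=True: a failure comes back as a value instead of raising
    outcomes = await asyncio.gather(
        *(dispatch(b["name"], b["input"]) for b in calls),
        return_exceptions=True,
    )
    results = []
    # gather preserves order, so zip pairs each outcome with its call
    for block, outcome in zip(calls, outcomes):
        result = {"type": "tool_result", "tool_use_id": block["id"]}
        if isinstance(outcome, Exception):
            result["content"] = f"Error: {outcome}"
            result["is_error"] = True
        else:
            result["content"] = json.dumps(outcome)
        results.append(result)
    return results

blocks = [
    {"type": "tool_use", "id": "1", "name": "get_user", "input": {"id": 42}},
    {"type": "tool_use", "id": "2", "name": "get_orders", "input": {"user_id": 42}},
]
results = asyncio.run(run_tool_calls(blocks))
```

Here the second call fails, yet the first result is still delivered, and the model sees the failure as an ordinary `is_error` result.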
Validation
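from pydantic import ValidationError

async def dispatch(name, args):
    # Models sometimes invent tool names; report that as data too
    if name not in TOOL_SCHEMAS:
        return {"error": f"Unknown tool: {name}"}
    schema = TOOL_SCHEMAS[name]  # tool name -> Pydantic model class
    try:
        validated = schema.model_validate(args)
    except ValidationError as e:
        return {"error": f"Invalid arguments: {e}"}
    return await TOOL_FNS[name](validated)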
Validate at the boundary. Models occasionally hallucinate fields or wrong types — let validation reject them, return the error, let the model retry.
See Structured Output.
Error handling
# Bad: raise; loop crashes
result = await tool(args)
# Good: return as data; model decides
try:
    result = await tool(args)
except NotFoundError:
    return {"error": "not found"}
except RateLimitError:
    return {"error": "rate limited; try again later"}
except Exception as e:
    log.exception("tool failed")
    return {"error": "internal error"}
The model recovers gracefully from tool errors when they are returned as result content. Crashing the loop loses all progress made so far.
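Rather than repeating the try/except in every tool, the same policy can live in one wrapper. A minimal sketch, assuming async tools; `errors_as_data`, `NotFoundError`, and the `get_user` tool here are illustrative names, not part of any library:

```python
import asyncio
import functools
import logging

log = logging.getLogger("tools")

class NotFoundError(Exception):
    """Hypothetical domain error for a missing record."""

def errors_as_data(fn):
    """Wrap an async tool so exceptions come back as error dicts."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        try:
            return await fn(*args, **kwargs)
        except NotFoundError as e:
            return {"error": f"not found: {e}"}
        except Exception:
            # Log details server-side; give the model a generic message
            log.exception("tool %s failed", fn.__name__)
            return {"error": "internal error"}
    return wrapper

@errors_as_data
async def get_user(user_id: int) -> dict:
    users = {42: {"id": 42, "name": "Ada"}}  # fake data store
    if user_id not in users:
        raise NotFoundError(f"user {user_id}")
    return users[user_id]

print(asyncio.run(get_user(42)))
print(asyncio.run(get_user(7)))
```

Keeping the generic branch vague on purpose avoids leaking stack traces or internal details into the model's context, while the log still has the full exception.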
Tool naming and grouping
get_user
get_user_orders
get_user_settings
update_user_email
Consistent prefixes; clear actions. The model learns patterns from naming.
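For larger surfaces (20+ tools), namespace. Dots are used here for readability; some APIs, Anthropic's included, only accept letters, digits, underscores, and hyphens in tool names, so there you would write db_user_get instead: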
db.user.get
db.user.update
http.get
fs.read
Tool routing
For huge tool catalogs:
# Step 1: ask LLM which "category" of tools it needs
category = await classify(user_query, ["users", "orders", "products", "support"])
# Step 2: only expose those tools
tools = TOOLS_BY_CATEGORY[category]
Cuts context usage; model has fewer choices.
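A sketch of the two-step routing above. The original uses an LLM for `classify`; here a keyword lookup stands in for it so the example runs offline. The keyword lists, the `support` fallback, and the `escalate_to_human` tool name are made up for illustration.

```python
# Category -> tool names exposed to the model for that category
TOOLS_BY_CATEGORY = {
    "users": ["get_user", "get_user_settings", "update_user_email"],
    "orders": ["get_user_orders"],
    "products": ["search_products", "list_categories"],
    "support": ["escalate_to_human"],  # hypothetical tool
}

# Stand-in for the LLM classifier: keywords that hint at each category
KEYWORDS = {
    "users": ("account", "email", "settings", "profile"),
    "orders": ("order", "delivery", "refund"),
    "products": ("product", "catalog", "price"),
}

def classify(query: str, default: str = "support") -> str:
    """Pick the category with the most keyword hits, else the default."""
    text = query.lower()
    scores = {
        category: sum(word in text for word in words)
        for category, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

def tools_for(query: str) -> list:
    """Step 2: expose only the tools for the chosen category."""
    return TOOLS_BY_CATEGORY[classify(query)]

print(tools_for("Where is my order?"))
print(tools_for("Please change my email address"))
```

Having an explicit fallback category matters: a query that matches nothing should still land somewhere sensible instead of getting an empty tool list.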
Tool result formatting
# Bad: dump raw JSON of 1000 records
return json.dumps(huge_response)
# Good: shape for LLM consumption
return {
    "summary": f"Found {len(results)} matching items",
    "items": results[:10],  # first 10
    "total": len(results),
    "more_available": len(results) > 10,
}
Trim. Summarize. Hint the model when there’s more.
For huge results: store and return a token:
async def search(...):
    handle = await store_results_in_cache(big_results)
    return {"handle": handle, "preview": big_results[:5], "total": len(big_results)}
# Model can call get_more(handle, offset)
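One way the handle pattern could be filled in, as a sketch only: the in-memory dict stands in for a real cache (which would want a TTL), the search results are fake, and the `limit` parameter and return shape of `get_more` are assumptions beyond what the snippet above specifies.

```python
import asyncio
import uuid

_CACHE = {}  # handle -> full result list; in-memory stand-in for a real cache

async def store_results_in_cache(results: list) -> str:
    handle = uuid.uuid4().hex
    _CACHE[handle] = results
    return handle

async def search(query: str) -> dict:
    big_results = [f"{query}-{i}" for i in range(23)]  # fake backend results
    handle = await store_results_in_cache(big_results)
    return {"handle": handle, "preview": big_results[:5], "total": len(big_results)}

async def get_more(handle: str, offset: int, limit: int = 10) -> dict:
    results = _CACHE.get(handle)
    if results is None:
        # Errors as data: tell the model how to recover
        return {"error": "unknown or expired handle; run the search again"}
    page = results[offset:offset + limit]
    return {
        "items": page,
        "offset": offset,
        "more_available": offset + len(page) < len(results),
    }

async def demo():
    first = await search("widget")
    page = await get_more(first["handle"], offset=5)
    last = await get_more(first["handle"], offset=15)
    return first, page, last

first, page, last = asyncio.run(demo())
print(first["total"], len(page["items"]), last["more_available"])
```

The model only ever holds the short handle and a small page, so context cost stays flat no matter how large the underlying result set is.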
Streaming with tools
async with client.messages.stream(...) as stream:
    async for event in stream:
        if event.type == "content_block_start" and event.content_block.type == "tool_use":
            # Tool call coming
            pass
        # ... handle text + tool_use blocks
Mostly the loop is the same; you can show “thinking…” then “calling get_user()…” for UX.
State across iterations
Some tools need state (a session token, a connection):
class ToolContext:
    def __init__(self, user, db, http):
        self.user = user
        self.db = db
        self.http = http
        self.cache = {}

async def dispatch(ctx, name, args):
    return await TOOL_FNS[name](ctx, args)
Pass context to every tool call. Avoid global state.
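A small usage sketch of the context pattern, showing why per-request state beats globals: two contexts share a database but not a cache. The dict standing in for the database and the `get_user_settings` body are assumptions made for the example.

```python
import asyncio

class ToolContext:
    """Per-request state handed to every tool (mirrors the class above)."""
    def __init__(self, user, db, http=None):
        self.user = user
        self.db = db
        self.http = http
        self.cache = {}

async def get_user_settings(ctx, args):
    key = ("settings", args["user_id"])
    if key not in ctx.cache:
        # A plain dict stands in for a real database lookup
        ctx.cache[key] = ctx.db["settings"][args["user_id"]]
    return ctx.cache[key]

TOOL_FNS = {"get_user_settings": get_user_settings}

async def dispatch(ctx, name, args):
    return await TOOL_FNS[name](ctx, args)

fake_db = {"settings": {42: {"theme": "dark"}}}
ctx_a = ToolContext(user="alice", db=fake_db)
ctx_b = ToolContext(user="bob", db=fake_db)

result = asyncio.run(dispatch(ctx_a, "get_user_settings", {"user_id": 42}))
print(result, ctx_a.cache, ctx_b.cache)
```

Because the cache lives on the context, one user's cached data can never leak into another user's agent run, and tests can build a context with fakes instead of patching globals.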
Side-effecting tools
async def transfer_money(ctx, amount, to):
    if not ctx.confirmation:
        return {"awaiting_confirmation": True, "preview": f"Transfer ${amount} to {to}"}
    if amount > 10000 and not ctx.user.is_admin:
        return {"error": "amount too large for non-admin"}
    # ... actually transfer ...
For dangerous operations: confirmation tokens, authorization checks, audit logs. The model can call; the system enforces. See LLM Guardrails.
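A sketch of how confirmation tokens, authorization, and an audit log might fit together. It splits the flow in two: `request_transfer` is the tool the model calls, while `confirm_transfer` is meant to be called by the application only after a human approves, and is never exposed as a tool. Both function names, the in-memory stores, and the dict-shaped user are assumptions for illustration. Unlike the snippet above, this version checks authorization before asking for confirmation, so users are not asked to confirm something that will be refused.

```python
import secrets

AUDIT_LOG = []  # stand-in for a durable audit log
PENDING = {}    # confirmation id -> the exact action it approves

def request_transfer(user: dict, amount: int, to: str) -> dict:
    """Tool exposed to the model: validates and stages, never moves money."""
    # Authorization is enforced in code, whatever the prompt says
    if amount > 10000 and not user["is_admin"]:
        return {"error": "amount too large for non-admin"}
    confirmation_id = secrets.token_urlsafe(16)
    PENDING[confirmation_id] = {"user": user["id"], "amount": amount, "to": to}
    return {
        "awaiting_confirmation": True,
        "confirmation_id": confirmation_id,
        "preview": f"Transfer ${amount} to {to}",
    }

def confirm_transfer(user: dict, confirmation_id: str) -> dict:
    """Called by the application after human approval; not a model tool."""
    action = PENDING.get(confirmation_id)
    if action is None or action["user"] != user["id"]:
        return {"error": "invalid or expired confirmation"}
    del PENDING[confirmation_id]  # single use
    # ... the real transfer would happen here ...
    AUDIT_LOG.append({"event": "transfer", **action})
    return {"status": "ok"}
```

Because the confirmation is bound to the exact staged action and consumed on use, the model cannot alter the amount after approval or replay an old approval.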
Common mistakes
1. Vague descriptions
“Gets data.” About what? When? Spend time on descriptions; you save tokens AND failures.
2. Tool flood
50+ tools in the prompt. Token bloat; selection errors. Categorize and route.
3. Raising on tool errors
Loop crashes; user sees nothing. Always return errors as data.
4. Returning megabytes
LLM context blows up; cost spikes. Trim or paginate.
5. No max iters
Model loops forever calling the same tool. Always bound.
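On top of a hard iteration cap, it can help to notice when the model repeats the exact same call. A minimal sketch; `RepeatGuard` is a hypothetical helper, and the threshold of three is an arbitrary choice:

```python
import json
from collections import Counter

class RepeatGuard:
    """Counts identical (tool name, arguments) calls within one agent run."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def check(self, name: str, args: dict):
        """Return an error dict once a call repeats too often, else None."""
        # sort_keys makes the key independent of argument order
        key = (name, json.dumps(args, sort_keys=True))
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            return {
                "error": f"{name} was already called {self.max_repeats} times "
                         "with these arguments; try a different approach"
            }
        return None

guard = RepeatGuard(max_repeats=2)
for _ in range(3):
    print(guard.check("get_user", {"id": 1}))
```

Calling `guard.check(...)` before dispatching, and returning its error as the tool result when it fires, gives the model a nudge to change course before the iteration cap ends the run.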
What I’d ship today
For an agent project:
- Pydantic schemas for all tool inputs.
- Structured tool definitions with rich descriptions.
- Parallel calls for independent ops.
- Errors as data, not exceptions.
- Bounded result sizes with pagination tokens.
- Authorization at tool layer, not in prompts.
- Tracing every tool call.
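For the tracing item in the list above, a thin wrapper around dispatch is often enough to start with. A sketch under assumptions: the `TRACE` list stands in for a real sink such as structured logs or a tracing backend, and `traced_dispatch` and `fake_dispatch` are illustrative names.

```python
import asyncio
import time

TRACE = []  # stand-in sink for trace records

async def traced_dispatch(dispatch, name, args):
    """Call dispatch(name, args) and record what happened and how long it took."""
    start = time.perf_counter()
    result = await dispatch(name, args)
    TRACE.append({
        "tool": name,
        "args": args,
        # Errors are returned as data, so inspect the result rather than catch
        "ok": not (isinstance(result, dict) and "error" in result),
        "duration_ms": round((time.perf_counter() - start) * 1000, 2),
    })
    return result

async def fake_dispatch(name, args):
    """Stand-in dispatcher used only for this example."""
    await asyncio.sleep(0)
    if name == "missing":
        return {"error": "not found"}
    return {"value": 1}
```

Running `asyncio.run(traced_dispatch(fake_dispatch, "get_user", {"id": 1}))` appends one record to `TRACE`; in a real agent you would also attach a run or conversation id so calls can be grouped per session.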
Read this next
- Designing Tools for AI Agents 2026
- LLM Agent Frameworks 2026
- LLM Agent Error Recovery 2026
- Structured Output for LLMs
If you want my tool schema library + validation harness, it’s at rajpoot.dev.