The biggest factor in whether an agent uses your tools correctly isn’t the model — it’s the tool descriptions. After shipping a few agent products, the patterns are clear. This post is the working set.
Names matter
search is too vague. The agent doesn’t know what it searches.
search_orders_by_email is specific. The agent calls it confidently when the user asks about their orders.
Use verb_object naming, and be specific: a search_X and search_Y tool pair works far better than one generic search.
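The naming rule is easy to check mechanically before a tool ships. Here is a minimal sketch of such a check; `check_tool_names`, the regular expression, and the synonym table are all hypothetical helpers for illustration, not part of any SDK.

```python
import re

# Assumed convention: a lowercase verb, then at least one "_object" part.
NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z0-9]+)+$")

# Verbs treated as the same concept (an assumption; extend for your domain).
SYNONYM_VERBS = {"search": "search", "find": "search", "lookup": "search"}


def check_tool_names(names: list[str]) -> list[str]:
    """Return human-readable problems found in a list of tool names."""
    problems = []
    verbs_per_concept: dict[str, set[str]] = {}
    for name in names:
        if not NAME_PATTERN.match(name):
            problems.append(f"{name!r}: not verb_object (no object after the verb)")
        verb = name.split("_", 1)[0]
        concept = SYNONYM_VERBS.get(verb)
        if concept:
            verbs_per_concept.setdefault(concept, set()).add(verb)
    # Flag tool sets that use several verbs for one concept.
    for concept, verbs in verbs_per_concept.items():
        if len(verbs) > 1:
            problems.append(f"mixed verbs for {concept!r}: {sorted(verbs)}")
    return problems
```

Running it in CI over the registered tool names catches both the bare `search` case and a set that mixes `search_` with `find_`.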
Descriptions are prompts
{
  name: "search_orders",
  description: "Search the user's orders by email or order ID. Returns up to 25 most recent matches with id, total, status, and date. Use this when the user asks about their order history.",
  input_schema: { ... }
}
Three things in one description:
- What it does: search orders by specific keys.
- Limits: returns up to 25.
- When to use it: when the user asks about orders.
The third is what most teams forget. It’s a usage hint that prevents misuse.
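One way to keep the third part from being forgotten is to build descriptions from the three pieces explicitly. This is a small sketch; `build_description` is a hypothetical helper, and the part names are my own labels for the list above.

```python
def build_description(what: str, limits: str, when: str) -> str:
    """Join the three parts into one description; refuse to skip any of them."""
    parts = {"what": what, "limits": limits, "when": when}
    missing = [label for label, text in parts.items() if not text.strip()]
    if missing:
        raise ValueError(f"description is missing: {', '.join(missing)}")
    return " ".join(text.strip() for text in parts.values())


description = build_description(
    what="Search the user's orders by email or order ID.",
    limits="Returns up to 25 most recent matches with id, total, status, and date.",
    when="Use this when the user asks about their order history.",
)
```

An empty `when` raises at definition time, so the gap shows up before the agent ever sees the tool.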
Input schemas — be explicit
{
  type: "object",
  properties: {
    query: {
      type: "string",
      description: "Email address or order ID. Empty string returns the most recent orders."
    },
    limit: {
      type: "integer",
      minimum: 1,
      maximum: 50,
      default: 25,
      description: "Number of orders to return."
    },
    status: {
      type: "string",
      enum: ["pending", "shipped", "delivered", "cancelled"],
      description: "Filter by status. Omit to include all."
    }
  },
  required: ["query"]
}
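Every field has a description. Enums replace stringly-typed args where possible. Minimum and maximum declare a realistic range. Note that they validate rather than clamp, so if nothing checks the arguments upstream, the handler still has to reject or clip out-of-range values.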
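If nothing upstream validates the arguments, the handler can do it. Below is a minimal, dependency-free sketch that covers only the keywords used in the schema above (type, enum, minimum, maximum, required, default). `validate_args` is a hypothetical helper, the descriptions are dropped for brevity, and a full JSON Schema validator library would be the more complete choice.

```python
SEARCH_ORDERS_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer", "minimum": 1, "maximum": 50, "default": 25},
        "status": {
            "type": "string",
            "enum": ["pending", "shipped", "delivered", "cancelled"],
        },
    },
    "required": ["query"],
}


def validate_args(args: dict, schema: dict) -> dict:
    """Validate args against the small schema subset above and fill in defaults."""
    props = schema["properties"]
    for name in schema.get("required", []):
        if name not in args:
            raise ValueError(f"Missing required field: {name}")
    checked = {}
    for name, value in args.items():
        if name not in props:
            # Stricter than JSON Schema's default, which allows extra properties.
            raise ValueError(f"Unknown field: {name}")
        spec = props[name]
        if spec["type"] == "integer":
            # bool is a subclass of int in Python, so rule it out explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{name} must be an integer")
            low, high = spec.get("minimum"), spec.get("maximum")
            if (low is not None and value < low) or (high is not None and value > high):
                raise ValueError(f"{name} must be between {low} and {high}")
        elif spec["type"] == "string":
            if not isinstance(value, str):
                raise ValueError(f"{name} must be a string")
            if "enum" in spec and value not in spec["enum"]:
                raise ValueError(f"{name} must be one of {spec['enum']}")
        checked[name] = value
    # `default` is only an annotation in JSON Schema, so apply it by hand.
    for name, spec in props.items():
        if name not in checked and "default" in spec:
            checked[name] = spec["default"]
    return checked
```

The error messages are written for the model: each one names the field and the allowed values, so a retry has something to work with.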
Output: text, not JSON
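from mcp.types import TextContent  # assuming the MCP Python SDK

# `db` is the application's own async data-access object.
async def search_orders(query: str, limit: int = 25) -> list[TextContent]:
    rows = await db.search(query, limit)
    if not rows:
        return [TextContent(type="text", text="No orders found.")]
    # One order per line: id, email, total, status, date.
    text = "\n".join(
        f"{r['id']} {r['email']} ${r['total']:.2f} {r['status']} {r['created_at']}"
        for r in rows
    )
    return [TextContent(type="text", text=text)]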
Pre-formatted text is what the model reads and synthesizes. With JSON it has to do two jobs: parse the structure, then reformat it for the user. For human-facing agents, text wins.
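To make the comparison concrete, here is a small sketch that renders the same row both ways. The sample row is made up, and character count is only a rough proxy for tokens, so treat the size difference as indicative.

```python
import json


def format_row(r: dict) -> str:
    """Render one order as a single space-separated line."""
    return f"{r['id']} {r['email']} ${r['total']:.2f} {r['status']} {r['created_at']}"


# Made-up sample data for illustration.
rows = [
    {
        "id": "ord_1",
        "email": "a@example.com",
        "total": 19.5,
        "status": "shipped",
        "created_at": "2025-01-02",
    }
]

as_text = "\n".join(format_row(r) for r in rows)
as_json = json.dumps(rows, indent=2)

print(as_text)  # ord_1 a@example.com $19.50 shipped 2025-01-02
print(len(as_text), "characters as text vs", len(as_json), "as JSON")
```

The text form drops the repeated keys and punctuation, which is where most of the saving comes from once a response has dozens of rows.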
Error returns
async def search_orders(query: str) -> list[TextContent]:
    if not query.strip():
        return [TextContent(
            type="text",
            text="Query is empty. Provide an email address or order ID."
        )]
    try:
        rows = await db.search(query, 25)
    except DatabaseUnavailable:
        return [TextContent(
            type="text",
            text="Database is temporarily unavailable. Try again in a moment."
        )]
    # ...
Helpful errors lead to graceful agent behavior. The model reads the error and knows what to do next (retry, ask the user, or give up cleanly).
For native MCP error semantics, see Model Context Protocol Explained.
Idempotency
Tools that mutate state should be safe to call twice:
async def create_order(idempotency_key: str, items: list[Item]):
    existing = await db.get_by_key(idempotency_key)
    if existing:
        return f"Order {existing.id} already exists for this key."
    new_order = await db.create(items, key=idempotency_key)
    return f"Created order {new_order.id}."
Agents retry. Without idempotency, retries become double-orders. See Idempotency, Retries, and Exactly-Once Illusions.
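To see the retry behavior end to end, here is a runnable sketch with an in-memory store standing in for the database. `InMemoryOrders` and `create_once` are hypothetical names. The lock makes the check-then-insert step atomic within one process; in a real database the usual equivalent is a unique constraint on the key.

```python
import asyncio
import itertools


class InMemoryOrders:
    """Hypothetical stand-in for the `db` object in the snippet above."""

    def __init__(self) -> None:
        self._by_key: dict[str, dict] = {}
        self._ids = itertools.count(1)
        self._lock = asyncio.Lock()

    async def create_once(self, key: str, items: list[str]) -> tuple[dict, bool]:
        """Return (order, created). Holding the lock keeps check and insert atomic."""
        async with self._lock:
            if key in self._by_key:
                return self._by_key[key], False
            order = {"id": next(self._ids), "items": items}
            self._by_key[key] = order
            return order, True


async def create_order(db: InMemoryOrders, idempotency_key: str, items: list[str]) -> str:
    order, created = await db.create_once(idempotency_key, items)
    if created:
        return f"Created order {order['id']}."
    return f"Order {order['id']} already exists for this key."


async def demo() -> tuple[str, str]:
    db = InMemoryOrders()
    first = await create_order(db, "key-1", ["widget"])
    retry = await create_order(db, "key-1", ["widget"])  # simulated agent retry
    return first, retry


first, retry = asyncio.run(demo())
print(first)  # Created order 1.
print(retry)  # Order 1 already exists for this key.
```

The retry gets a message that names the existing order, so the agent can carry on with that ID instead of creating another.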
Hints, not hand-holding
Found 3 matching orders. Use get_order_details with the order ID for more.
A hint at the end of a response nudges the agent to the next useful action. The model picks up on it, and the user gets richer responses.
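A sketch of how a tool might append that hint. `with_next_step` is a hypothetical helper, and the wording for the zero-result case is my assumption, not something from the example above.

```python
def with_next_step(count: int) -> str:
    """Summarize a search result and point the agent at the next useful call."""
    if count == 0:
        # Assumed wording: steer the agent back to the user rather than a blind retry.
        return "No orders found. Ask the user to confirm the email address or order ID."
    noun = "order" if count == 1 else "orders"
    return f"Found {count} matching {noun}. Use get_order_details with the order ID for more."


print(with_next_step(3))
# Found 3 matching orders. Use get_order_details with the order ID for more.
```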
Versioning tools
A tool’s behavior changes; the agent’s “memory” of the description doesn’t. To migrate:
- Add the new tool with a new name (search_orders_v2).
- Mark the old as "deprecated: do not use; prefer search_orders_v2".
- Eventually remove the old tool.
Don’t silently change a tool’s contract. Agents that learned the old shape will misbehave.
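The migration can be sketched with plain dictionaries as tool definitions. `deprecate` and `remove_deprecated` are hypothetical helpers, the v2 description is invented for the example, and the "Deprecated:" prefix is just a convention the two functions agree on.

```python
def deprecate(tool: dict, replacement: str) -> dict:
    """Return a copy whose description steers the model to the replacement."""
    notice = f"Deprecated: do not use; prefer {replacement}."
    return {**tool, "description": f"{notice} {tool['description']}"}


def remove_deprecated(tools: list[dict]) -> list[dict]:
    """Final step: drop every tool marked with the deprecation notice."""
    return [t for t in tools if not t["description"].startswith("Deprecated:")]


old = {"name": "search_orders", "description": "Search orders by email or order ID."}
new = {
    "name": "search_orders_v2",
    # Invented description, for illustration only.
    "description": "Search orders by email, order ID, or status.",
}

# Steps 1 and 2: ship both, with the old one marked.
during_migration = [deprecate(old, new["name"]), new]

# Step 3: once logs show no more calls to the old tool, remove it.
after_migration = remove_deprecated(during_migration)
```

The old tool keeps its name and contract throughout; only its description changes, which is what keeps existing behavior predictable.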
Cardinality of tools
Five focused tools beat fifty generic ones. The model picks better when the option set is small and meaningful.
If you have 20 tools, ask: which 8 are most-used? Promote those. Hide the others behind a generic “advanced action” tool that takes a free-form description.
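Picking the most-used tools is a counting exercise over your call logs. A minimal sketch, assuming the log is simply a list of tool names, one per call; `split_tools` is a hypothetical helper.

```python
from collections import Counter


def split_tools(call_log: list[str], all_tools: list[str], keep: int = 8) -> tuple[list[str], list[str]]:
    """Rank tools by call count; return (promoted, hidden)."""
    counts = Counter(call_log)
    # sorted() is stable, so tools with equal counts keep their original order.
    ranked = sorted(all_tools, key=lambda name: counts[name], reverse=True)
    return ranked[:keep], ranked[keep:]


# Made-up log for illustration.
call_log = ["a", "a", "b", "c", "c", "c"]
promoted, hidden = split_tools(call_log, ["a", "b", "c", "d"], keep=2)
print(promoted)  # ['c', 'a']
print(hidden)    # ['b', 'd']
```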
Common mistakes
1. Tool names that overlap
search and find and lookup — the agent picks randomly. Pick one verb per concept.
2. Descriptions copy-pasted from API docs
API docs are written for human developers. Tool descriptions are written for the model. Different audiences.
3. Returning huge JSON
100-row response × verbose JSON = burnt context. Truncate, summarize, paginate.
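Truncation and pagination can live in one small helper. This sketch assumes the tool accepts an `offset` argument, which the earlier schema does not define; `render_page` is a hypothetical name.

```python
def render_page(lines: list[str], offset: int = 0, page_size: int = 25) -> str:
    """Return one page of results plus a hint on how to fetch the next page."""
    page = lines[offset:offset + page_size]
    if not page:
        return "No more results."
    body = "\n".join(page)
    next_offset = offset + len(page)
    remaining = len(lines) - next_offset
    if remaining > 0:
        body += f"\n{remaining} more results. Call again with offset={next_offset} to continue."
    return body


all_lines = [f"order-{i}" for i in range(100)]
first_page = render_page(all_lines)
print(first_page.splitlines()[-1])
# 75 more results. Call again with offset=25 to continue.
```

The agent only pays for the rows it needs, and the trailing line tells it exactly how to ask for more.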
4. Tools that mutate without confirmation
A delete_account tool that just deletes is dangerous. Add confirmation gates or human-in-the-loop (LLM Security).
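One shape a confirmation gate can take is a two-step tool pair: the first call only stages the deletion, the second performs it. This is a sketch with hypothetical names and an in-memory token store. A token alone does not stop a model from confirming on its own, so in practice the host application should collect the user's approval before the second call is allowed.

```python
import secrets
from typing import Callable

# Staged deletions: confirmation token -> account ID (in-memory, for illustration).
_pending: dict[str, str] = {}


def request_delete_account(account_id: str) -> str:
    """Step 1: stage the deletion and tell the agent to get confirmation."""
    token = secrets.token_hex(4)
    _pending[token] = account_id
    return (
        f"Deletion of account {account_id} is pending. Ask the user to confirm, "
        f"then call confirm_delete_account with token {token}"
    )


def confirm_delete_account(token: str, delete: Callable[[str], None]) -> str:
    """Step 2: delete only if the token matches a staged request (single use)."""
    account_id = _pending.pop(token, None)
    if account_id is None:
        return "Unknown or already-used token. Nothing was deleted."
    delete(account_id)
    return f"Account {account_id} deleted."
```

Because the token is popped on use, a retried confirmation returns a harmless message instead of deleting twice.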
5. No observability
When agents misuse tools, you need to see what was called with what args. Log every tool invocation. See LLM Observability.
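A minimal sketch of per-call logging using only the standard library. `logged_tool` is a hypothetical decorator, it wraps synchronous functions called with keyword arguments (an async tool would need an async wrapper), and the logged fields are my choice.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("tools")


def logged_tool(fn):
    """Log one JSON line per call: tool name, args, outcome, and duration."""

    @functools.wraps(fn)
    def wrapper(**kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return fn(**kwargs)
        except Exception:
            status = "error"
            raise
        finally:
            # Runs on success and on failure, so every invocation is recorded.
            logger.info(json.dumps({
                "tool": fn.__name__,
                "args": kwargs,
                "status": status,
                "ms": round((time.perf_counter() - start) * 1000, 1),
            }, default=str))

    return wrapper


@logged_tool
def search_orders(query: str, limit: int = 25) -> str:
    return "No orders found."
```

Arguments can contain personal data such as email addresses, so consider redacting them before the log line leaves your service.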
Read this next
- Model Context Protocol (MCP) Explained
- Build an MCP Server for Your SaaS
- AI Agents with LangGraph in 2026
- Structured Output for LLMs
If you want my tool-design template + eval harness, it’s at rajpoot.dev.