Agent frameworks proliferated in 2024-2025; by 2026 the landscape had clarified. Some are genuinely useful; others are abstraction tax. This post is an honest comparison.
The frameworks
| Framework | Strengths | Weaknesses |
|---|---|---|
| LangGraph | Explicit state machine; observable; mature | Verbose; LangChain ecosystem |
| CrewAI | Role-based multi-agent; declarative | Opinionated; harder to escape |
| OpenAI Agents SDK | Simple; OpenAI-blessed | OpenAI-centric |
| AutoGen | Multi-agent conversations; research-y | Microsoft; less production-tested |
| Pydantic AI | Type-safe; Python-first; clean | Newer; smaller ecosystem |
| Bare-metal Python | Total control; debuggable | You build it |
Bare-metal first
Most “agent” needs are a tool-calling loop:
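```python
import json
from anthropic import AsyncAnthropic

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

class MaxItersReached(Exception):
    """The agent hit the iteration cap without finishing."""

async def run_agent(messages, tools, max_iters=10):
    for _ in range(max_iters):
        resp = await client.messages.create(
            model="claude-sonnet-4-6",
            messages=messages,
            tools=tools,
            max_tokens=4096,
        )
        # Anything other than a tool request (end_turn, max_tokens, ...) ends the loop.
        if resp.stop_reason != "tool_use":
            return resp
        messages.append({"role": "assistant", "content": resp.content})
        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                # dispatch(name, args) is your own tool router, defined elsewhere.
                result = await dispatch(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result),
                })
        messages.append({"role": "user", "content": tool_results})
    raise MaxItersReached()
```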
That’s it: roughly 25 lines of loop. For most agent use cases this is the right starting point. Add observability, error handling, and persistence as you need them.
See LLM Agent Error Recovery and Agent Tool Design.
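The loop calls a `dispatch` helper that isn't shown. Here is a minimal sketch of one way to write it, assuming tools are plain async functions kept in a dict. The `TOOLS` registry, the `tool` decorator, and the `get_weather` example are hypothetical names for illustration, not part of any SDK. Failures come back as data so the model can see them and try again.

```python
import asyncio

# Hypothetical registry: tool name -> async callable.
TOOLS = {}

def tool(fn):
    """Register an async function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
async def get_weather(city: str) -> dict:
    # Placeholder tool for illustration; a real one would call an API.
    return {"city": city, "forecast": "sunny"}

async def dispatch(name: str, args: dict, timeout: float = 30.0) -> dict:
    """Run one tool call; report failures as data instead of raising."""
    fn = TOOLS.get(name)
    if fn is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return {"ok": await asyncio.wait_for(fn(**args), timeout)}
    except asyncio.TimeoutError:
        return {"error": f"{name} timed out after {timeout}s"}
    except Exception as exc:  # bad arguments, tool bugs, network errors
        return {"error": f"{type(exc).__name__}: {exc}"}
```

Every outcome is a JSON-serializable dict, so the loop's `json.dumps(result)` keeps working when a tool fails.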
When a framework helps
- Complex state machines: many states, conditional transitions, fanout.
- Multi-agent: agents collaborating with distinct roles.
- Persistence: durable across crashes / restarts (workflow engines beat agent frameworks here, see Temporal).
- Standardized tracing / debugging.
LangGraph
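```python
import asyncio
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict, total=False):
    question: str
    category: str
    results: list
    response: str

# classify_query, search_kb and generate are your own helpers, defined elsewhere.
def classify(state: AgentState) -> dict:
    return {"category": classify_query(state["question"])}

def search(state: AgentState) -> dict:
    return {"results": search_kb(state["question"])}

def respond(state: AgentState) -> dict:
    # results is absent when the non-factual branch skips search
    return {"response": generate(state["question"], state.get("results", []))}

graph = StateGraph(AgentState)
graph.add_node("classify", classify)
graph.add_node("search", search)
graph.add_node("respond", respond)
graph.set_entry_point("classify")
graph.add_conditional_edges(
    "classify",
    lambda s: "search" if s["category"] == "factual" else "respond",
)
graph.add_edge("search", "respond")
graph.add_edge("respond", END)
app = graph.compile()

async def main():
    result = await app.ainvoke({"question": "..."})
    print(result["response"])

asyncio.run(main())
```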
Explicit nodes; explicit transitions. LangSmith integration for tracing. Good when the flow is non-linear.
CrewAI
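```python
from crewai import Agent, Task, Crew

researcher = Agent(role="Researcher", goal="Find facts", backstory="...")
writer = Agent(role="Writer", goal="Write draft", backstory="...")
editor = Agent(role="Editor", goal="Polish", backstory="...")

# expected_output is required by recent CrewAI releases; the values here are placeholders.
task1 = Task(description="Research X", expected_output="Bullet-point notes on X", agent=researcher)
task2 = Task(description="Write draft on X", expected_output="A first draft", agent=writer)
task3 = Task(description="Edit draft", expected_output="A polished final draft", agent=editor)

crew = Crew(agents=[researcher, writer, editor], tasks=[task1, task2, task3])
result = crew.kickoff()  # tasks run sequentially by default
```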
Role-based and declarative, but debugging is harder because the orchestration is opaque. Best for well-defined multi-step workflows with clear roles.
OpenAI Agents SDK
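```python
import asyncio
from agents import Agent, Runner

# lookup_invoice and refund are your own tools, defined elsewhere
# and decorated with @function_tool.
agent = Agent(
    name="Support",
    instructions="Help with billing questions",
    tools=[lookup_invoice, refund],
)

async def main():
    result = await Runner.run(agent, "Why was I charged twice?")
    print(result.final_output)

asyncio.run(main())
```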
Simple. Tied to OpenAI but works with other providers via adapters. Best for OpenAI-first shops.
Pydantic AI
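```python
from pydantic_ai import Agent, RunContext

# Database and Order are your own types (Order a Pydantic model), defined elsewhere.
agent = Agent(
    "anthropic:claude-sonnet-4-6",  # provider-prefixed model name
    deps_type=Database,
    output_type=Order,  # called result_type in older releases
)

@agent.tool
async def find_order(ctx: RunContext[Database], order_id: int) -> Order:
    return await ctx.deps.get_order(order_id)

async def main(db: Database) -> Order:
    result = await agent.run("Find order 123", deps=db)
    return result.output  # a validated Order
```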
Type-safe. Clean. Good fit for Python teams that want structure without LangChain’s surface area.
Choosing
| Need | Pick |
|---|---|
| Simple single-agent loop | Bare metal |
| Type-safe + clean | Pydantic AI |
| Complex state machine | LangGraph |
| Multi-agent roles | CrewAI |
| OpenAI-only | OpenAI Agents SDK |
| Durable, long-running | Temporal + agents |
For 70% of use cases I’ve seen: bare metal or Pydantic AI is the right answer.
What frameworks add
- Tracing integration (LangSmith, etc.).
- Tool registries.
- Multi-agent orchestration.
- State persistence.
What they cost:
- Learning curve.
- Lock-in.
- Abstraction tax (debugging through layers).
- Slower iteration (their patterns vs your code).
Anti-patterns
1. Framework for a 50-line script
Loading LangChain to call an LLM with one tool. The framework dwarfs the work.
2. Multi-agent without need
“Three agents collaborate” sounds cool. Often: one agent with three tools is clearer and cheaper.
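A sketch of the single-agent alternative, assuming Anthropic-style tool definitions (`name`, `description`, `input_schema`). The `make_tool` helper and the three tool names are hypothetical; they simply mirror the researcher / writer / editor split.

```python
def make_tool(name: str, description: str, properties: dict) -> dict:
    """Build one tool definition with every property marked required."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
    }

TEXT = {"type": "string"}

# Three "roles" expressed as three tools on one agent.
tools = [
    make_tool("research", "Find facts about a topic.", {"topic": TEXT}),
    make_tool("write_draft", "Write a draft from research notes.", {"notes": TEXT}),
    make_tool("edit_draft", "Polish a draft for clarity.", {"draft": TEXT}),
]
```

Pass `tools` to `run_agent` and you get one conversation, one context window, and one trace to read.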
3. Locked in to LangChain everywhere
Every component is a LangChain class. Migrating costs months. Use frameworks at the boundaries, not everywhere.
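One way to keep a framework at the boundary, sketched with a `typing.Protocol`. `AgentRunner`, `EchoRunner`, and `answer_ticket` are hypothetical names; a real adapter would wrap whichever framework you chose and expose the same `run` method.

```python
import asyncio
from typing import Protocol

class AgentRunner(Protocol):
    """The only agent interface application code is allowed to see."""
    async def run(self, prompt: str) -> str: ...

class EchoRunner:
    """Stand-in implementation; a framework adapter would have the same shape."""
    async def run(self, prompt: str) -> str:
        return f"echo: {prompt}"

async def answer_ticket(runner: AgentRunner, ticket: str) -> str:
    # Application logic depends on the Protocol, not on any framework class.
    return await runner.run(f"Resolve this support ticket: {ticket}")

print(asyncio.run(answer_ticket(EchoRunner(), "charged twice")))
```

Swapping frameworks then means writing one new adapter class rather than touching every call site.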
4. No tracing
Whichever framework you pick, instrument it. See LLM Observability.
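If you have nothing yet, even a stdlib decorator beats flying blind. A minimal sketch, assuming async tools; `SPANS` and `traced` are hypothetical names, and the in-memory list stands in for a real exporter.

```python
import asyncio
import functools
import time

SPANS = []  # hypothetical in-memory sink; swap for Langfuse or an OTEL exporter

def traced(fn):
    """Record name, duration and outcome of every call to an async function."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return await fn(*args, **kwargs)
        except Exception:
            status = "error"
            raise
        finally:
            SPANS.append({
                "name": fn.__name__,
                "status": status,
                "seconds": time.perf_counter() - start,
            })
    return wrapper

@traced
async def lookup_invoice(invoice_id: str) -> dict:
    # Placeholder tool body for illustration.
    return {"invoice_id": invoice_id, "amount": 42}
```

Wrap each tool and the model call itself; the span list then tells you where the time and the failures are.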
5. Ignoring durability
Long-running agent crashes; loses state. For anything important: persist state or use a workflow engine.
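The cheapest form of "persist state" is checkpointing the message history after every iteration. A sketch under the assumption that the history is JSON-serializable; `save_checkpoint` and `load_checkpoint` are hypothetical helpers.

```python
import json
import os
import tempfile
from pathlib import Path

def save_checkpoint(path: Path, messages: list) -> None:
    """Write messages atomically so a crash never leaves a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w") as f:
        json.dump(messages, f)
    os.replace(tmp, path)  # atomic rename on the same filesystem

def load_checkpoint(path: Path) -> list:
    """Return saved messages, or an empty history on first run."""
    if not path.exists():
        return []
    return json.loads(path.read_text())
```

In the bare-metal loop, `resp.content` holds SDK objects rather than plain dicts, so you would likely need to convert them first (for example with each block's `model_dump()`) before saving.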
What I’d ship today
For new agent projects:
- Start bare-metal: a tool-calling loop in under 200 lines, error handling included.
- Add tracing (Langfuse / OTEL).
- Add structured-output validation.
- If state grows complex: LangGraph or Pydantic AI.
- If multi-agent emerges naturally: CrewAI or hand-rolled coordinator.
- For long-running / durable: Temporal + your agent code.
Avoid: starting with a framework “because everyone does.”
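Structured-output validation can start small. Here is a stdlib-only sketch of the idea; in practice a Pydantic model does the same job with less code. The `Order` shape and `parse_order` are hypothetical, chosen to echo the Pydantic AI example.

```python
import json
from dataclasses import dataclass

@dataclass
class Order:
    order_id: int
    status: str

def parse_order(raw: str) -> Order:
    """Parse model output into an Order, raising ValueError on a bad shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    order_id, status = data.get("order_id"), data.get("status")
    if type(order_id) is not int:  # also rejects booleans and numeric strings
        raise ValueError("order_id must be an integer")
    if not isinstance(status, str):
        raise ValueError("status must be a string")
    return Order(order_id, status)
```

On a `ValueError`, feed the message back to the model as a retry prompt instead of crashing the run.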
Read this next
- Designing Tools for AI Agents 2026
- LLM Agent Error Recovery 2026
- Agent Memory Systems 2026
- LLM Observability 2026
If you want my bare-metal agent loop + tracing starter, it’s at rajpoot.dev.