Should I use a framework or build it myself?

For simple single-agent loops: build it. About 200 lines of Python is clearer than wrestling a framework. For multi-agent systems, complex state machines, or teams that already use LangChain, a framework saves real time.

Use LangGraph when you want explicit graphs of state transitions and fine-grained control. Use CrewAI for declarative, role-based multi-agent setups. Most teams default to LangGraph because it's lower-level and gives more control.

LLM Agent Frameworks in 2026 — LangGraph, CrewAI, and the Bare-Metal Alternative

Agent frameworks proliferated in 2024-2025; by 2026 the landscape clarified. Some are genuinely useful; others are abstraction tax. This post is the honest comparison.

The frameworks

Framework	Strengths	Weaknesses
LangGraph	Explicit state machine; observable; mature	Verbose; LangChain ecosystem
CrewAI	Role-based multi-agent; declarative	Opinionated; harder to escape
OpenAI Agents SDK	Simple; OpenAI-blessed	OpenAI-centric
AutoGen	Multi-agent conversations; research-y	Microsoft; less production-tested
Pydantic AI	Type-safe; Python-first; clean	Newer; smaller ecosystem
Bare-metal Python	Total control; debuggable	You build it

Bare-metal first

Most “agent” needs are a tool-calling loop:

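import json

from anthropic import AsyncAnthropic

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment


class MaxItersReached(Exception):
    """The agent used every iteration without producing a final answer."""


async def run_agent(messages, tools, max_iters=10):
    for _ in range(max_iters):
        resp = await client.messages.create(
            model="claude-sonnet-4-6",
            messages=messages,
            tools=tools,
            max_tokens=4096,
        )
        # Anything other than a tool request (end_turn, max_tokens, ...) ends
        # the loop; otherwise we would send back an empty tool_results list.
        if resp.stop_reason != "tool_use":
            return resp

        messages.append({"role": "assistant", "content": resp.content})
        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                # dispatch() is your own router from tool name to function.
                result = await dispatch(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result),
                })
        messages.append({"role": "user", "content": tool_results})

    raise MaxItersReached()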

That’s it: a few dozen lines. For most agent use cases this is the right starting point. Add observability, error handling, and persistence as you need them.

See LLM Agent Error Recovery and Agent Tool Design.
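The loop above calls a dispatch() helper that the post does not define. One possible shape for it is sketched below, assuming a simple name-to-function registry. TOOLS, the tool decorator, and the two example tools are hypothetical names made up for illustration. Errors are returned as data instead of raised, so the model can see the failure and try something else.

```python
import asyncio
import inspect

# Hypothetical registry: maps tool names to plain Python callables.
TOOLS = {}


def tool(fn):
    """Register a function under its own name so dispatch() can find it."""
    TOOLS[fn.__name__] = fn
    return fn


async def dispatch(name, tool_input):
    """Run the named tool; return an error payload instead of raising."""
    fn = TOOLS.get(name)
    if fn is None:
        return {"error": f"unknown tool: {name}"}
    try:
        result = fn(**tool_input)
        # Support both sync and async tools.
        if inspect.isawaitable(result):
            result = await result
        return result
    except Exception as exc:
        # Report the failure back to the model as a normal tool result.
        return {"error": f"{type(exc).__name__}: {exc}"}


@tool
def add(a: int, b: int) -> int:
    return a + b


@tool
async def lookup(key: str) -> dict:
    return {"key": key, "found": True}
```

Because every return value here is JSON-serializable, it can go straight through the json.dumps call in the loop.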

When a framework helps

Complex state machines: many states, conditional transitions, fanout.
Multi-agent: agents collaborating with distinct roles.
Persistence: durable across crashes / restarts (workflow engines beat agent frameworks here, see Temporal).
Standardized tracing / debugging.

LangGraph

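from typing import TypedDict

from langgraph.graph import StateGraph, END


class AgentState(TypedDict, total=False):
    question: str
    category: str
    results: list
    response: str


# classify_query, search_kb, and generate are your own functions.
# Each node returns only the keys it updates; LangGraph merges them into the state.
def classify(state: AgentState):
    return {"category": classify_query(state["question"])}

def search(state: AgentState):
    return {"results": search_kb(state["question"])}

def respond(state: AgentState):
    # "results" is absent when the search node was skipped.
    return {"response": generate(state["question"], state.get("results", []))}

graph = StateGraph(AgentState)
graph.add_node("classify", classify)
graph.add_node("search", search)
graph.add_node("respond", respond)

graph.set_entry_point("classify")
graph.add_conditional_edges(
    "classify",
    lambda s: "search" if s["category"] == "factual" else "respond",
)
graph.add_edge("search", "respond")
graph.add_edge("respond", END)

app = graph.compile()
result = await app.ainvoke({"question": "..."})  # call from inside an async function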

Explicit nodes; explicit transitions. LangSmith integration for tracing. Good when the flow is non-linear.

CrewAI

from crewai import Agent, Task, Crew

researcher = Agent(role="Researcher", goal="Find facts", backstory="...")
writer = Agent(role="Writer", goal="Write draft", backstory="...")
editor = Agent(role="Editor", goal="Polish", backstory="...")

# Each task states what a finished result looks like via expected_output.
task1 = Task(description="Research X", expected_output="A list of facts about X", agent=researcher)
task2 = Task(description="Write draft on X", expected_output="A first draft", agent=writer)
task3 = Task(description="Edit draft", expected_output="A polished final draft", agent=editor)

# Tasks run in list order by default.
crew = Crew(agents=[researcher, writer, editor], tasks=[task1, task2, task3])
result = crew.kickoff()

Role-based and declarative. Debugging is harder because the orchestration is opaque. Best for well-defined multi-step workflows with clear roles.

OpenAI Agents SDK

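from agents import Agent, Runner

# lookup_invoice and refund are your own functions, wrapped with @function_tool.
agent = Agent(
    name="Support",
    instructions="Help with billing questions",
    tools=[lookup_invoice, refund],
)

# Runner.run is a coroutine; call it from inside an async function.
result = await Runner.run(agent, "Why was I charged twice?")
print(result.final_output)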

Simple. Built around OpenAI's APIs, though it works with other providers via adapters. Best for OpenAI-first shops.

Pydantic AI

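from pydantic_ai import Agent, RunContext

# Database and Order are your own types; Order is a Pydantic model.
agent = Agent(
    "anthropic:claude-sonnet-4-6",  # model names carry a provider prefix
    deps_type=Database,
    output_type=Order,  # called result_type in older releases
)

@agent.tool
async def find_order(ctx: RunContext[Database], order_id: int) -> Order:
    return await ctx.deps.get_order(order_id)

# Call from inside an async function.
result = await agent.run("Find order 123", deps=db)
print(result.output)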

Type-safe. Clean. Good fit for Python teams that want structure without LangChain’s surface area.

Choosing

Need	Pick
Simple single-agent loop	Bare metal
Type-safe + clean	Pydantic AI
Complex state machine	LangGraph
Multi-agent roles	CrewAI
OpenAI-only	OpenAI Agents SDK
Durable, long-running	Temporal + agents

For 70% of use cases I’ve seen: bare metal or Pydantic AI is the right answer.

What frameworks add

Tracing integration (LangSmith, etc.).
Tool registries.
Multi-agent orchestration.
State persistence.

What they cost:

Learning curve.
Lock-in.
Abstraction tax (debugging through layers).
Slower iteration (their patterns vs your code).

Anti-patterns

1. Framework for a 50-line script

Loading LangChain to call an LLM with one tool. The framework dwarfs the work.

2. Multi-agent without need

“Three agents collaborate” sounds cool. Often: one agent with three tools is clearer and cheaper.
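To make the "one agent with three tools" alternative concrete, here is a minimal sketch that turns the researcher / writer / editor roles from the CrewAI example into three tool definitions for a single agent. The make_tool helper and the tool names are hypothetical. The dictionary shape (name, description, input_schema) is the one the Anthropic Messages API expects, so the list could be passed as tools to the bare-metal loop.

```python
def make_tool(name, description, properties, required):
    """Build one tool definition with a JSON Schema for its input."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# Hypothetical tools standing in for the three "roles".
tools = [
    make_tool(
        "research",
        "Find facts about a topic.",
        {"topic": {"type": "string"}},
        ["topic"],
    ),
    make_tool(
        "write_draft",
        "Write a draft from a list of facts.",
        {"facts": {"type": "array", "items": {"type": "string"}}},
        ["facts"],
    ),
    make_tool(
        "edit_draft",
        "Polish an existing draft.",
        {"draft": {"type": "string"}},
        ["draft"],
    ),
]
```

One model call loop, one message history, and one place to debug, instead of three agents passing text to each other.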

3. Locked in to LangChain everywhere

Every component is a LangChain class. Migrating costs months. Use frameworks at the boundaries, not everywhere.
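One way to keep a framework at the boundary is to hide it behind a small interface that the rest of the codebase depends on. The sketch below is an assumption about how that could look, not a prescribed pattern: Answerer, FrameworkAnswerer, FakeAnswerer, and handle_ticket are all hypothetical names, and run_graph stands in for whatever callable the framework gives you (for example a compiled graph's invoke method).

```python
from typing import Callable, Protocol


class Answerer(Protocol):
    """The only interface application code is allowed to depend on."""

    def answer(self, question: str) -> str: ...


class FrameworkAnswerer:
    """Adapter: the single place that would touch the framework."""

    def __init__(self, run_graph: Callable[[dict], dict]):
        self._run_graph = run_graph

    def answer(self, question: str) -> str:
        state = self._run_graph({"question": question})
        return state["response"]


class FakeAnswerer:
    """Drop-in replacement for tests or for a later bare-metal rewrite."""

    def answer(self, question: str) -> str:
        return f"echo: {question}"


def handle_ticket(agent: Answerer, text: str) -> str:
    # Application code knows nothing about the framework behind the agent.
    return agent.answer(text).strip()
```

Swapping frameworks then means rewriting one adapter class, not every call site.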

4. No tracing

Whichever framework, instrument. See LLM Observability.
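As a rough illustration of the minimum worth recording, the sketch below wraps an async tool in a decorator that captures its name, outcome, and duration. It uses only the standard library. SPANS is a hypothetical in-memory stand-in for a real exporter such as Langfuse or OpenTelemetry, and traced and search are made-up names.

```python
import asyncio
import functools
import time

SPANS = []  # stand-in for a real tracing backend


def traced(name):
    """Record name, status, and duration for each call of an async function."""

    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return await fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Runs on success and failure, so no call goes unrecorded.
                SPANS.append({
                    "name": name,
                    "status": status,
                    "duration_s": time.perf_counter() - start,
                })

        return wrapper

    return decorator


@traced("tool.search")
async def search(query):
    await asyncio.sleep(0)  # placeholder for real I/O
    return [query.upper()]
```

The same decorator can wrap the model call itself, which is usually where most of the latency and cost sits.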

5. Ignoring durability

A long-running agent crashes and loses its state. For anything important: persist state or use a workflow engine.
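The simplest form of "persist state" is a checkpoint file written after each step. The sketch below assumes the agent state is a JSON-serializable dict; save_checkpoint and load_checkpoint are hypothetical helpers. SDK response objects are not plain dicts, so they would need converting before saving. This is nowhere near what a workflow engine provides (retries, timers, exactly-once steps), but it survives a process restart.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path, state):
    """Write state atomically: temp file first, then rename over the target."""
    path = Path(path)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    # os.replace is atomic, so a crash never leaves a half-written checkpoint.
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the saved state, or default when no checkpoint exists yet."""
    path = Path(path)
    if not path.exists():
        return default
    return json.loads(path.read_text())
```

On startup the agent calls load_checkpoint and resumes from the saved message history. After every tool round it calls save_checkpoint.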

What I’d ship today

For new agent projects:

Start bare-metal: a tool-calling loop in 200 lines.
Add tracing (Langfuse / OTEL).
Add structured-output validation.
If state grows complex: LangGraph or Pydantic AI.
If multi-agent emerges naturally: CrewAI or hand-rolled coordinator.
For long-running / durable: Temporal + your agent code.

Avoid: starting with a framework “because everyone does.”
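The structured-output validation step in the list above can be sketched without any dependency. The example below is a standard-library stand-in for what Pydantic would do in practice: it parses the model's raw JSON text and rejects anything that does not match the expected shape. Refund, its fields, and parse_refund are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Refund:
    invoice_id: str
    amount_cents: int


def parse_refund(raw: str) -> Refund:
    """Parse and validate model output; raise ValueError with a usable message."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    invoice_id = data.get("invoice_id")
    amount = data.get("amount_cents")
    if not isinstance(invoice_id, str) or not invoice_id:
        raise ValueError("invoice_id must be a non-empty string")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(amount, int) or isinstance(amount, bool) or amount <= 0:
        raise ValueError("amount_cents must be a positive integer")
    return Refund(invoice_id, amount)
```

The ValueError message can be sent back to the model as the next user turn, which gives it a chance to correct its own output before you give up.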

Read this next

If you want my bare-metal agent loop + tracing starter, it’s at rajpoot.dev.

