What's the difference between RAG and agentic RAG?

Naive RAG retrieves once at the start with the user's query. Agentic RAG lets the LLM decide what to retrieve, when, and with which sub-questions, typically as tool calls. It is slower per query, but it answers harder questions correctly.

When should I use agentic RAG?

Use it when questions need multi-document reasoning or comparisons, or when the user's question doesn't map to a single retrievable chunk. For 'tell me about X' lookups, naive RAG works fine.

How much more expensive is agentic RAG?

2–5× per query in tokens (multiple LLM calls + multiple retrievals). Worth it for hard questions where accuracy matters more than per-call cost.

Agentic RAG in 2026 — When Retrieval Becomes a Tool, Not a Pipeline

Naive RAG works on simple questions: one query, one retrieval, one answer. The moment a real user asks “compare how the docs say to handle errors versus what the code actually does,” it falls apart. Agentic RAG is the response — let the LLM drive retrieval as a tool, with multi-step reasoning, decomposition, and self-reflection.

What changes

Aspect	Naive RAG	Agentic RAG
Retrievals per query	1	1–10
LLM calls per query	1	2–10
Decision-maker	Pipeline	LLM
Best for	Direct lookups	Multi-step, comparative questions
Token cost (relative)	1×	2–5×
Latency (relative)	1×	3–6×

The extra cost buys correct answers on questions that naive RAG simply gets wrong.

The shape

User question
  ↓
LLM with retrieve() tool
  ↓
Plan / decompose
  ↓
Loop: retrieve(sub-q) → think → retrieve(next) ...
  ↓
Synthesize from results
  ↓
Self-reflect: did I answer fully?
  ↓
Final answer with citations

The LLM is the orchestrator. Retrieval is one tool among others.

LangGraph implementation

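from typing import Annotated, TypedDict

from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition


class AgentState(TypedDict):
    # add_messages appends new messages instead of overwriting the list
    messages: Annotated[list, add_messages]


# pgvector_search and tavily_search are your own async helpers, defined elsewhere.
# @tool needs a docstring: it becomes the tool description the model sees.

@tool
async def retrieve(query: str, k: int = 5) -> str:
    """Search the internal knowledge base and return the top-k chunks with their ids."""
    docs = await pgvector_search(query, k=k)
    return "\n\n".join(f"[{d.id}] {d.content}" for d in docs)

@tool
async def web_search(query: str) -> str:
    """Search the public web for information missing from the knowledge base."""
    return await tavily_search(query, max_results=5)


tools = [retrieve, web_search]
llm = ChatAnthropic(model="claude-sonnet-4-6", temperature=0).bind_tools(tools)


async def call_model(state: AgentState):
    # One LLM turn: the reply is either tool calls or the final answer
    msg = await llm.ainvoke(state["messages"])
    return {"messages": [msg]}


graph = StateGraph(AgentState)
graph.add_node("agent", call_model)
graph.add_node("tools", ToolNode(tools))
graph.set_entry_point("agent")
# tools_condition routes to "tools" when the last message has tool calls, else to END
graph.add_conditional_edges("agent", tools_condition, {"tools": "tools", END: END})
graph.add_edge("tools", "agent")
agent = graph.compile()  # the tools are async, so run it with: await agent.ainvoke(...)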

The agent decides whether to retrieve, what to retrieve, and when it has enough. See AI Agents with LangGraph for the LangGraph mechanics.

Patterns that ship

1. Query rewriting

The user’s “what about that bug from yesterday?” doesn’t retrieve well on its own. The agent rewrites it into a standalone query using the conversation context.
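A minimal sketch of that rewrite step, outside any framework. The prompt wording, the `rewrite_query` name, and the `complete` callable (any function that sends a prompt to an LLM and returns its text) are assumptions for illustration, not part of the original setup.

```python
from typing import Callable

# Hypothetical prompt; tune the wording for your own model.
REWRITE_PROMPT = (
    "Rewrite the user's last message as a standalone search query.\n"
    "Resolve pronouns and vague references using the conversation.\n"
    "Return only the query.\n\n"
    "Conversation:\n{history}\n\nLast message: {question}"
)


def rewrite_query(
    history: list[tuple[str, str]],
    question: str,
    complete: Callable[[str], str],
) -> str:
    """Turn a context-dependent question into a standalone retrieval query."""
    if not history:
        # Nothing to resolve against, so skip the extra LLM call
        return question.strip()
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    prompt = REWRITE_PROMPT.format(history=transcript, question=question)
    rewritten = complete(prompt).strip()
    # Fall back to the original question if the model returns nothing usable
    return rewritten or question.strip()
```

In the LangGraph agent above, the model can do this implicitly when it chooses the `query` argument for `retrieve`; an explicit step like this one is mainly useful when you want to log or test the rewritten query.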

2. Decomposition

The user asks “how do A, B, and C compare?” The agent issues three retrievals, one per sub-question, then synthesizes the results.
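The fan-out half of decomposition can be sketched as follows, assuming the sub-questions have already been produced by the LLM. `retrieve_per_subquestion`, `build_synthesis_context`, and the `Retriever` type are hypothetical names; any async retrieval function with this shape would do.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed shape of a retrieval function: query in, concatenated chunks out
Retriever = Callable[[str], Awaitable[str]]


async def retrieve_per_subquestion(
    sub_questions: list[str], retrieve: Retriever
) -> dict[str, str]:
    """Run one retrieval per sub-question concurrently, keyed by sub-question."""
    results = await asyncio.gather(*(retrieve(q) for q in sub_questions))
    return dict(zip(sub_questions, results))


def build_synthesis_context(results: dict[str, str]) -> str:
    """Label each result block so the synthesizing LLM call can compare them."""
    return "\n\n".join(f"## {q}\n{text}" for q, text in results.items())
```

Running the retrievals concurrently keeps the added latency close to that of the slowest single retrieval rather than the sum of all three.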

3. Self-reflection

After answering, the agent asks itself: “Did I cover everything?” If not, it retrieves more. Add this to the system prompt:

After your initial answer, check whether the user’s question is fully addressed. If not, identify what’s missing and call retrieve() for it.
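The same check can also be run as an explicit loop instead of a prompt instruction. This is a sketch under assumptions: `answer`, `find_gaps`, and `retrieve` are hypothetical callables (two of them would wrap LLM calls), and the default of two reflection rounds is an arbitrary choice.

```python
from typing import Callable


def answer_with_reflection(
    question: str,
    answer: Callable[[str, list[str]], str],
    find_gaps: Callable[[str, str], list[str]],
    retrieve: Callable[[str], str],
    max_rounds: int = 2,
) -> str:
    """Draft an answer, ask what is missing, retrieve for the gaps, redraft."""
    context: list[str] = [retrieve(question)]
    draft = answer(question, context)
    for _ in range(max_rounds):
        # An empty list means the question is judged fully addressed
        gaps = find_gaps(question, draft)
        if not gaps:
            break
        context.extend(retrieve(gap) for gap in gaps)
        draft = answer(question, context)
    return draft
```

Capping the rounds matters: a critic that always finds something missing would otherwise keep the loop running.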

4. Citations

Every claim cites the chunk that supports it. The model emits [id] markers; the wrapper looks up the URLs server-side. See Build a RAG App with pgvector.
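One way the server-side lookup might work, as a sketch. It assumes chunk ids are alphanumeric (plus `_` and `-`) and that the answer contains no other square-bracketed text; `resolve_citations` and `url_by_id` are hypothetical names.

```python
import re

# Assumed marker format: [chunk-id], matching what retrieve() puts before each chunk
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def resolve_citations(
    answer: str, url_by_id: dict[str, str]
) -> tuple[str, list[str], list[str]]:
    """Renumber known [id] markers and return (text, ordered URLs, unknown ids)."""
    order: list[str] = []
    unknown: list[str] = []

    def swap(match: re.Match) -> str:
        chunk_id = match.group(1)
        if chunk_id not in url_by_id:
            # The model cited an id that was never retrieved; leave it visible
            unknown.append(chunk_id)
            return match.group(0)
        if chunk_id not in order:
            order.append(chunk_id)
        return f"[{order.index(chunk_id) + 1}]"

    text = CITATION.sub(swap, answer)
    return text, [url_by_id[cid] for cid in order], unknown
```

Returning the unknown ids separately lets the wrapper decide whether to strip them, log them, or reject the answer.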

When NOT to use agentic RAG

Single-fact questions. Naive RAG is cheaper.
Latency budget < 1s. Agentic loops blow that.
Strict per-query cost cap.

A pragmatic 2026 setup is to route each query: easy questions get naive RAG; hard ones (detected by a classifier or by the model signaling that a single retrieval is not enough) get agentic RAG.
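A routing layer along these lines could look like the sketch below. The keyword heuristic in `looks_hard` is only a crude stand-in for a real classifier, and `route`, `naive`, and `agentic` are hypothetical names for your own two pipelines.

```python
import re
from typing import Callable

# Illustrative signal words only; a trained classifier would replace this
COMPARATIVE = re.compile(
    r"\b(compare|versus|vs\.?|difference between|both)\b", re.IGNORECASE
)


def looks_hard(question: str) -> bool:
    """Flag comparative wording or several questions packed into one message."""
    return bool(COMPARATIVE.search(question)) or question.count("?") > 1


def route(
    question: str,
    naive: Callable[[str], str],
    agentic: Callable[[str], str],
    is_hard: Callable[[str], bool] = looks_hard,
) -> str:
    """Send easy questions to single-shot RAG and hard ones to the agent loop."""
    return agentic(question) if is_hard(question) else naive(question)
```

Because `is_hard` is injected, the heuristic can later be swapped for a model-based classifier without touching the callers.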

Cost discipline

Bound the loop:

Max retrievals per query (e.g., 5).
Max LLM calls per query (e.g., 8).
Hard timeout.

Without bounds, an off-by-one in your prompt becomes a $50 question.
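The three bounds can be enforced with a small counter object that the loop charges before every retrieval and LLM call. This is a sketch: `LoopBudget` and `BudgetExceeded` are hypothetical names, the limits of 5 and 8 come from the examples above, and the 30-second timeout is an assumed placeholder.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when the agent loop hits one of its limits."""


@dataclass
class LoopBudget:
    max_retrievals: int = 5
    max_llm_calls: int = 8
    timeout_s: float = 30.0  # assumed value; set it from your latency budget
    retrievals: int = 0
    llm_calls: int = 0
    started: float = field(default_factory=time.monotonic)

    def _check_time(self) -> None:
        if time.monotonic() - self.started > self.timeout_s:
            raise BudgetExceeded("timeout")

    def charge_retrieval(self) -> None:
        """Call before each retrieval; raises once the limit is reached."""
        self._check_time()
        if self.retrievals >= self.max_retrievals:
            raise BudgetExceeded("retrieval limit")
        self.retrievals += 1

    def charge_llm_call(self) -> None:
        """Call before each LLM call; raises once the limit is reached."""
        self._check_time()
        if self.llm_calls >= self.max_llm_calls:
            raise BudgetExceeded("LLM call limit")
        self.llm_calls += 1
```

When the exception fires, the caller can return the best draft so far instead of failing outright. If you use the LangGraph agent above, the `recursion_limit` entry in the run config offers a coarser cap on graph steps.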

For cost tactics, see LLM Cost Optimization in 2026.

Read this next

If you want a working LangGraph + pgvector + reranker agentic-RAG starter, it’s at rajpoot.dev.

Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work; see my projects and ways to reach me at rajpoot.dev.
