The “which LLM provider?” question is hot in every kickoff. The honest answer in 2026: pick by workload, hedge with a gateway, expect to use multiple. This post is the practical comparison.

Models worth knowing

ProviderBest for
Claude Opus 4.7AnthropicCoding, agents, reasoning, long context
Claude Sonnet 4.6AnthropicWorkhorse — RAG, tool use, chat
Claude Haiku 4.5AnthropicClassification, extraction, high-volume
GPT-5OpenAIMultimodal, tool use, frontier reasoning
GPT-5 miniOpenAIFast, cheap workhorse
GPT-5 nanoOpenAIClassification, embeddings, very high volume
Gemini 2.5 ProGoogleLong context (1M+), video, multimodal
Gemini 2.5 FlashGoogleFast, cheap, capable

These shift quarter-to-quarter. Run evals on your data; pick by results.

By workload

Coding agents

Anthropic Claude Opus 4.7 wins consistently. The agentic loops, tool calling, and code reasoning are noticeably better. Used by Cursor, Claude Code, Cognition, Codeium.

See Cursor vs Windsurf vs Claude Code .

RAG and knowledge Q&A

Claude Sonnet 4.6 or GPT-5 mini for cost-effectiveness; Opus 4.7 when reasoning over retrieved context matters. Both cache prompts well; both ship structured outputs.

Multimodal (images, video)

GPT-5 for general images. Gemini 2.5 Pro for video and very long documents.

Long context (>500k tokens)

Gemini 2.5 Pro (2M token context — though attention quality drops past 500k). Claude Opus 4.7 at 1M.

For 1M-token context patterns .

Voice agents

Realtime APIs from each provider are competitive. OpenAI’s Realtime is slightly more mature; Gemini Live wins on multimodal-with-video. See Voice Agents and Realtime LLM APIs .

Classification / extraction (high volume)

Claude Haiku 4.5 or GPT-5 nano. Both cheap; both do structured output well; pick by which is faster on your eval.

Edge / on-device

None. Use self-hosted Llama 3.3 / Qwen 2.5 .

Pricing (rough 2026)

Per 1M tokens (input / output):

InputOutput
Claude Opus 4.7$15$75
Claude Sonnet 4.6$3$15
Claude Haiku 4.5$1$5
GPT-5$20$80
GPT-5 mini$0.50$2
GPT-5 nano$0.10$0.40
Gemini 2.5 Pro$1.25$10
Gemini 2.5 Flash$0.30$2.50

With caching, divide input cost by ~10. See LLM Prompt Caching .

For volume cost optimization see LLM Cost Optimization and LLM Routing .

Latency

Order-of-magnitude p50 for short messages:

  • GPT-5 nano: 200–400ms.
  • Haiku 4.5: 250–500ms.
  • Gemini 2.5 Flash: 300–600ms.
  • Sonnet 4.6 / GPT-5 mini: 500–1000ms.
  • Opus 4.7 / GPT-5: 1–3s.

Streaming TTFT (time to first token) is what users feel.

Ecosystem

OpenAIAnthropicGoogle
SDKsExcellent across languagesExcellentGood
Tool useMatureMature, tightMature
VisionExcellentExcellentExcellent
Realtime APIMatureRolling outMature (Gemini Live)
Batch APIYes (50% off)Yes (50% off)Yes
CachingAutoExplicit markersExplicit
Fine-tuningYesLimitedYes
MCP supportIndirectFirst-partyIndirect

For MCP support Anthropic leads.

Don’t pick one — use a gateway

For production, sit behind an AI gateway :

  • Fallback when one provider has an outage.
  • Route by task (Haiku for classification, Opus for reasoning).
  • Cost tracking per feature.
  • Prompt caching across providers.

Single-provider apps are fragile.

What I’d ship today

For a new AI product:

  • Anthropic as primary (Sonnet for default, Opus for hard, Haiku for cheap).
  • OpenAI as fallback via LiteLLM .
  • Gemini when long context or video matters.
  • Self-hosted Llama for privacy / cost-driven workloads.

Run evals quarterly; rebalance as models shift.

Read this next

If you want my multi-provider eval harness comparing Claude / GPT / Gemini on your data, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .