How to Choose Between Claude Haiku, Sonnet, and Opus (2026)

The default pick for production AI work in 2026 is Claude Sonnet 4.6: it handles 90% of real tasks well, costs about three-fifths of what Opus does, and is noticeably faster on the same prompt. Haiku 4.5 is the right pick when you have high volume and forgiving accuracy requirements (classification, extraction, summarisation of short text). Opus 4.8, the current top Opus tier, is the right pick when you need the best possible reasoning: multi-step planning, complex code refactors, anything where Sonnet has visibly struggled. (Opus 4.7 is the prior generation, with the same price and context window.) I'll walk through the cost math, the latency profile, the capability gaps, and a concrete decision matrix for routing prompts across the three tiers.

The wrong question is "which model is the best." The right question is "what's the cheapest tier that gets this specific task done well enough." A production AI app that runs everything on Opus pays roughly 1.7× a pure-Sonnet bill (and far more than a routed one) for a quality lift you probably can't measure on most tasks. An app that runs everything on Haiku makes mistakes the user notices. The win is routing.

Jump to:

The current pricing (mid-2026)
Latency profile per tier
Capability gaps: where each tier wins
The default: start with Sonnet
When to drop to Haiku
When to step up to Opus
Decision matrix
Cost math: routing across tiers
What to do next
FAQ

The current pricing (mid-2026)

Model	Input per million tokens	Output per million tokens	Cache read (10% of input)	Notes
Claude Haiku 4.5	$1.00	$5.00	$0.10	Fastest, cheapest, decent reasoning
Claude Sonnet 4.6	$3.00	$15.00	$0.30	The workhorse, default pick
Claude Opus 4.8	$5.00	$25.00	$0.50	Best reasoning, slowest, most expensive (current top Opus)
Claude Opus 4.7	$5.00	$25.00	$0.50	Prior-generation Opus, same price and 1M context as 4.8

Cache writes cost 1.25× the input price for the 5-minute TTL and 2× for the 1-hour TTL. Prompt caching is supported on all current Claude API models; the details are covered in How to Cut LLM API Costs with Prompt Caching.
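To see what those multipliers mean for a single request, here is a minimal sketch that prices one call whose prompt starts with a cacheable prefix. It assumes the per-million prices from the table, a 0.1× cache read, and the 1.25× / 2× write multipliers above; the model labels and token counts are illustrative, and it ignores API details such as minimum cacheable prompt lengths.

```python
# USD per million tokens, from the pricing table above (labels, not API model IDs).
PRICES = {
    "haiku-4.5": {"input": 1.00, "output": 5.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.8": {"input": 5.00, "output": 25.00},
}

# Multipliers applied to the base input price.
CACHE_WRITE_MULTIPLIER = {"5m": 1.25, "1h": 2.0}
CACHE_READ_MULTIPLIER = 0.10


def request_cost(model: str, prefix_tokens: int, fresh_input_tokens: int,
                 output_tokens: int, cache_hit: bool, ttl: str = "5m") -> float:
    """Estimated USD cost of one request whose prompt starts with a cacheable prefix.

    cache_hit=False means this call writes the prefix to the cache (billed at the
    write multiplier for the chosen TTL); cache_hit=True means it reads it back.
    """
    price = PRICES[model]
    prefix_multiplier = (CACHE_READ_MULTIPLIER if cache_hit
                         else CACHE_WRITE_MULTIPLIER[ttl])
    total = (prefix_tokens * price["input"] * prefix_multiplier
             + fresh_input_tokens * price["input"]
             + output_tokens * price["output"])
    return total / 1_000_000


# 10K-token stable prefix, 500 fresh input tokens, 300 output tokens on Sonnet 4.6.
first_call = request_cost("sonnet-4.6", 10_000, 500, 300, cache_hit=False)
later_call = request_cost("sonnet-4.6", 10_000, 500, 300, cache_hit=True)
print(f"first call ${first_call:.4f}, cache hit ${later_call:.4f}")
```

In this example the call that writes the cache costs about $0.0435 and each cache hit about $0.009, against $0.036 for the same request with no caching at all: the first call is slightly more expensive, and every hit after it is much cheaper.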

For generation-heavy workloads, output cost is where the budget goes: a typical content-generation task has a 5:1 to 10:1 output-to-input token ratio (lower for classification and extraction, where input dominates). Opus output costs $25 per million tokens versus Sonnet's $15, about 1.7×, so blanket Opus usage is rarely the right call.
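A quick way to check this against your own traffic is to compute what fraction of each call's cost comes from output tokens. This sketch takes per-million prices from the table; the token counts are illustrative, with 1,000 input and 5,000 output tokens being the 5:1 end of the range above.

```python
def output_share(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Fraction of a call's cost that comes from output tokens (prices per 1M tokens)."""
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    return output_cost / (input_cost + output_cost)


# Generation-heavy: 5:1 output-to-input on Sonnet 4.6 ($3 in / $15 out).
print(f"{output_share(1_000, 5_000, 3.00, 15.00):.0%}")
# Classification-style: long input, a few output tokens, on Haiku 4.5 ($1 / $5).
print(f"{output_share(2_000, 10, 1.00, 5.00):.0%}")
```

The first case puts about 96% of the bill on output; the second puts only about 2% there. Because every tier prices output at 5× input, the share is the same on Haiku, Sonnet, and Opus; only the absolute bill changes.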

Latency profile per tier

Time-to-first-token (TTFT) and tokens-per-second (TPS) vary by region, traffic, prompt size, output length, and service tier; Anthropic does not publish official figures. The numbers below are approximate, observed at typical request sizes. Treat the ranking as solid and the exact values as ballpark:

Model	TTFT	TPS (output)	Typical 500-token response
Haiku 4.5	~250ms	~140	~3.8 sec
Sonnet 4.6	~400ms	~75	~7 sec
Opus 4.8	~700ms	~45	~12 sec
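The last column follows from the first two: total time is roughly TTFT plus output length divided by output TPS. A minimal sketch using the ballpark figures above (not official numbers), ignoring network overhead and variation in streaming speed:

```python
# Ballpark figures from the table above: (TTFT in seconds, output tokens per second).
LATENCY = {
    "Haiku 4.5": (0.25, 140),
    "Sonnet 4.6": (0.40, 75),
    "Opus 4.8": (0.70, 45),
}


def estimated_response_time(ttft_s: float, tps: float, output_tokens: int) -> float:
    """Rough end-to-end time for a streamed response: first token, then steady decoding."""
    return ttft_s + output_tokens / tps


for model, (ttft, tps) in LATENCY.items():
    print(f"{model}: ~{estimated_response_time(ttft, tps, 500):.1f} s for 500 tokens")
```

This gives about 3.8 s, 7.1 s, and 11.8 s, matching the table to within rounding. It also shows why long outputs widen the gap: the TPS term grows with length while TTFT stays fixed.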

For interactive UX where the user is watching tokens stream, Haiku feels instant, Sonnet feels responsive, Opus feels thoughtful (read: slow). For background workflows where latency isn't user-visible, this doesn't matter. For real-time chat, it matters a lot.

Capability gaps: where each tier wins

After a year of running every kind of prompt across all three tiers in production, this is the practical capability map:

Haiku 4.5 is reliably good at:

Classification (sentiment, topic, intent)
Field extraction from structured documents
Short summaries (paragraph → 1-2 sentences)
Yes/no questions with a reasonably worded context
Simple translation
Code completion within a single function

Haiku 4.5 visibly struggles with:

Multi-step reasoning (more than 2-3 steps)
Long-context synthesis (>10 documents)
Hard refactors across many files
Anything that needs "thinking carefully"

Sonnet 4.6 is the workhorse:

All of Haiku's strengths, plus
Multi-step reasoning up to ~7 steps
Long-context analysis well past 200K tokens (Sonnet 4.6 ships a 1M-token window)
Code generation that runs on the first or second try
Most agentic workflows
Most chatbot use cases

Sonnet 4.6 visibly struggles with:

Deeply nested logical reasoning ("if A then B, but only if not C, unless D, in which case E")
Novel mathematical proofs
Very long-horizon planning (15+ step plans)
Code refactors that touch architectural concerns

Opus 4.8 wins on:

The hard reasoning tasks above
Deep code refactors with cross-cutting concerns
Plans that need to be right because each step is expensive
Novel problem-solving where there is no obvious template

The default: start with Sonnet

For any new endpoint, prompt, or pipeline: start with Sonnet 4.6. Measure quality on a representative sample. Then ask:

Is the quality acceptable? → keep Sonnet.
Is the quality high-but-overkill (latency or cost matter more)? → try Haiku.
Is the quality not good enough? → try Opus.

You'll find that for ~70% of endpoints Sonnet is the right answer. You'll move 20% down to Haiku for cost reasons, and 10% up to Opus for quality reasons. That distribution is the win.

Don't pick the tier on intuition; measure. Run the same 50 prompts through each tier, score the outputs against ground-truth answers or an LLM-as-judge, and pick on data. Building an eval suite for this is covered in How to Write LLM Evals That Catch Regressions.
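The measure-then-pick loop can be sketched as a small harness: run the same prompts through each tier, score each output, and take the cheapest tier whose pass rate clears a bar you choose. The `call_model` and `score` callables, the tier labels, and the 0.9 threshold are hypothetical placeholders. In practice `call_model` would wrap the Anthropic API and `score` would compare against ground truth or call an LLM-as-judge.

```python
from typing import Callable, Dict, Optional, Sequence

# Tier labels ordered cheapest first (labels, not API model IDs).
TIERS = ["haiku", "sonnet", "opus"]


def pass_rates(prompts: Sequence[str],
               call_model: Callable[[str, str], str],
               score: Callable[[str, str], bool]) -> Dict[str, float]:
    """Run every prompt through every tier and return the fraction scored acceptable."""
    rates = {}
    for tier in TIERS:
        passed = sum(bool(score(p, call_model(tier, p))) for p in prompts)
        rates[tier] = passed / len(prompts)
    return rates


def cheapest_passing_tier(rates: Dict[str, float], threshold: float = 0.9) -> Optional[str]:
    """Cheapest tier whose pass rate meets the threshold, or None if none does."""
    for tier in TIERS:
        if rates.get(tier, 0.0) >= threshold:
            return tier
    return None


# Illustration with a fake model: "Haiku" fails anything that needs planning.
def fake_call(tier: str, prompt: str) -> str:
    return "wrong" if tier == "haiku" and "plan" in prompt else "right"


def fake_score(prompt: str, output: str) -> bool:
    return output == "right"


rates = pass_rates(["classify A", "classify B", "plan C", "classify D"], fake_call, fake_score)
print(rates)                         # {'haiku': 0.75, 'sonnet': 1.0, 'opus': 1.0}
print(cheapest_passing_tier(rates))  # sonnet
```

On real data, `score` is where the work is. The threshold is a judgment call and should encode what "well enough" means for that endpoint.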

When to drop to Haiku

Drop to Haiku when:

The task is structural classification. "Is this a refund request, a billing question, or a feature request?" Haiku handles this reliably at a third of Sonnet's cost.
You're extracting fields from a structured source. "Pull the order ID and the total from this email." Haiku reads this fine.
You have high volume. Anything running thousands of times an hour is worth a Haiku eval just to see if it survives the downgrade. A two-thirds cost cut at high volume is real money.
Latency is user-facing. Streaming UX where the user is watching tokens feels noticeably better on Haiku.

Don't drop to Haiku when:

The task involves multi-step reasoning.
The task involves long context (>50K tokens with detail to track).
The task is novel. Haiku is strong on familiar task patterns and weaker on unusual ones.
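If you want a cheap, deterministic pre-filter before spending eval time, the two checklists above translate into a few rules. This is a sketch under stated assumptions: the `TaskProfile` fields are hypothetical and must come from your own pipeline (a token count, flags you set per endpoint), and the 2-step and 50K-token cutoffs are this section's rules of thumb, not model limits.

```python
from dataclasses import dataclass

# Task kinds the "drop to Haiku" list calls out (hypothetical labels).
HAIKU_FRIENDLY_KINDS = {"classification", "extraction"}


@dataclass
class TaskProfile:
    # Hypothetical per-endpoint metadata; your pipeline has to supply these values.
    kind: str                  # e.g. "classification", "extraction", "refactor"
    reasoning_steps: int       # rough number of reasoning steps the task needs
    context_tokens: int        # tokens of detailed context the model must track
    is_novel: bool             # True if the task doesn't follow a familiar pattern
    high_volume: bool          # runs thousands of times an hour
    user_facing_latency: bool  # a user is watching the tokens stream


def worth_a_haiku_eval(task: TaskProfile, max_steps: int = 2,
                       max_context_tokens: int = 50_000) -> bool:
    """True if the task is a reasonable candidate for a Haiku eval.

    A True result only says the eval is worth running; it does not replace measuring.
    """
    # Hard stops from the "don't drop to Haiku" list.
    if (task.reasoning_steps > max_steps
            or task.context_tokens > max_context_tokens
            or task.is_novel):
        return False
    # Any one of the "drop to Haiku" signals makes the downgrade worth testing.
    return (task.kind in HAIKU_FRIENDLY_KINDS
            or task.high_volume
            or task.user_facing_latency)
```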

When to step up to Opus

Step up to Opus when:

Sonnet is producing wrong answers on the same prompt repeatedly. Not "slightly off" but visibly wrong. Run the same prompt 5 times; if 3 or more outputs are wrong, Sonnet is at its limit.
The cost of a wrong answer is high. Financial decisions, medical summaries, legal drafting, anywhere a mistake is expensive to fix.
The task requires architectural thinking. Multi-file code refactors where the model needs to understand cross-cutting concerns.
The plan is long and each step is expensive. Agentic workflows where running the wrong tool costs you minutes or dollars per step. The cost of Opus is small compared to the cost of running 12 bad tool calls.
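The 5-run check in the first item is easy to automate. A minimal sketch, assuming you already have a way to judge a single output as right or wrong; the `generate` and `is_correct` callables are hypothetical stand-ins for your API call and your grader.

```python
from typing import Callable


def at_model_limit(prompt: str,
                   generate: Callable[[str], str],
                   is_correct: Callable[[str], bool],
                   runs: int = 5,
                   max_wrong: int = 2) -> bool:
    """Run the same prompt several times; True if more than max_wrong outputs are wrong.

    With the defaults this is the "5 runs, 3 or more wrong" rule of thumb.
    """
    wrong = sum(not is_correct(generate(prompt)) for _ in range(runs))
    return wrong > max_wrong
```

The check only tells you something when sampling varies between runs; at temperature 0 the outputs will be largely the same every time.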

Opus 4.8 vs Opus 4.7: for any new work, reach for Opus 4.8, the current flagship and the most capable reasoning tier. The pricing and 1M context window are identical to 4.7, so there is no cost reason to stay on the older model. Pin Opus 4.7 only when you have already validated a pipeline against it and want to freeze the model for reproducibility.

Don't step up to Opus when:

The task is high-volume and quality is "good enough" on Sonnet. The Opus cost adds up fast.
The task is latency-sensitive. Opus is noticeably slower.

Decision matrix

Task pattern	Recommended	Why
Classify support tickets	Haiku	Structural, high volume, forgiving
Extract fields from invoices	Haiku	Structured, repetitive
Summarise a long PDF	Sonnet	Long context, reasoning-light
Chat with a knowledge base (RAG)	Sonnet	The default; its speed advantage over Opus matters in chat
Generate marketing copy	Sonnet	Quality matters but Opus is overkill
Write a 5-file refactor	Sonnet first, Opus if it fails	Try the cheap one first
Plan a multi-step agent	Opus	Plans are expensive to redo
Critique another LLM's output	Opus	LLM-as-judge benefits from the strongest reasoner
Translate a sentence	Haiku	Simple, fast
Decide if user input contains PII	Haiku	Yes/no classification
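The "Sonnet first, Opus if it fails" row is a cascade: call the cheaper tier, check the result, and escalate only on failure. A sketch with hypothetical `call_model` and `validate` callables; for a refactor, `validate` might run the test suite against the proposed change.

```python
from typing import Callable, Sequence, Tuple


def cascade(prompt: str,
            tiers: Sequence[str],
            call_model: Callable[[str, str], str],
            validate: Callable[[str], bool]) -> Tuple[str, str]:
    """Try tiers in order (cheapest first); return (tier, output) for the first valid answer.

    If no tier passes validation, return the last tier's output so the caller can
    decide what to do next (retry, flag for a human, etc.).
    """
    if not tiers:
        raise ValueError("need at least one tier")
    output = ""
    for tier in tiers:
        output = call_model(tier, prompt)
        if validate(output):
            return tier, output
    return tiers[-1], output
```

Note the trade-off: when escalation happens you pay for both calls, so the cascade saves money only if the cheaper tier succeeds most of the time.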

Cost math: routing across tiers

Concrete example. A customer-support AI handles 100,000 queries a day. Assume each call is about 1K input tokens and 500 output tokens. The per-call cost on each tier:

Haiku 4.5: (1,000 × $1 + 500 × $5) / 1,000,000 = $0.0035 per call
Sonnet 4.6: (1,000 × $3 + 500 × $15) / 1,000,000 = $0.0105 per call
Opus 4.8: (1,000 × $5 + 500 × $25) / 1,000,000 = $0.0175 per call

Without routing, every query goes to Sonnet:

100,000 calls × $0.0105 = $1,050/day.

With routing, Haiku handles 60% (simple lookups, classification), Sonnet handles 35% (real questions), Opus handles 5% (escalations that need harder reasoning):

60,000 × $0.0035 (Haiku) = $210.00
35,000 × $0.0105 (Sonnet) = $367.50
5,000 × $0.0175 (Opus 4.8) = $87.50

Total: $665/day. That's a $385/day saving (~37%) just from routing, with arguably better outcomes on the Opus-routed escalations.
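The same arithmetic fits in a few lines of Python, so you can plug in your own traffic mix and token counts. It uses `Decimal` to keep the dollar figures exact; the prices, token counts, and 60/35/5 split are the ones from this example.

```python
from decimal import Decimal
from typing import Dict

# USD per million tokens as (input, output), from the pricing table.
PRICES = {
    "haiku": (Decimal("1"), Decimal("5")),
    "sonnet": (Decimal("3"), Decimal("15")),
    "opus": (Decimal("5"), Decimal("25")),
}


def cost_per_call(tier: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Dollar cost of one call at the given token counts."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def daily_cost(calls_per_day: int, mix: Dict[str, Decimal],
               input_tokens: int, output_tokens: int) -> Decimal:
    """Total daily spend; `mix` maps tier -> share of traffic (shares sum to 1)."""
    return sum(
        (calls_per_day * share * cost_per_call(tier, input_tokens, output_tokens)
         for tier, share in mix.items()),
        Decimal("0"),
    )


unrouted = daily_cost(100_000, {"sonnet": Decimal("1")}, 1_000, 500)
routed = daily_cost(100_000, {"haiku": Decimal("0.60"), "sonnet": Decimal("0.35"),
                              "opus": Decimal("0.05")}, 1_000, 500)
print(f"unrouted ${unrouted:.2f}/day, routed ${routed:.2f}/day, "
      f"saving ${unrouted - routed:.2f}/day")
```

This reproduces the $1,050, $665, and $385 figures above. Changing the mix or the token counts shows quickly how sensitive the saving is to how much traffic the router can push down to Haiku.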

The routing logic itself can be a Haiku classifier ("which tier should handle this prompt?"). A short classification call costs about $0.0004, so running it on all 100,000 calls adds roughly $40/day, which still leaves a net saving of about $345/day.
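A minimal sketch of that router's logic. The prompt wording, the three labels, the label-to-tier mapping, and the fallback to Sonnet are illustrative assumptions. `classify` is a hypothetical callable that would send the prompt to Haiku (for example via the Messages API with a small `max_tokens`) and return its text reply; keeping it pluggable means the routing logic can be tested without an API key.

```python
from typing import Callable

# Hypothetical labels and mapping; tune them to your own traffic.
LABEL_TO_TIER = {
    "simple lookup": "haiku",
    "needs reasoning": "sonnet",
    "hard plan": "opus",
}

ROUTER_PROMPT = (
    "Classify the user request into exactly one label: "
    "'simple lookup', 'needs reasoning', or 'hard plan'. "
    "Reply with the label only.\n\nRequest: {request}"
)


def route(request: str, classify: Callable[[str], str], default: str = "sonnet") -> str:
    """Return the tier label that should handle `request`.

    `classify` sends a prompt to a cheap model and returns its raw text reply.
    Unexpected replies fall back to `default`, so a confused router degrades to the
    all-Sonnet baseline instead of sending hard prompts to Haiku.
    """
    reply = classify(ROUTER_PROMPT.format(request=request))
    label = reply.strip().strip("'\".").lower()  # tolerate quotes, a trailing period, casing
    return LABEL_TO_TIER.get(label, default)
```

Falling back to Sonnet rather than Haiku is a deliberate choice: a misrouted easy prompt costs a little money, while a misrouted hard prompt costs a wrong answer.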

What to do next

For the cost-optimisation companion technique that stacks with model routing:

How to Cut LLM API Costs with Prompt Caching: once you've picked the right tier, caching the stable prefix takes up to 90% off the input cost of that prefix on every cache hit.

For the evaluation infrastructure you need to actually pick a tier on data instead of vibes:

How to Write LLM Evals That Catch Regressions covers the test-suite pattern for measuring quality differences across tiers.

External reference: the Anthropic model documentation is the canonical source for current capabilities, context windows, and pricing.

FAQ

Is Claude Sonnet 4.6 always better than Haiku 4.5?

For reasoning-heavy tasks, yes. For simple classification, extraction, and short-summary work, the gap is small enough that Haiku is the right pick on cost alone. Run an eval on your actual task before assuming Sonnet is needed.

For latency-sensitive interactive flows, Haiku's faster tokens-per-second is a real UX benefit that overrides minor quality differences.

When should I use Claude Opus instead of Sonnet?

When Sonnet visibly fails the task on repeated tries. Run the same prompt 5 times; if 3 or more outputs are wrong, you've hit Sonnet's limit. Opus handles harder reasoning, multi-file code refactors, and long-horizon plans more reliably.

Don't default to Opus. Its output costs $25 per million tokens versus Sonnet's $15 (about 1.7×), its latency is noticeably higher, and the Opus 4.8 tokenizer (shared with 4.7) can use more tokens for the same text. Reserve it for the tasks that actually need it.

What's the cost difference between Haiku, Sonnet, and Opus?

Input and output pricing scale the same way. Haiku 4.5 is $1 / $5 per million tokens (input / output), Sonnet 4.6 is $3 / $15, and Opus 4.8 is $5 / $25 (Opus 4.7 is priced identically). So Sonnet is 3× the price of Haiku, Opus is 5× Haiku, and Opus is about 1.7× Sonnet.

Output dominates the bill in generation-heavy workflows (long-form writing, code generation). Input dominates in RAG, extraction, classification, and long-context analysis, where you feed the model far more than it returns.

Can I route prompts across tiers automatically?

Yes. A Haiku classifier acting as a router is the standard pattern. The classifier reads the user input and outputs which tier should handle it: "simple lookup" → Haiku, "needs reasoning" → Sonnet, "hard plan" → Opus.

The router itself is a cheap Haiku call (about $0.0004 per route). The routing savings on the downstream calls typically pay for the router around 10× over.

Does context length differ between Haiku, Sonnet, and Opus?

Yes. As of 2026, Claude Sonnet 4.6 and Opus 4.8 (and Opus 4.7) ship a 1M-token context window at standard pricing. Haiku 4.5 is capped at 200K tokens.

Beyond raw window size, the practical difference is how well each model reasons across a long context. Sonnet and Opus hold detail across hundreds of thousands of tokens reliably; Haiku starts to degrade past ~50K tokens of dense content.


Ishan Karunaratne
