TechEarl

How to Choose Between Claude Haiku, Sonnet, and Opus

Pick the right Claude tier for the job: Haiku for high-volume cheap, Sonnet for the default workhorse, Opus for hard reasoning. With cost math, latency, and a decision matrix.

Ishan Karunaratne · 12 min read

The default pick for production AI work in 2026 is Claude Sonnet 4.6 — it handles 90% of real tasks well, costs about three-fifths of what Opus does, and is noticeably faster on the same prompt. Haiku 4.5 is the right pick when you have high volume and forgiving accuracy requirements (classification, extraction, summarisation of short text). Opus 4.7 is the right pick when you need the best possible reasoning — multi-step planning, complex code refactors, anything where Sonnet has been visibly struggling. I'll walk the cost math, the latency profile, the capability gaps, and a concrete decision matrix for routing prompts across the three tiers.

The wrong question is "which model is the best." The right question is "what's the cheapest tier that gets this specific task done well enough." A production AI app that runs everything on Opus pays roughly 1.7× a pure-Sonnet bill (and far more than a routed one) for a quality lift you probably can't measure on most tasks. An app that runs everything on Haiku makes mistakes the user notices. The win is routing.

The current pricing (mid-2026)

| Model | Input ($/M tokens) | Output ($/M tokens) | Cache read ($/M tokens, 10% of input) | Notes |
|---|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | Fastest, cheapest, decent reasoning |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | The workhorse; default pick |
| Claude Opus 4.7 | $5.00 | $25.00 | $0.50 | Best reasoning, slowest, most expensive |

Cache writes cost 1.25× input for the 5-minute TTL, 2× for the 1-hour TTL. Prompt caching is supported on all current Claude API models — covered in How to Cut LLM API Costs with Prompt Caching.
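To see when caching pays off, here is a minimal Python sketch that prices a repeated prompt prefix on Sonnet 4.6 with and without the 5-minute cache. It assumes one cache write followed by a cache read on every later call inside the TTL; the function name and the 10,000-token, 100-call workload are illustrative assumptions, not Anthropic figures.

```python
# Sonnet 4.6 base input price in dollars per million tokens (pricing table above).
SONNET_INPUT_PER_M = 3.00

# Multipliers on the base input price for the 5-minute cache TTL.
CACHE_WRITE_MULT = 1.25  # the first call writes the prefix to the cache
CACHE_READ_MULT = 0.10   # later calls read it at 10% of the input price


def prefix_cost(prefix_tokens: int, calls: int, cached: bool,
                input_per_m: float = SONNET_INPUT_PER_M) -> float:
    """Dollar cost of sending the same prompt prefix on `calls` requests.

    Assumes (hypothetically) that every call after the first lands inside
    the cache TTL: one cache write, then (calls - 1) cache reads.
    """
    if calls <= 0:
        return 0.0
    per_token = input_per_m / 1_000_000
    if not cached:
        return prefix_tokens * per_token * calls
    write = prefix_tokens * per_token * CACHE_WRITE_MULT
    reads = prefix_tokens * per_token * CACHE_READ_MULT * (calls - 1)
    return write + reads


if __name__ == "__main__":
    # Illustrative workload: a 10,000-token system prompt reused 100 times.
    for cached in (False, True):
        print(f"cached={cached}: ${prefix_cost(10_000, 100, cached):.4f}")
```

On a single call, caching costs 1.25× the uncached price, so it only helps once the same prefix is reused inside the TTL.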

For generation-heavy workloads, output cost is where the budget goes — a typical content-generation task has a 5:1 to 10:1 output-to-input token ratio (lower for classification and extraction, where input dominates). Opus output costs $25 per million tokens versus Sonnet's $15, about 1.7×. Blanket Opus usage is rarely the right call.
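Because every tier prices output at exactly 5× input, the share of the bill that goes to output depends only on the token ratio, not on the tier. A short sketch to check that; the token counts are illustrative assumptions:

```python
def output_share(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Fraction of one call's cost that comes from output tokens.

    Prices are dollars per million tokens; the per-million scaling cancels
    out of the ratio, so it is not applied here.
    """
    input_cost = input_tokens * input_per_m
    output_cost = output_tokens * output_per_m
    return output_cost / (input_cost + output_cost)


if __name__ == "__main__":
    # Sonnet 4.6 prices; swap in (1, 5) or (5, 25) and the shares are unchanged.
    for label, tokens_in, tokens_out in [
        ("generation 5:1", 1_000, 5_000),
        ("generation 10:1", 1_000, 10_000),
        ("classification", 2_000, 20),
    ]:
        share = output_share(tokens_in, tokens_out, 3.00, 15.00)
        print(f"{label}: {share:.1%} of cost is output")
```

At the 5:1 and 10:1 generation ratios, output is over 96% of the cost on any tier, which is why the output price gap between Opus and Sonnet drives the bill.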

Latency profile per tier

Time-to-first-token (TTFT) and tokens-per-second (TPS) vary by region, traffic, prompt size, output length, and service tier — Anthropic does not publish official figures. The numbers below are approximate, observed at typical request sizes. Treat the ranking as solid and the exact values as ballpark:

| Model | TTFT | TPS (output) | Typical 500-token response |
|---|---|---|---|
| Haiku 4.5 | ~250 ms | ~140 | ~3.8 sec |
| Sonnet 4.6 | ~400 ms | ~75 | ~7 sec |
| Opus 4.7 | ~700 ms | ~45 | ~12 sec |
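The last column is roughly TTFT plus output length divided by TPS. A minimal sketch that reproduces it from the approximate figures above (treat the inputs as ballpark, as noted):

```python
# Approximate observed figures from the table above; not official numbers.
# Values are (time to first token in seconds, output tokens per second).
OBSERVED = {
    "Haiku 4.5": (0.25, 140),
    "Sonnet 4.6": (0.40, 75),
    "Opus 4.7": (0.70, 45),
}


def response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Rough wall-clock time for a streamed response: first token, then generation."""
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    for model, (ttft, tps) in OBSERVED.items():
        print(f"{model}: ~{response_time(ttft, tps, 500):.1f} s for 500 tokens")
```

The Opus row works out to about 11.8 seconds, which the table rounds to 12. Generation time dominates TTFT at this length, so TPS is the number that decides how a long answer feels.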

For interactive UX where the user is watching tokens stream, Haiku feels instant, Sonnet feels responsive, Opus feels thoughtful (read: slow). For background workflows where latency isn't user-visible, this doesn't matter. For real-time chat, it matters a lot.

Capability gaps: where each tier wins

After a year of running every kind of prompt across all three tiers in production, the practical capability map:

Haiku 4.5 is reliably good at:

  • Classification (sentiment, topic, intent)
  • Field extraction from structured documents
  • Short summaries (paragraph → 1-2 sentences)
  • Yes/no questions over clearly worded context
  • Simple translation
  • Code completion within a single function

Haiku 4.5 visibly struggles with:

  • Multi-step reasoning (more than 2-3 steps)
  • Long-context synthesis (>10 documents)
  • Hard refactors across many files
  • Anything that needs "thinking carefully"

Sonnet 4.6 is the workhorse:

  • All of Haiku's strengths, plus
  • Multi-step reasoning up to ~7 steps
  • Long-context analysis well past 200K tokens — Sonnet 4.6 ships a 1M-token window
  • Code generation that runs on the first or second try
  • Most agentic workflows
  • Most chatbot use cases

Sonnet 4.6 visibly struggles with:

  • Deeply nested logical reasoning ("if A then B, but only if not C, unless D, in which case E")
  • Novel mathematical proofs
  • Very long-horizon planning (15+ step plans)
  • Code refactors that touch architectural concerns

Opus 4.7 wins on:

  • The hard reasoning tasks above
  • Deep code refactors with cross-cutting concerns
  • Plans that need to be right because each step is expensive
  • Novel problem-solving where there is no obvious template

The default: start with Sonnet

For any new endpoint, prompt, or pipeline: start with Sonnet 4.6. Measure quality on a representative sample. Then ask:

  1. Is the quality acceptable? → keep Sonnet.
  2. Is the quality high-but-overkill (latency or cost matter more)? → try Haiku.
  3. Is the quality not good enough? → try Opus.

You'll find that for ~70% of endpoints Sonnet is the right answer. You'll move 20% down to Haiku for cost reasons, and 10% up to Opus for quality reasons. That distribution is the win.

Don't pick the tier based on intuition; measure. Run the same 50 prompts through each tier, score the outputs against ground truth or with an LLM-as-judge, and pick on data. Building an eval suite for this is covered in How to Write LLM Evals That Catch Regressions.
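The measure-then-pick loop can be sketched in a few lines of Python. `generate` and `score` are hypothetical stand-ins for your API wrapper and your grader (exact match, rubric, or LLM-as-judge), and the 0.9 pass-rate bar is an assumed threshold you would set per endpoint.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Tiers ordered from cheapest to most expensive, per the pricing table.
TIERS = ("haiku", "sonnet", "opus")

# Hypothetical callables you supply:
#   generate(tier, prompt) -> output text (wraps your API client)
#   score(output, expected) -> True if the output is acceptable
Generate = Callable[[str, str], str]
Score = Callable[[str, str], bool]


def pass_rates(cases: Sequence[Tuple[str, str]], generate: Generate,
               score: Score) -> Dict[str, float]:
    """Run every (prompt, expected) case through every tier; return pass rates."""
    if not cases:
        raise ValueError("need at least one eval case")
    rates = {}
    for tier in TIERS:
        passed = sum(1 for prompt, expected in cases
                     if score(generate(tier, prompt), expected))
        rates[tier] = passed / len(cases)
    return rates


def cheapest_passing_tier(rates: Dict[str, float],
                          threshold: float = 0.9) -> Optional[str]:
    """Return the cheapest tier whose pass rate meets the bar, or None."""
    for tier in TIERS:
        if rates[tier] >= threshold:
            return tier
    return None
```

Picking the cheapest tier that clears the bar, rather than the one with the highest score, is the "cheapest tier that gets the job done" principle expressed as code.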

When to drop to Haiku

Drop to Haiku when:

  • The task is structural classification. "Is this a refund request, a billing question, or a feature request?" — Haiku handles this perfectly at a third of Sonnet's cost.
  • You're extracting fields from a structured source. "Pull the order ID and the total from this email" — Haiku reads this fine.
  • You have high volume. Anything running thousands of times an hour is worth a Haiku eval just to see if it survives the downgrade. A two-thirds cost cut at high volume is real money.
  • Latency is user-facing. Streaming UX where the user is watching tokens feels noticeably better on Haiku.

Don't drop to Haiku when:

  • The task involves multi-step reasoning.
  • The task involves long context (>50K tokens with detail to track).
  • The task is novel — Haiku is great at tasks it has seen, weaker at unusual ones.

When to step up to Opus

Step up to Opus when:

  • Sonnet is producing wrong answers on the same prompt repeatedly. Not "slightly off" — visibly wrong. Run the same prompt 5 times. If 3+ outputs are wrong, Sonnet is at its limit.
  • The cost of a wrong answer is high. Financial decisions, medical summaries, legal drafting — anywhere a mistake is expensive to fix.
  • The task requires architectural thinking. Multi-file code refactors where the model needs to understand cross-cutting concerns.
  • The plan is long and each step is expensive. Agentic workflows where running the wrong tool costs you minutes or dollars per step. The cost of Opus is small compared to the cost of running 12 bad tool calls.
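The five-run test from the first point is easy to automate. A minimal sketch, assuming a hypothetical `generate(tier, prompt)` wrapper and an `is_correct` check you write for the task; it also assumes calls are sampled at your normal production settings so the five runs can differ.

```python
from typing import Callable


def sonnet_at_limit(prompt: str,
                    generate: Callable[[str, str], str],
                    is_correct: Callable[[str], bool],
                    runs: int = 5,
                    wrong_threshold: int = 3) -> bool:
    """Run the prompt `runs` times on Sonnet and report whether at least
    `wrong_threshold` of the outputs are wrong (3 of 5 by default)."""
    wrong = sum(1 for _ in range(runs)
                if not is_correct(generate("sonnet", prompt)))
    return wrong >= wrong_threshold
```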

Don't step up to Opus when:

  • The task is high-volume and quality is "good enough" on Sonnet. The Opus cost adds up fast.
  • The task is latency-sensitive. Opus is noticeably slower.

Decision matrix

| Task pattern | Recommended | Why |
|---|---|---|
| Classify support tickets | Haiku | Structural, high volume, forgiving |
| Extract fields from invoices | Haiku | Structured, repetitive |
| Summarise a long PDF | Sonnet | Long context, reasoning-light |
| Chat with a knowledge base (RAG) | Sonnet | The default; its speed advantage over Opus matters in chat |
| Generate marketing copy | Sonnet | Quality matters, but Opus is overkill |
| Write a 5-file refactor | Sonnet first, Opus if it fails | Try the cheaper tier first |
| Plan a multi-step agent | Opus | Plans are expensive to redo |
| Critique another LLM's output | Opus | LLM-as-judge benefits from the strongest reasoner |
| Translate a sentence | Haiku | Simple, fast |
| Decide if user input contains PII | Haiku | Yes/no classification |
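When the task type is known ahead of time, the matrix can live in code as a lookup table. A minimal sketch, assuming hypothetical task keys and caller-supplied `generate` and `validate` functions (for a refactor, `validate` might run the test suite):

```python
from typing import Callable, Dict, Tuple

# Starting tier per task pattern, transcribed from the decision matrix.
# The task keys are hypothetical labels, not an Anthropic API concept.
START_TIER: Dict[str, str] = {
    "classify_support_ticket": "haiku",
    "extract_invoice_fields": "haiku",
    "summarise_long_pdf": "sonnet",
    "rag_chat": "sonnet",
    "marketing_copy": "sonnet",
    "multi_file_refactor": "sonnet",
    "plan_agent": "opus",
    "judge_llm_output": "opus",
    "translate_sentence": "haiku",
    "detect_pii": "haiku",
}

# Tasks the matrix says to retry on a stronger tier if the first attempt fails.
ESCALATE_TO: Dict[str, str] = {"multi_file_refactor": "opus"}


def run_task(task: str, prompt: str,
             generate: Callable[[str, str], str],
             validate: Callable[[str], bool]) -> Tuple[str, str]:
    """Run a task on its starting tier; escalate once if validation fails.

    Unknown tasks start on Sonnet, the default tier.
    """
    tier = START_TIER.get(task, "sonnet")
    output = generate(tier, prompt)
    fallback = ESCALATE_TO.get(task)
    if fallback is not None and not validate(output):
        tier = fallback
        output = generate(tier, prompt)
    return tier, output
```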

Cost math: routing across tiers

Concrete example. A customer-support AI handles 100,000 queries a day. Assume each call is about 1K input tokens and 500 output tokens. The per-call cost on each tier:

  • Haiku 4.5 — (1,000 × $1 + 500 × $5) / 1,000,000 = $0.0035 per call
  • Sonnet 4.6 — (1,000 × $3 + 500 × $15) / 1,000,000 = $0.0105 per call
  • Opus 4.7 — (1,000 × $5 + 500 × $25) / 1,000,000 = $0.0175 per call
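The same arithmetic as a small helper, handy for plugging in your own token counts. It is a sketch that ignores prompt caching and assumes list prices:

```python
# Dollars per million tokens as (input, output), from the pricing table.
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.7": (5.00, 25.00),
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one uncached call at list prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    for model in PRICES:
        print(f"{model}: ${cost_per_call(model, 1_000, 500):.4f} per call")
```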

Without routing, every query goes to Sonnet:

100,000 calls × $0.0105 = $1,050/day.

With routing — Haiku handles 60% (simple lookups, classification), Sonnet handles 35% (real questions), Opus handles 5% (escalations that need harder reasoning):

  • 60,000 × $0.0035 (Haiku) = $210.00
  • 35,000 × $0.0105 (Sonnet) = $367.50
  • 5,000 × $0.0175 (Opus) = $87.50

Total: $665/day. That's a $385/day saving (~37%) just from routing, with arguably better outcomes on the Opus-routed escalations.
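A minimal sketch for trying other traffic splits. The 60/35/5 mix is the example's assumption, and the per-call costs come from the list above:

```python
# Per-call costs from the worked example above (1K input, 500 output tokens).
COST_PER_CALL = {"haiku": 0.0035, "sonnet": 0.0105, "opus": 0.0175}


def daily_cost(calls_per_day: int, mix: dict) -> float:
    """Daily spend when `mix` maps each tier to its fraction of traffic."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic fractions must sum to 1")
    return sum(calls_per_day * share * COST_PER_CALL[tier]
               for tier, share in mix.items())


if __name__ == "__main__":
    baseline = daily_cost(100_000, {"sonnet": 1.0})
    routed = daily_cost(100_000, {"haiku": 0.60, "sonnet": 0.35, "opus": 0.05})
    saving = baseline - routed
    print(f"all-Sonnet ${baseline:,.2f}/day, routed ${routed:,.2f}/day, "
          f"saving ${saving:,.2f} ({saving / baseline:.0%})")
```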

The routing logic itself can be a Haiku classifier ("which tier should handle this prompt?"). A short classification call costs about $0.0004 — running it on all 100,000 calls adds roughly $40/day, dwarfed by the $385/day the routing saves.
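A minimal sketch of that router. `classify` is a hypothetical function that would wrap a short Haiku call prompted to answer with one of `haiku`, `sonnet`, or `opus`; `handlers` maps each tier to your own call wrapper. Falling back to Sonnet on an unexpected label is a design choice in this sketch, not something the API does for you.

```python
from typing import Callable, Dict, Tuple

VALID_TIERS = ("haiku", "sonnet", "opus")
ROUTER_COST_PER_CALL = 0.0004  # short Haiku classification call, per the estimate above


def route(prompt: str,
          classify: Callable[[str], str],
          handlers: Dict[str, Callable[[str], str]]) -> Tuple[str, str]:
    """Ask a cheap classifier which tier should answer, then dispatch.

    Any label the router does not recognise falls back to Sonnet, the default.
    """
    label = classify(prompt).strip().lower()
    tier = label if label in VALID_TIERS else "sonnet"
    return tier, handlers[tier](prompt)


def net_daily_saving(calls_per_day: int, gross_saving: float) -> float:
    """Routing saving after paying for one router call per request."""
    return gross_saving - calls_per_day * ROUTER_COST_PER_CALL
```

With the example's numbers, `net_daily_saving(100_000, 385.0)` comes to about $345/day.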

What to do next

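For the cost-optimisation companion technique that stacks with model routing, see How to Cut LLM API Costs with Prompt Caching.

For the evaluation infrastructure you need to actually pick a tier on data instead of vibes, see How to Write LLM Evals That Catch Regressions.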

External reference: the Anthropic model documentation is the canonical source for current capabilities, context windows, and pricing.


Ishan Karunaratne

Software Systems Architect · Senior Software Engineer · Engineering Leadership

Software systems architect and senior software engineer with more than two decades designing, building, and running production software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Now a CTO, though what I write here is drawn from the full arc of that work, across architecture, engineering, and operations, not any single job.
