TechEarl

How to Choose Between Claude Haiku, Sonnet, and Opus

Pick the right Claude tier for the job: Haiku for cheap, high-volume work, Sonnet as the default workhorse, Opus for hard reasoning. Includes cost math, latency figures, and a decision matrix.

Ishan Karunaratne · 11 min read

The default pick for production AI work in 2026 is Claude Sonnet 4.6: it handles 90% of real tasks well, costs about a fifth as much as Opus per token, and finishes a typical 500-token response roughly 1.7× faster (about 7 s vs 12 s). Haiku 4.5 is the right pick when you have high volume and forgiving accuracy requirements (classification, extraction, summarisation of short text). Opus 4.7 is the right pick when you need the best possible reasoning: multi-step planning, complex code refactors, anything where Sonnet has been visibly struggling. I'll walk through the cost math, the latency profile, the capability gaps, and a concrete decision matrix for routing prompts across the three tiers.

The wrong question is "which model is the best?" The right question is "what's the cheapest tier that gets this specific task done well enough?" A production AI app that runs everything on Opus burns 5× the budget of an all-Sonnet app for a quality lift you probably can't measure. An app that runs everything on Haiku makes mistakes the user notices. The win is routing.


The current pricing (mid-2026)

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cache hit (10% of input, per 1M tokens) | Notes |
| --- | --- | --- | --- | --- |
| Claude Haiku 4.5 | $0.80 | $4.00 | $0.08 | Fastest, cheapest, decent reasoning |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | The workhorse; default pick |
| Claude Opus 4.7 | $15.00 | $75.00 | $1.50 | Best reasoning, slowest, most expensive |

Cache writes cost 1.25× the base input price. All three tiers support prompt caching, which is covered in How to Cut LLM API Costs with Prompt Caching.

The output cost is where the budget goes. On every tier an output token costs 5× an input token, and content-generation tasks typically run a 5:1 to 10:1 output-to-input token ratio (lower for classification, higher for code generation). Opus output is also 5× the cost of Sonnet output ($75 vs $15 per million tokens), which is why blanket Opus usage is rarely the right call.
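A minimal Python sketch of the per-call math, using this article's mid-2026 rates from the table above (check Anthropic's current pricing before relying on them). `PRICES`, `call_cost`, and `output_share` are hypothetical helper names. Cache reads are priced at 10% of the input rate and cache writes at 1.25×, matching the figures above; `input_tokens` means uncached input only.

```python
# Per-million-token prices in USD, from the pricing table above (mid-2026 figures).
PRICES = {
    "haiku": {"input": 0.80, "output": 4.00},
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 15.00, "output": 75.00},
}

CACHE_READ_MULTIPLIER = 0.10   # cache hits cost 10% of the input rate
CACHE_WRITE_MULTIPLIER = 1.25  # cache writes cost 1.25x the input rate


def call_cost(tier: str, input_tokens: int, output_tokens: int,
              cache_read_tokens: int = 0, cache_write_tokens: int = 0) -> float:
    """Return the USD cost of one request.

    `input_tokens` counts only uncached input; cached tokens are passed separately.
    """
    p = PRICES[tier]
    per_token_in = p["input"] / 1_000_000
    per_token_out = p["output"] / 1_000_000
    return (
        input_tokens * per_token_in
        + cache_read_tokens * per_token_in * CACHE_READ_MULTIPLIER
        + cache_write_tokens * per_token_in * CACHE_WRITE_MULTIPLIER
        + output_tokens * per_token_out
    )


def output_share(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Fraction of an uncached request's cost that comes from output tokens."""
    p = PRICES[tier]
    out = output_tokens * p["output"]
    return out / (input_tokens * p["input"] + out)


if __name__ == "__main__":
    for tier in PRICES:
        print(f"{tier:>6}: ${call_cost(tier, 1_000, 500):.4f} per call (1K in, 500 out)")
    # At a 5:1 output-to-input token ratio, output dominates the bill on every tier.
    print(f"output share at 5:1 on sonnet: {output_share('sonnet', 1_000, 5_000):.0%}")
```

Because the output rate is 5× the input rate on all three tiers, a 5:1 token ratio puts about 96% of the bill on output regardless of tier.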

Latency profile per tier

Time-to-first-token (TTFT) and tokens-per-second (TPS) at typical request sizes:

| Model | TTFT | Output speed (tokens/sec) | Typical 500-token response |
| --- | --- | --- | --- |
| Haiku 4.5 | ~250 ms | ~140 | ~3.8 s |
| Sonnet 4.6 | ~400 ms | ~75 | ~7 s |
| Opus 4.7 | ~700 ms | ~45 | ~12 s |

For interactive UX where the user is watching tokens stream, Haiku feels instant, Sonnet feels responsive, Opus feels thoughtful (read: slow). For background workflows where latency isn't user-visible, this doesn't matter. For real-time chat, it matters a lot.
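The "typical 500-token response" column is just time-to-first-token plus generation time. A small sketch of that estimate, using the approximate figures from the table; real numbers vary with load, prompt size, and region, so treat the output as ballpark. `LATENCY` and `estimated_response_seconds` are hypothetical names.

```python
# Approximate figures from the latency table: (TTFT in seconds, output tokens/sec).
LATENCY = {
    "haiku": (0.25, 140),
    "sonnet": (0.40, 75),
    "opus": (0.70, 45),
}


def estimated_response_seconds(tier: str, output_tokens: int) -> float:
    """Rough wall-clock time for a streamed response: TTFT + tokens / throughput."""
    ttft, tokens_per_second = LATENCY[tier]
    return ttft + output_tokens / tokens_per_second


if __name__ == "__main__":
    for tier in LATENCY:
        print(f"{tier:>6}: ~{estimated_response_seconds(tier, 500):.1f} s for 500 tokens")
```

For streaming UX, TTFT is what the user feels first; for background jobs, the throughput term dominates once responses get long.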

Capability gaps: where each tier wins

After a year of running every kind of prompt across all three tiers in production, here is the practical capability map:

Haiku 4.5 is reliably good at:

  • Classification (sentiment, topic, intent)
  • Field extraction from structured documents
  • Short summaries (paragraph → 1-2 sentences)
  • Yes/no questions over clearly worded context
  • Simple translation
  • Code completion within a single function

Haiku 4.5 visibly struggles with:

  • Multi-step reasoning (more than 2-3 steps)
  • Long-context synthesis (>10 documents)
  • Hard refactors across many files
  • Anything that needs "thinking carefully"

Sonnet 4.6 is the workhorse:

  • All of Haiku's strengths, plus
  • Multi-step reasoning up to ~7 steps
  • Reliable long-context analysis up to ~200K tokens
  • Code generation that runs on the first or second try
  • Most agentic workflows
  • Most chatbot use cases

Sonnet 4.6 visibly struggles with:

  • Deeply nested logical reasoning ("if A then B, but only if not C, unless D, in which case E")
  • Novel mathematical proofs
  • Very long-horizon planning (15+ step plans)
  • Code refactors that touch architectural concerns

Opus 4.7 wins on:

  • The hard reasoning tasks above
  • Deep code refactors with cross-cutting concerns
  • Plans that need to be right because each step is expensive
  • Novel problem-solving where there is no obvious template

The default: start with Sonnet

For any new endpoint, prompt, or pipeline: start with Sonnet 4.6. Measure quality on a representative sample. Then ask:

  1. Is the quality acceptable? → keep Sonnet.
  2. Is the quality more than the task needs, and do latency or cost matter more? → try Haiku.
  3. Is the quality not good enough? → try Opus.

You'll find that for ~70% of endpoints Sonnet is the right answer. You'll move 20% down to Haiku for cost reasons, and 10% up to Opus for quality reasons. That distribution is the win.

Don't pick the tier based on intuition; measure. Run the same 50 prompts through each tier, score the outputs against ground-truth answers or with an LLM-as-judge, and pick on data. Building an eval suite for this is covered in How to Write LLM Evals That Catch Regressions.
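Once each tier has been scored on the same prompt set, the pick itself is mechanical: take the cheapest tier that clears your quality bar. A minimal sketch, assuming you already have per-tier pass/fail results from your eval suite; `pick_tier`, the 0.9 threshold, and the example scores are illustrative, not measured.

```python
from typing import Dict, List, Optional

# Tiers ordered from cheapest to most expensive.
TIERS_BY_COST = ["haiku", "sonnet", "opus"]


def pick_tier(scores: Dict[str, List[bool]], threshold: float = 0.9) -> Optional[str]:
    """Return the cheapest tier whose pass rate on the eval set meets `threshold`.

    `scores` maps a tier name to one pass/fail result per eval prompt, e.g. from
    comparing against ground-truth answers or from an LLM-as-judge verdict.
    Returns None if no tier clears the bar (the prompt or task needs rework).
    """
    for tier in TIERS_BY_COST:
        results = scores.get(tier)
        if not results:
            continue  # tier not evaluated; skip it rather than guess
        pass_rate = sum(results) / len(results)
        if pass_rate >= threshold:
            return tier
    return None


if __name__ == "__main__":
    # Hypothetical results for 50 prompts per tier.
    example = {
        "haiku": [True] * 38 + [False] * 12,   # 76% pass
        "sonnet": [True] * 47 + [False] * 3,   # 94% pass
        "opus": [True] * 49 + [False] * 1,     # 98% pass
    }
    print(pick_tier(example))  # sonnet
```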

When to drop to Haiku

Drop to Haiku when:

  • The task is structural classification. "Is this a refund request, a billing question, or a feature request?" Haiku handles this reliably at roughly a quarter of Sonnet's cost.
  • You're extracting fields from a structured source. "Pull the order ID and the total from this email": Haiku reads this fine.
  • You have high volume. Anything running thousands of times an hour is worth a Haiku eval just to see if it survives the downgrade. A roughly 73% cost cut at high volume is real money.
  • Latency is user-facing. Streaming UX where the user is watching tokens feels noticeably better on Haiku.

Don't drop to Haiku when:

  • The task involves multi-step reasoning.
  • The task involves long context (>50K tokens with detail to track).
  • The task is novel. Haiku is strong on familiar task patterns and weaker on unusual ones.
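These "don't drop" rules can be encoded as a guard that runs before a task is ever routed to Haiku. A sketch under stated assumptions: the thresholds (more than 3 reasoning steps, more than 50K tokens of detailed context) come from the lists above, while the `Task` fields and `haiku_eligible` are hypothetical and would be filled in by your own heuristics or a classifier.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task descriptors; how you estimate them is up to your pipeline.
    reasoning_steps: int   # estimated number of dependent reasoning steps
    context_tokens: int    # size of the context the model must track in detail
    is_novel: bool         # True if the task doesn't match a known, tested pattern


MAX_HAIKU_STEPS = 3          # Haiku struggles beyond 2-3 reasoning steps
MAX_HAIKU_CONTEXT = 50_000   # avoid Haiku above ~50K tokens of detailed context


def haiku_eligible(task: Task) -> bool:
    """Return True only if none of the 'don't drop to Haiku' conditions apply."""
    return (
        task.reasoning_steps <= MAX_HAIKU_STEPS
        and task.context_tokens <= MAX_HAIKU_CONTEXT
        and not task.is_novel
    )


if __name__ == "__main__":
    ticket = Task(reasoning_steps=1, context_tokens=800, is_novel=False)
    contract_review = Task(reasoning_steps=6, context_tokens=120_000, is_novel=False)
    print(haiku_eligible(ticket), haiku_eligible(contract_review))  # True False
```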

When to step up to Opus

Step up to Opus when:

  • Sonnet is producing wrong answers on the same prompt repeatedly. Not "slightly off" but visibly wrong. Run the same prompt 5 times; if 3 or more outputs are wrong, Sonnet is at its limit.
  • The cost of a wrong answer is high. Financial decisions, medical summaries, legal drafting: anywhere a mistake is expensive to fix.
  • The task requires architectural thinking. Multi-file code refactors where the model needs to understand cross-cutting concerns.
  • The plan is long and each step is expensive. Agentic workflows where running the wrong tool costs you minutes or dollars per step. The cost of Opus is small compared to the cost of running 12 bad tool calls.
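The "run it 5 times" check from the first bullet is easy to automate. A minimal sketch: `generate` and `is_correct` are hypothetical callables you would supply (a Sonnet call and a grader for your task), and the 5-run, 3-wrong thresholds are the rule of thumb above, not a statistical test.

```python
from typing import Callable


def sonnet_at_its_limit(
    prompt: str,
    generate: Callable[[str], str],
    is_correct: Callable[[str], bool],
    runs: int = 5,
    min_wrong: int = 3,
) -> bool:
    """Run `prompt` `runs` times; return True if at least `min_wrong` outputs are wrong.

    With the defaults this is the "5 runs, 3+ wrong" rule of thumb for trying Opus.
    """
    wrong = sum(1 for _ in range(runs) if not is_correct(generate(prompt)))
    return wrong >= min_wrong


if __name__ == "__main__":
    from itertools import cycle

    # Stand-in for a real Sonnet call: wrong 3 times out of 5.
    canned = cycle(["42", "41", "42", "40", "39"])

    def fake_generate(_prompt: str) -> str:
        return next(canned)

    print(sonnet_at_its_limit("What is 6 x 7?", fake_generate, lambda out: out == "42"))  # True
```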

Don't step up to Opus when:

  • The task is high-volume and quality is "good enough" on Sonnet. The Opus cost adds up fast.
  • The task is latency-sensitive. Opus is noticeably slower.

Decision matrix

| Task pattern | Recommended tier | Why |
| --- | --- | --- |
| Classify support tickets | Haiku | Structural, high volume, forgiving |
| Extract fields from invoices | Haiku | Structured, repetitive |
| Summarise a long PDF | Sonnet | Long context, reasoning-light |
| Chat with a knowledge base (RAG) | Sonnet | The default; lower latency than Opus matters in chat |
| Generate marketing copy | Sonnet | Quality matters, but Opus is overkill |
| Write a 5-file refactor | Sonnet first, Opus if it fails | Try the cheaper tier first |
| Plan a multi-step agent | Opus | Plans are expensive to redo |
| Critique another LLM's output | Opus | LLM-as-judge benefits from the strongest reasoner |
| Translate a sentence | Haiku | Simple, fast |
| Decide if user input contains PII | Haiku | Yes/no classification |
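The matrix translates directly into a lookup table plus one fallback rule for the "Sonnet first, Opus if it fails" row. A sketch, assuming each incoming request has already been tagged with a task pattern; the pattern keys, `call_model`, and `passes_check` are hypothetical stand-ins for your own labels, API wrapper, and validator.

```python
from typing import Callable

# Decision matrix from above, keyed by hypothetical task-pattern labels.
ROUTES = {
    "classify_ticket": "haiku",
    "extract_invoice_fields": "haiku",
    "summarise_long_pdf": "sonnet",
    "rag_chat": "sonnet",
    "marketing_copy": "sonnet",
    "multi_file_refactor": "sonnet_then_opus",
    "plan_agent": "opus",
    "judge_llm_output": "opus",
    "translate_sentence": "haiku",
    "detect_pii": "haiku",
}


def route(
    pattern: str,
    prompt: str,
    call_model: Callable[[str, str], str],
    passes_check: Callable[[str], bool],
) -> str:
    """Send `prompt` to the tier the matrix recommends for `pattern`.

    Unknown patterns default to Sonnet. For "sonnet_then_opus", try Sonnet first
    and escalate to Opus only if Sonnet's output fails `passes_check`.
    """
    tier = ROUTES.get(pattern, "sonnet")
    if tier != "sonnet_then_opus":
        return call_model(tier, prompt)
    draft = call_model("sonnet", prompt)
    if passes_check(draft):
        return draft
    return call_model("opus", prompt)
```

Falling back to Sonnet for unrecognised patterns mirrors the "start with Sonnet" default earlier in the article.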

Cost math: routing across tiers

Concrete example. A customer-support AI handles 100K queries a day, each around 1K input tokens and 500 output tokens. Without routing, every query goes to Sonnet:

100,000 calls × $0.0105 per call ($0.003 for 1K input + $0.0075 for 500 output) = $1,050/day.

With routing, Haiku handles 60% (simple lookups, classification), Sonnet handles 35% (real questions), and Opus handles 5% (escalations needing deep context):

  • 60,000 × $0.0028 (Haiku) = $168
  • 35,000 × $0.0105 (Sonnet) = $367.50
  • 5,000 × $0.0525 (Opus) = $262.50

Total: $798/day. That's a $252/day saving (24%) just from routing, with arguably better outcomes on the Opus-routed escalations.
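The same comparison as a small script, deriving per-call prices from the per-million rates in the pricing table rather than hard-coding them. The traffic mix and the 1K-input/500-output token counts are the example's assumptions; `per_call` and `daily_cost` are hypothetical helper names, and no prompt caching is modelled.

```python
# Per-million-token prices (USD) from the pricing table: (input, output).
PRICES = {"haiku": (0.80, 4.00), "sonnet": (3.00, 15.00), "opus": (15.00, 75.00)}


def per_call(tier: str, input_tokens: int = 1_000, output_tokens: int = 500) -> float:
    """USD cost of one call with no prompt caching."""
    inp, out = PRICES[tier]
    return (input_tokens * inp + output_tokens * out) / 1_000_000


def daily_cost(calls_per_day: int, mix: dict) -> float:
    """Blended daily cost for a traffic mix like {"haiku": 0.6, "sonnet": 0.35, "opus": 0.05}."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix fractions must sum to 1")
    return sum(calls_per_day * share * per_call(tier) for tier, share in mix.items())


if __name__ == "__main__":
    baseline = daily_cost(100_000, {"sonnet": 1.0})
    routed = daily_cost(100_000, {"haiku": 0.60, "sonnet": 0.35, "opus": 0.05})
    print(f"all-Sonnet: ${baseline:,.2f}/day")
    print(f"routed:     ${routed:,.2f}/day ({1 - routed / baseline:.0%} saving)")
```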

The routing logic itself can be a Haiku classifier ("which tier should handle this prompt?"). At about 1K input tokens per query, that adds roughly $0.0008 per call, or about $80/day at 100K queries, which stays comfortably below the daily routing saving above.
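A sketch of that router, with the model call injected so the routing logic can be tested offline. In production, `classify` would wrap a Haiku request (for example through the Anthropic Python SDK's `client.messages.create` with a small `max_tokens`); the prompt wording, `choose_tier`, and the fallback to Sonnet on an unparseable reply are assumptions, not a prescribed design.

```python
from typing import Callable

VALID_TIERS = ("haiku", "sonnet", "opus")

# Hypothetical router prompt; categories mirror the support example above.
ROUTER_PROMPT = """Decide which model tier should answer the user query below.
Reply with exactly one word: haiku, sonnet, or opus.
- haiku: simple lookups, classification, field extraction
- sonnet: ordinary questions that need some reasoning
- opus: escalations that need deep context or multi-step reasoning

Query:
{query}"""


def choose_tier(query: str, classify: Callable[[str], str]) -> str:
    """Ask a cheap classifier (e.g. Haiku) for a tier; fall back to Sonnet if the reply is unusable."""
    reply = classify(ROUTER_PROMPT.format(query=query)).strip().lower()
    # Take the first word and tolerate trailing punctuation such as "Opus."
    word = reply.split()[0].strip(".,:;!\"'") if reply else ""
    return word if word in VALID_TIERS else "sonnet"
```

Defaulting to Sonnet when the router's reply can't be parsed keeps a router failure from silently sending traffic to either the cheapest or the most expensive tier.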

What to do next

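For the cost-optimisation companion technique that stacks with model routing, see How to Cut LLM API Costs with Prompt Caching.

For the evaluation infrastructure you need to actually pick a tier on data instead of vibes, see How to Write LLM Evals That Catch Regressions.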

External reference: the Anthropic model documentation is the canonical source for current capabilities, context windows, and pricing.

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years across software, Linux systems, DevOps, and infrastructure — and a more recent focus on AI. Currently Chief Technology Officer at a tech startup in the healthcare space.
