Claude Models Explained: Opus, Sonnet, Haiku for Coding
Opus for architecture, Sonnet for execution, Haiku for the fast repetitive stuff. And prompt caching that makes the pricing math work in your favour.
I spent a month defaulting every task to Opus 4.7 — the most capable model — and watched my API bill climb. Then I switched everything to Haiku — cheapest — and started catching subtle bugs that had slipped through. The fix wasn’t picking one model. It was routing tasks by complexity.
Here’s how to think about the three Claude tiers and the prompt caching behaviour that makes the API pricing genuinely manageable.
The three tiers
Claude Opus 4.7 — when you need the model to think
Opus earns its cost when the task requires actual reasoning, not just execution. When I say reasoning, I mean: holding multiple constraints simultaneously, identifying what you didn’t ask for but should have, designing something that will survive contact with a real codebase.
Use Opus for:
- Architectural decisions (how should this service boundary work?)
- Auditing a large codebase for patterns or anti-patterns
- Writing a spec or design doc that needs genuine judgment, not just prose
- Debugging a hard problem with no obvious cause — the kind where you need the model to generate hypotheses, not completions
- Code review where the thing you want flagged is the subtle stuff, not the obvious stuff
Don’t use Opus for: implementing a feature from a clear spec, writing tests, formatting, refactoring with a defined target shape, anything repetitive. You’re paying for a reasoning engine; don’t route a typewriter task to it.
Claude Sonnet 4.6 — the daily driver
Sonnet is what Claude Code defaults to, and it’s right to. For the majority of agentic coding tasks — feature implementation, refactoring toward a known target, writing tests, reviewing a PR, summarising what a module does — Sonnet delivers output you’d be happy with at roughly a fifth of Opus’s per-token price (see the table below).
The gap between Sonnet and Opus is real but narrow for well-scoped execution tasks. It widens significantly on tasks that require judgment, ambiguity resolution, or multi-step reasoning with no clear correct path.
My heuristic: if I can write the task as a clear spec in one paragraph, Sonnet can execute it. If writing the spec is the hard part, I want Opus.
Claude Haiku 4.5 — fast and cheap for the repetitive
Haiku is quick and costs roughly 1/15th of Opus. That’s not a typo. For tasks where the expected output is simple and the cost of an occasional mistake is low, it’s an easy call.
Good Haiku tasks:
- “Read this file and tell me the exported function signatures”
- “Is this line violating our naming convention?”
- “Generate boilerplate for a new Express route that follows this pattern”
- “Summarise what this commit diff does in one sentence”
- Any step in a pipeline where the action is mechanical and the input is constrained
Bad Haiku tasks: anything requiring judgment, anything where a subtle error compounds downstream, anything architectural.
Note: Haiku is API-only. There’s no subscription tier that includes it. If you want to route tasks to Haiku, you need an API key.
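If you do route a task to Haiku, the call itself is small. Here’s a minimal sketch using the anthropic Python SDK, assuming an ANTHROPIC_API_KEY in your environment; the model ID string and the diff are illustrative stand-ins, so check the current models list before copying:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Stand-in diff; in a real pipeline this would come from `git show` or similar.
diff_text = "diff --git a/routes.py b/routes.py\n+def health():\n+    return 'ok'"

response = client.messages.create(
    model="claude-haiku-4-5",  # illustrative ID for the Haiku tier discussed above
    max_tokens=100,
    messages=[{
        "role": "user",
        "content": f"Summarise what this commit diff does in one sentence:\n{diff_text}",
    }],
)
print(response.content[0].text)
```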
The task-routing pattern
The biggest unlock is not picking a model per session — it’s routing within a session by complexity.
An agentic session might look like:
- Load context, understand the task → Haiku (mechanical reading)
- Design the implementation approach → Opus (if novel) or Sonnet (if routine)
- Implement the code → Sonnet
- Write tests → Sonnet
- Check if any test violates the naming convention → Haiku
- Summarise what changed for the commit message → Haiku
Tools that support multi-model sessions (or where you’re scripting API calls directly) make this natural. For interactive Claude Code sessions, you’re typically on one model per session — but knowing the tiers means you can consciously pick the right one at session start.
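If you’re scripting API calls directly, the routing table can be as simple as a dict keyed by step. A sketch, again with the anthropic Python SDK; the step names and model IDs are mine, not anything official:

```python
import anthropic

client = anthropic.Anthropic()

# Route each pipeline step to the cheapest tier that handles it reliably.
# Model IDs are illustrative stand-ins for the tiers discussed above.
MODEL_FOR_STEP = {
    "load_context":   "claude-haiku-4-5",   # mechanical reading
    "design":         "claude-opus-4-7",    # swap in the Sonnet ID if routine
    "implement":      "claude-sonnet-4-6",
    "write_tests":    "claude-sonnet-4-6",
    "lint_check":     "claude-haiku-4-5",
    "commit_summary": "claude-haiku-4-5",
}

def run_step(step: str, prompt: str) -> str:
    """Send one pipeline step to whichever model tier it is routed to."""
    response = client.messages.create(
        model=MODEL_FOR_STEP[step],
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```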
Prompt caching — the underused cost lever
Prompt caching is Anthropic’s mechanism for billing repeated context at roughly 10% of the normal token cost. The cache is keyed on the prefix — if the start of your context is identical across requests, subsequent requests pay the cached rate.
Here’s what that means in practice.
Your CLAUDE.md and rules/*.md files are static. They appear at the start of every request in a session. After the first request hits the API and warms the cache, every subsequent request in that session pays the cache rate for those tokens.
Concrete math:
Suppose your context header (CLAUDE.md + rules files) is 10,000 tokens. You run 20 requests in a session.
- Without caching: 20 × 10,000 = 200,000 billed input tokens
- With caching: 10,000 (first request, full price) + 19 × 10,000 at cache rate (~$0.30/million instead of $3.00/million)
- Billed equivalent: 10,000 + (19 × 10,000 × 0.10) = 10,000 + 19,000 = 29,000 effective token-equivalents
- Reduction: ~85%
For a heavy session with a dense CLAUDE.md, the difference between a $2.00 session and a $0.30 session is whether you’re hitting the cache.
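If you want to rerun that arithmetic with your own header size and request count, here it is as a function. The rates are the assumed Sonnet-class figures from above; note that real pricing also adds a modest premium on the first cache write, which this sketch (like the example above) ignores:

```python
def cached_session_cost(header_tokens: int, requests: int,
                        input_rate: float = 3.00,        # $/M tokens, full price
                        cache_read_rate: float = 0.30):  # $/M tokens, ~10%
    """Dollar cost of the repeated context header, with and without caching."""
    uncached = header_tokens * requests * input_rate / 1e6
    cached = (header_tokens * input_rate  # first request warms the cache
              + header_tokens * (requests - 1) * cache_read_rate) / 1e6
    return uncached, cached

full, hit = cached_session_cost(header_tokens=10_000, requests=20)
print(f"uncached: ${full:.2f}, cached: ${hit:.2f}")  # uncached: $0.60, cached: $0.09
```

That $0.60 versus $0.09 is the same ~85% reduction as the token math above, just expressed in dollars at the Sonnet input rate.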
How to ensure cache hits
The cache key is the exact text of the prefix. Two things break it:
- Editing your CLAUDE.md mid-session. If you change the file, the prefix changes, the cache misses, and you pay full price for the next request. Keep your context files stable within a session.
- Dynamic content early in the context. If you inject a timestamp or a random value at the start of your system prompt, every request is a cache miss. Put dynamic content late in the prompt, after the stable prefix.
The practical rule: write your CLAUDE.md before you start a session. Let it be static for the duration. Edit it between sessions, not during.
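Putting both rules together in an actual request: the stable context goes first, marked with a cache_control block, and anything dynamic goes in the user turn after it. A sketch with the anthropic Python SDK (older SDK versions may need a prompt-caching beta header); the model ID is illustrative:

```python
import time
import anthropic

client = anthropic.Anthropic()

# Static for the whole session -- edit it between sessions, not during.
stable_context = open("CLAUDE.md").read()

response = client.messages.create(
    model="claude-sonnet-4-6",  # illustrative ID
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": stable_context,
        "cache_control": {"type": "ephemeral"},  # cache everything up to here
    }],
    messages=[{
        "role": "user",
        # Dynamic content lives after the stable prefix, so it can't break the cache.
        "content": f"(now: {time.strftime('%H:%M')}) Implement the feature in TASK.md",
    }],
)

# The usage block confirms whether you actually hit the cache.
print("cache write tokens:", response.usage.cache_creation_input_tokens)
print("cache read tokens:", response.usage.cache_read_input_tokens)
```

A healthy session shows one large cache write on the first request and large cache reads on every request after; if the read count stays at zero, something in your prefix is changing.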
API vs. subscription per tier
| Model | Available on subscription? | API pricing (approx) |
|---|---|---|
| Opus 4.7 | Claude Max (limited) | ~$15/M input, ~$75/M output |
| Sonnet 4.6 | Claude Pro + Max | ~$3/M input, ~$15/M output |
| Haiku 4.5 | No | ~$0.80/M input, ~$4/M output |
If you’re on Claude Pro and doing interactive sessions, you’re effectively using Sonnet at no marginal cost. If you want Opus or Haiku at meaningful scale, you need the API.
Practical recommendation
Start with Sonnet 4.6 for everything. It’s the right default — strong enough for almost all coding tasks, fast enough for interactive sessions, and it’s what the Claude Code team has tuned their tooling around.
Add Haiku explicitly when you’re building pipeline steps that are mechanical and high-volume. Add Opus when you hit a task where Sonnet’s output genuinely isn’t good enough — that’s your signal that the task requires more reasoning, not just more trying.
Don’t add Opus “just in case.” The gap is real for the tasks that need it; for everything else, you’re paying a premium for output you wouldn’t have been able to distinguish from Sonnet anyway.
Coming next
Post 3 looks at DeepSeek — a model family that matches Sonnet on many coding benchmarks at roughly 10x lower API cost. Including what it’s genuinely good at, where it falls short, and what you need to know about data residency before routing any serious work through it.
Choosing Your AI Stack
A six-part series on picking models, pricing models, and building workflows that aren't locked to any single provider.
- 01 Token vs Subscription: Which AI Pricing Model Is Right for You
- 02 Claude Models Explained: Opus, Sonnet, Haiku for Coding ← you are here
- 03 DeepSeek for Developers: Capable, Cheap, and Worth Knowing
- 04 GLM Models: The Open-Source Alternative from Zhipu AI
- 05 Local Models: When You Run the Weights Yourself
- 06 Building Model-Agnostic Workflows
What did you take away?
Thoughts, pushback, or a story of your own? Drop a reply below — I read every one.