Claude 3.5 Sonnet, Claude 3.5 Haiku, and Claude 3 Opus — with prompt caching and batch pricing.
Updated May 2026Anthropic uses a 5× input-to-output multiplier on flagship models — higher than OpenAI's 4×. Output-heavy workloads (long summaries, code generation) cost significantly more than prompt-heavy ones.
| Model | Input / 1M tokens | Output / 1M tokens | Context window | Tier |
|---|---|---|---|---|
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K | Production |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200K | Budget |
| Claude 3 Opus | $15.00 | $75.00 | 200K | Premium |
Claude 3.5 models support prompt caching — the largest cost-reduction feature on the platform. Repeated system prompts, documents, or context are cached and billed at a fraction of standard input rates.
| Model | Cache write / 1M | Cache read / 1M | Savings vs. standard input |
|---|---|---|---|
| Claude 3.5 Sonnet | $3.75 | $0.30 | 90% on reads |
| Claude 3.5 Haiku | $1.00 | $0.08 | 90% on reads |
Anthropic's Message Batches API offers 50% discount for async workloads processed within 24 hours. Same models, same quality, half the price.
| Model | Batch input / 1M | Batch output / 1M |
|---|---|---|
| Claude 3.5 Sonnet | $1.50 | $7.50 |
| Claude 3.5 Haiku | $0.40 | $2.00 |
200K context window plus prompt caching means Anthropic is the strongest choice when your prompts carry large documents that repeat across requests.
Claude 3.5 Sonnet consistently ranks at the top of coding benchmarks. It's the model most engineering teams reach for first.
Claude's constitutional AI training produces reliable compliance with complex formatting and structural instructions — fewer retries, cleaner outputs.
Most Anthropic plans use a prepaid credit model. You buy credits in advance; when they run out, API calls fail with HTTP 429 errors. This is not a gradual overage — it's a hard stop. Teams that underestimate credit needs hit production outages, not surprise invoices. Plan credit buffers accordingly.
PayMesh syncs Claude API usage hourly. Get alerted before credits run out — not after your production app starts throwing 429s.