Gemini 2.0 Pro and Gemini 2.0 Flash — tiered context pricing, context caching, and Vertex AI vs. direct API differences.
Updated May 2026

Google charges differently based on prompt length: once a prompt crosses the 128K-token threshold, both the input and output rates double. This makes billing non-linear and harder to predict than flat per-token rates.

| Model | Input, prompt ≤128K (per 1M tokens) | Input, prompt >128K (per 1M tokens) | Output, prompt ≤128K (per 1M tokens) | Output, prompt >128K (per 1M tokens) |
|---|---|---|---|---|
| Gemini 2.0 Pro | $1.25 | $2.50 | $5.00 | $10.00 |
| Gemini 2.0 Flash | $0.075 | $0.15 | $0.30 | $0.60 |
| Gemini 2.0 Flash-Lite | $0.0375 | $0.075 | $0.15 | $0.30 |
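
To make the tier jump concrete, the sketch below estimates the cost of a single request from the rates in the table. It rests on two assumptions that should be checked against a real invoice: that "128K" means 128,000 tokens, and that a prompt over the threshold moves the whole request (all input and output tokens) to the higher rate rather than only the tokens past the threshold. The model keys and the `request_cost` helper are illustrative names, not part of any Google SDK.

```python
# Rates in USD per 1M tokens, copied from the table above.
# Each tuple is (rate when prompt <= 128K, rate when prompt > 128K).
# The dictionary keys are illustrative labels, not official model IDs.
RATES = {
    "gemini-2.0-pro": {"input": (1.25, 2.50), "output": (5.00, 10.00)},
    "gemini-2.0-flash": {"input": (0.075, 0.15), "output": (0.30, 0.60)},
    "gemini-2.0-flash-lite": {"input": (0.0375, 0.075), "output": (0.15, 0.30)},
}

# Assumption: "128K" is 128,000 tokens.
TIER_THRESHOLD = 128_000


def request_cost(model: str, prompt_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: the prompt length selects the tier for the entire request,
    so every input and output token is billed at that tier's rate.
    """
    tier = 1 if prompt_tokens > TIER_THRESHOLD else 0
    rates = RATES[model]
    total = (
        prompt_tokens * rates["input"][tier]
        + output_tokens * rates["output"][tier]
    )
    return total / 1_000_000


if __name__ == "__main__":
    for prompt in (100_000, 200_000):
        cost = request_cost("gemini-2.0-pro", prompt, 1_000)
        print(f"{prompt:>7} prompt tokens: ${cost:.4f}")
```

Under these assumptions, a Gemini 2.0 Pro call with a 100,000-token prompt and 1,000 output tokens comes to about $0.13, while a 200,000-token prompt with the same output comes to about $0.51: nearly four times the cost for twice the input.
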
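Gemini supports context caching to reduce costs on repeated large contexts (documentation, codebases, long system prompts). For both models listed below, cached input tokens are billed at 25% of the standard ≤128K input rate, plus an hourly storage fee for as long as the cache is kept.

| Model | Cache storage (per 1M tokens per hour) | Cached input (per 1M tokens) |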
|---|---|---|
| Gemini 2.0 Pro | $4.50 | $0.3125 |
| Gemini 2.0 Flash | $1.00 | $0.01875 |
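
Because storage is billed by the hour, caching only pays off when the cached context is read often enough. The sketch below works out that break-even point from the two tables. It assumes the uncached comparison uses the ≤128K input rate, that storage is prorated linearly per hour, and that creating the cache has no extra charge; the `CacheRates` class and both helpers are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheRates:
    """Per-model rates in USD, copied from the pricing tables."""
    input_per_m: float         # standard input, per 1M tokens (<=128K tier)
    cached_input_per_m: float  # cached input, per 1M tokens
    storage_per_m_hour: float  # cache storage, per 1M tokens per hour


PRO = CacheRates(input_per_m=1.25, cached_input_per_m=0.3125, storage_per_m_hour=4.50)
FLASH = CacheRates(input_per_m=0.075, cached_input_per_m=0.01875, storage_per_m_hour=1.00)


def break_even_reads_per_hour(rates: CacheRates) -> float:
    """Reads per hour at which the per-read savings equal the storage fee.

    The result does not depend on the cache size, because storage and
    per-read savings both scale linearly with the number of cached tokens.
    """
    saving_per_read = rates.input_per_m - rates.cached_input_per_m
    return rates.storage_per_m_hour / saving_per_read


def hourly_cost(rates: CacheRates, context_tokens: int,
                reads_per_hour: float, use_cache: bool) -> float:
    """Hourly USD cost of sending the same context reads_per_hour times."""
    millions = context_tokens / 1_000_000
    if use_cache:
        return millions * (rates.storage_per_m_hour
                           + reads_per_hour * rates.cached_input_per_m)
    return millions * reads_per_hour * rates.input_per_m


if __name__ == "__main__":
    for name, rates in (("Pro", PRO), ("Flash", FLASH)):
        print(f"{name}: break-even at "
              f"{break_even_reads_per_hour(rates):.1f} reads/hour")
```

With the listed rates, the break-even works out to roughly 4.8 reads per hour for Gemini 2.0 Pro and roughly 17.8 reads per hour for Gemini 2.0 Flash. Below those volumes, the storage fee outweighs the discount under the assumptions above.
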
Google offers two ways to access Gemini models with meaningfully different pricing and features:
| Dimension | Gemini Developer API | Vertex AI |
|---|---|---|
| Target | Startups, prototypes | Enterprise, regulated |
| Free tier | Yes (rate-limited) | No |
| Pricing | Direct Google billing | GCP billing (slightly different rates) |
| Data residency | Limited | Full regional control |
| SLA | Best-effort | Enterprise SLA |
Gemini's 1M+ token context window is large enough to process entire books, large codebases, or extensive conversation histories in a single call.
Native vision, audio, and video understanding is built into Gemini, so image analysis does not require switching to a separate model.
Gemini 2.0 Flash, at $0.075 per 1M input tokens for prompts up to 128K, is among the cheapest capable models available. For high-volume, shorter-context tasks it's hard to beat.
PayMesh connects to your Google Cloud billing to track Gemini API costs. See which models and context lengths are driving your bill.