A colleague told me a near-miss horror story.
As Google began deprecating Gemini 2.0, we moved to Gemini 2.5 Pro. But reasoning is enabled by default and cannot be turned off.
For our specific problem statement, reasoning was not required. Token costs increased 10x and speeds were 3-4x slower. We moved the client to Gemini 2.5 Flash Lite, which has reasoning turned off by default and offers much lower latency.
Because we track compute costs closely, we managed this without a major financial impact.
But model updates clearly require careful testing on the cost and latency front as well, not just output quality.
AI used to keep getting cheaper. But now it's more a "convergence".
Each line traces a model family. The X-axis is its intelligence (LMArena ELO score) and the Y-axis is the input cost ($ per million tokens, log scale). Time flows roughly left to right as models improve.
Three patterns emerge and each seems strategic.
Gemini 1.5 Flash was $0.075/MTok in August 2024. I told everyone to use it - it's a fantastic, affodable model.
Gemini 3.5 Flash, released this week at Google I/O, is $1.50. That's a 20× in 21 months. Looks like "Flash" is more a brand position than a price position. It migrates up-market.
Same for Gemini 1.5 Flash 8b (3.75 cents) which migrated into Gemini 3.1 Flash Lite (25 cents - a 6.7× increase).
Gemini Pro went the opposite direction: down from $5.00 (1.5-pro in Oct 2024) to $1.25-$2.00 today. Pro seems to be a competitive weapon against Anthropic and OpenAI at the top, while monetizing the middle.
Of course, Gemini's real lock-in is the Google Workspace bundling and Search AI Mode. Personally, I subscribed to Google Pro for the first time in 20 years just for these bundled capabilities.
By the time people notice the Flash price, it's hard to leave the ecosystem.
The price chart speaks for itself:
| Model | Price |
|---|---|
| GPT-4 | $30 |
| GPT-4 Turbo | $10 |
| GPT-4o | $2.50 |
| GPT-4.1 | $2.00 |
| GPT-5 | $1.25 |
| GPT-5.2 | $1.75 |
| GPT-5.4 | $2.50 |
| GPT-5.5 | $5.00 |
Maybe their strategy seems is: scale current technology to commodity (the O-series showed the same pattern with $15 for O1 falling to $1.10 for O4-mini in 7 months), THEN launch a new frontier above it, and repeat?
Claude Sonnet has held at exactly $3.00/MTok input for over two years, across four model generations and an ELO gain of nearly 200 points. Pretty unusual.
Opus came down from $15 to $5 in November 2025 - likely a deliberate move to make it production-viable. Haiku crept from $0.25 to $0.80. The tiers are converging.
There's a twist, though. Anthropic restructured enterprise contracts in late 2025. Seat prices dropped to $20/seat. Token discounts (previously 10-15% off API rates) were removed. For a 100-seat team, that adds ~$15K-$40K to annual TCO. So, prices went down, but the actual bill went up.
Also, Opus 4.7 uses a new tokenizer that may consume ~35% more tokens for the same text. It's worth re-benchmarking prompts before assuming $5 is 1/3rd of $15.
Model family prices change rapidy. Old models get deprecated. Best to be prepared.
For enterprise procurement: keep a close eye on pricing changes, API token discounts, and what was quietly removed during renewal. (AI helps with this!)
The first 18 months of most AI model families are discounted customer acquisition. Then value extraction follows. Google started it with Flash. OpenAI is doing it with GPT-5.5. Anthropic is doing it with enterprise billing restructuring.
Fair enough. Providers need to recover infra investments.
And build good routing strategies helps enterprises get the most out of this.
Just keep asking yourself: "what's our plan for when this model changes or deprecates?"

PS: AI-generated image - has a few errors.