The Floor Beneath the Token Price
Why the cheapest token isn’t free, and what that means for every startup betting on AI.
There’s a moment that happens in almost every AI startup pitch I’ve seen over the past two years. The founder clicks to a slide showing pricing, notes with visible satisfaction that AI costs have dropped dramatically, and implies, without quite saying it, that the trend will continue indefinitely. Free is coming. Margins will materialize. Just give us time.
They’re not wrong about the direction. Token prices have fallen faster than almost anyone predicted, and the trajectory is genuinely impressive. But there’s a floor no one seems to want to talk about, and it’s made of physics.
Every token a language model generates requires computation. Computation requires electricity. Electricity costs money. And while hardware gets more efficient over time, it does not get infinitely efficient. At some point, the cost of generating a token is not set by market competition or model architecture. It’s set by the laws of thermodynamics.
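One way to make the thermodynamic floor concrete is Landauer's limit: erasing a single bit of information at temperature T costs at least kT·ln 2 joules. The comparison below is a sketch, and the energy-per-token figure in it is a purely hypothetical assumption, not a measurement of any real model or GPU.

```python
import math

# Landauer's limit: the minimum energy to erase one bit at temperature T.
# This is the physical floor that no hardware efficiency gain can pass.
k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # room temperature, K

landauer_j_per_bit = k_B * T * math.log(2)  # roughly 2.9e-21 J

# Illustrative assumption: suppose generating one token costs ~1 joule
# of compute energy (a made-up round number for a large model).
assumed_j_per_token = 1.0
headroom = assumed_j_per_token / landauer_j_per_bit

print(f"Landauer floor:  {landauer_j_per_bit:.2e} J per bit erased")
print(f"Assumed energy:  {assumed_j_per_token} J per token")
print(f"Headroom above the floor: ~{headroom:.0e}x")
```

The point of the sketch is not the exact ratio, which depends entirely on the assumed figure, but that today's hardware sits many orders of magnitude above the floor, so efficiency gains can continue for a long while before physics, rather than markets, sets the price.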
That floor matters more than it might seem, especially if you’re building on top of AI infrastructure, investing in companies that do, or trying to understand whether any of this actually makes money at scale.
What a token actually costs to produce
Forget the per-token pricing you see on API documentation pages. That’s the retail price, with margin baked in, designed for developers at low volume. The more interesting number is what it actually costs to run the compute underneath.
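A back-of-envelope version of that underlying cost can be built from just a few inputs: accelerator power draw, electricity price, amortized hardware cost, and serving throughput. Every figure below is a labeled assumption chosen for illustration; none is a quote for any real provider, GPU, or model.

```python
# Back-of-envelope: the cost to produce a token, beneath the retail price.
# All numbers are hypothetical assumptions for illustration only.

gpu_power_kw = 0.7               # assumed draw of one accelerator, kW
electricity_usd_per_kwh = 0.08   # assumed industrial power price
gpu_capex_usd = 30_000           # assumed purchase price of the accelerator
amortization_hours = 3 * 365 * 24  # amortized over an assumed 3-year life

# Hourly cost = electricity + amortized hardware.
energy_usd_per_hour = gpu_power_kw * electricity_usd_per_kwh
capex_usd_per_hour = gpu_capex_usd / amortization_hours
usd_per_hour = energy_usd_per_hour + capex_usd_per_hour

tokens_per_second = 2_000        # assumed aggregate throughput with batching
tokens_per_hour = tokens_per_second * 3600

usd_per_million_tokens = usd_per_hour / tokens_per_hour * 1e6
print(f"Electricity: ${energy_usd_per_hour:.3f}/hr, "
      f"hardware: ${capex_usd_per_hour:.3f}/hr")
print(f"At-cost price: ~${usd_per_million_tokens:.2f} per million tokens")
```

Under these particular assumptions, amortized hardware dominates electricity by an order of magnitude, which is why the floor in practice is set by chip economics long before raw energy cost takes over; change the throughput or lifetime assumptions and the per-token figure moves proportionally.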


