The token-maxing canyon.
Subscription buffets trained a generation of token-maxers. The buffet is closing — Claude Code is out of the $20 plan, weekly caps are tightening, and rolling windows are shrinking. Here's the math behind why it had to, and the incentive flip we should actually want.
That hum of productivity-maxxing is the ambient noise of modern professional life. With AI, we've turned up the volume to a level the human mind has never encountered. The gap between what these systems can output in an hour and what any person can output in an hour isn't a gap anymore — it's a canyon. — Lindsey Witmer Collins, on LinkedIn
Lindsey wrote that last week and it stuck with me. The canyon is real. I felt it the first time I left a Claude Code /loop running overnight and woke up to 6 hours of clean commits — work that would have eaten me for a week. The next thought, immediately, was: why isn't another loop running on something else?
That second thought is the token-maxer talking. I have a devil on my shoulder whispering "you are leaving value on the table — have agents do more, more, more." It is not a virtuous voice. It is the voice of someone whose buffet is about to close and who hasn't accepted it yet.
This post is about why the buffet is closing, what the math actually looks like, and why I think the incentive shift coming in 2026 is the thing that quiets that voice — for me, and for everyone else who has been quietly speed-running the canyon.
/loop overnight on a well-scoped task can close 20–50 tickets while you sleep. A fleet of them, scoped right, multiplies that. The canyon isn't speculative anymore.
The buffet years (2024–2025)
For about eighteen months, frontier AI was sold to power users in a way that almost nothing else in software is sold: flat fee, near-unlimited, the heavier you used it the better the deal got.
The Claude Pro plan at $20/mo bundled in Claude Code, Sonnet, and a generous chat allowance. The Max tiers ($100 and $200/mo) felt unlimited to anyone who wasn't running a server farm — and the people who were running a server farm got the best deal of all, because the marginal token cost to them was zero.
Inside that pricing structure, "token-maxing" wasn't dumb. It was rational. Every token you didn't burn was free money you left in Anthropic's pocket. The optimal strategy on a buffet is to eat — and the optimal strategy on a buffet that costs $200/mo and serves $5,000-of-API value is to eat without thinking.
This produced a generation of habits I now recognize in myself:
- Running an agent on a task you'd ordinarily just do yourself in 2 minutes.
- Asking for full rewrites instead of diffs because "why not."
- Spawning three subagents to do work one would handle, because parallel feels productive.
- Leaving /loop on overnight on something you don't actually need finished by morning.
- Keeping a 200K-token transcript open for three days because compaction "might lose context."
None of these were crazy at $20/mo. All of them are insane at API rates. The buffet trained the behavior. Now the buffet is leaving.
What's actually changing in 2026
The signal isn't subtle. Over the last two quarters Anthropic has tightened subscription pricing in five visible ways at once:
The arbitrage, in one table
The reason the buffet has to close is that the gap between what a heavy user pays and what their tokens cost is genuinely large. Order-of-magnitude large. This is roughly what subscription arbitrage has looked like for a working developer at three usage levels:
| User profile | Daily tokens | API equivalent / mo | Subscription / mo | Subsidy |
|---|---|---|---|---|
| Casual (few prompts/day) | ~500K | $45 | $20 | 2× |
| Working dev (Claude Code daily) | ~10M | $900 | $100 | 9× |
| Token-maxer (loops, fleets, parallels) | ~80M | $7,200 | $200 | 36× |
| Whale (overnight fleets, screenshots on X) | ~300M | $27,000 | $200 | 135× |
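The dollar figures in the table are consistent with a single blended rate of about $3 per million tokens over a 30-day month. That rate is inferred from the rows, not stated in the post, and a real API bill depends on the model and the input/output mix. Under that assumption, the arithmetic can be sketched as:

```python
# Assumptions (inferred from the table, not stated in the post):
# a blended $3 per million tokens and a 30-day month.
BLENDED_USD_PER_MILLION = 3.0
DAYS_PER_MONTH = 30


def api_equivalent_per_month(daily_tokens: int) -> float:
    """Monthly cost of the same usage if every token were metered."""
    return daily_tokens / 1_000_000 * BLENDED_USD_PER_MILLION * DAYS_PER_MONTH


def subsidy_multiple(daily_tokens: int, subscription_usd: float) -> float:
    """How many times over the subscription fee the metered cost would be."""
    return api_equivalent_per_month(daily_tokens) / subscription_usd


# (daily tokens, subscription fee in USD) for each row of the table
profiles = {
    "Casual": (500_000, 20),
    "Working dev": (10_000_000, 100),
    "Token-maxer": (80_000_000, 200),
    "Whale": (300_000_000, 200),
}

for name, (daily, sub) in profiles.items():
    monthly = api_equivalent_per_month(daily)
    print(f"{name}: ${monthly:,.0f}/mo, {subsidy_multiple(daily, sub):.2f}x")
```

Run as written, the casual row comes out at 2.25x, which the table rounds to 2×; the other rows match exactly.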
The incentive flip
When the unit you pay for is "a month," your dominant strategy is to use as many tokens as possible. When the unit you pay for is "a token," your dominant strategy is to use as few tokens as possible per unit of finished work.
Those two dominant strategies look nothing alike. They produce different commit graphs, different prompts, different tools, different team norms. The shift from one to the other is what's actually happening over the next 18 months — and almost nobody is talking about it as a behavioral story.
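One way to see why the two strategies diverge is to compare what the next million tokens costs under each pricing unit. This is a minimal sketch: the function names are hypothetical, the $3 per million rate is an illustrative assumption, and the flat-fee case ignores usage caps.

```python
def marginal_cost_flat(extra_tokens: int) -> float:
    """Under a flat monthly fee, extra usage inside the cap adds nothing to the bill."""
    return 0.0


def marginal_cost_metered(extra_tokens: int, usd_per_million: float = 3.0) -> float:
    """Under metered pricing, every extra token shows up on the bill."""
    return extra_tokens / 1_000_000 * usd_per_million


def effective_rate_flat(monthly_fee: float, tokens_used: int) -> float:
    """Dollars per million tokens actually paid on a flat plan."""
    return monthly_fee / (tokens_used / 1_000_000)


# On a $200 plan the effective rate keeps falling the more you burn,
# which is exactly the incentive to burn more.
print(effective_rate_flat(200, 100_000_000))    # light month
print(effective_rate_flat(200, 2_400_000_000))  # token-maxer month
print(marginal_cost_metered(1_000_000))         # metered: each extra million costs the rate
```

On the flat plan the only way to improve your price per token is to use more of them; on the meter the only way to lower the bill is to use fewer.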
Buffet pricing · 2024–25
- Spawn first, think later. Three subagents because parallel feels productive.
- No compaction discipline. Keep every transcript long; context loss feels worse than waste.
- Loop overnight. The cost of an idle loop is zero; the cost of waking up to nothing finished is real.
- Big-model default. Always reach for the most capable model, regardless of task.
- Re-run instead of cache. Cheaper to re-ask than to plumb cache control.
- Tokens-per-day as bragging right. "I burned 200M tokens this week" = status.
API / metered pricing · 2026+
- Scope before spawn. One agent on a clear task beats three agents on a vague one, and costs a third as much.
- Compact aggressively. The transcript is a bill. Smaller bill = same answer.
- Loops cost money. Loops have to justify themselves. Most idle loops die.
- Right-size the model. Haiku for routing, Sonnet for build, Opus for hard reasoning.
- Prompt caching is religion. 90% off the input price for hits — it adds up fast.
- Output-per-dollar as bragging right. "I shipped this PR for 4¢" = status.
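The caching bullet is easy to check with arithmetic. The sketch below takes the 90% discount on cache hits from the list above. The $3 per million input price and the 25% surcharge on the first, cache-writing call are assumptions for illustration, and it assumes every later call lands inside the cache lifetime, so check current pricing before relying on the numbers.

```python
def input_cost_usd(prefix_tokens, calls, usd_per_million=3.0,
                   cached=False, hit_discount=0.90, write_surcharge=0.25):
    """Input-side cost of sending the same prefix on every call.

    usd_per_million and write_surcharge are illustrative assumptions;
    hit_discount is the 90% figure quoted in the post.
    """
    per_token = usd_per_million / 1_000_000
    if not cached or calls == 0:
        return prefix_tokens * calls * per_token
    # First call writes the cache (assumed surcharge); the rest are hits.
    first = prefix_tokens * per_token * (1 + write_surcharge)
    hits = prefix_tokens * (calls - 1) * per_token * (1 - hit_discount)
    return first + hits


# A 50K-token system prompt reused across 100 calls.
uncached = input_cost_usd(50_000, 100)
cached = input_cost_usd(50_000, 100, cached=True)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
```

Under these assumptions the cached run costs roughly a ninth of the uncached one. A prefix that is sent only once costs slightly more with caching than without, which is why caching pays off for prompts that are genuinely reused.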
Why this is actually good
I want to be clear: I don't want subscriptions to die. The $20 plan is one of the best deals in the history of software, and most users will never come close to its real cost. They shouldn't.
What I want is for the people running fleets — me included — to feel each token they spend. Not because spending tokens is bad, but because not feeling them bends behavior toward waste in a way that doesn't actually produce more useful output. Most of my "MORE MORE MORE" instinct is not value-creating. It's anxiety, dressed in agentic clothing, racing the canyon.
Lindsey's frame is the right one. The canyon between human and machine output is real, and the answer to it is not to keep redlining the engine to feel like we're keeping up. The answer is to aim better: pick the work that matters, scope it tightly, let the agent close it, stop. The pricing change is the thing that finally rewards that.
Token-maxing is a phase, and pricing is the cure.
The buffet trained the behavior. The metered era will untrain it. The same person who once spawned a fleet to make a one-line change will, six months from now, ask one well-scoped question and be done. Not because they got more disciplined — because the price tag in front of them changed.
The habits I'm trying to build now
I don't want to wait for the subscription squeeze to force me into better habits. So I'm trying to act as if I'm already on the API meter, even on weeks when I'm not. The list, in case it's useful to anyone else:
- Scope before spawn. Write a one-paragraph task brief before any agent runs. If I can't write the brief, the agent shouldn't be running.
- Cache aggressively. Every system prompt and large doc gets a prompt-caching breakpoint (cache_control on the content block in Anthropic's API, not an HTTP header). The 90% discount on cache hits is the single largest token-economics lever for power users.
- Right-size the model. Haiku for routing and small classifiers, Sonnet for the build, Opus only when reasoning is the bottleneck.
- Compact, don't sprawl. If a later step needs context from earlier, summarize it forward. Don't pay to drag the whole conversation along on every turn.
- Kill idle loops. If a /loop can't articulate what it's doing on this iteration, it shouldn't run on this iteration.
- Measure cost per closed PR, not tokens per day. The first metric pulls you toward usefulness. The second pulls you toward the canyon edge.
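The last habit in the list is the easiest to turn into a number. A minimal sketch of the two metrics side by side, with a hypothetical AgentRun record and made-up figures; nothing here comes from a real billing export.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One agent session (hypothetical record shape for illustration)."""
    tokens: int
    cost_usd: float
    prs_closed: int


def cost_per_closed_pr(runs):
    """Dollars spent per merged PR; None if money was spent and nothing shipped."""
    total_cost = sum(r.cost_usd for r in runs)
    closed = sum(r.prs_closed for r in runs)
    return total_cost / closed if closed else None


def tokens_per_day(runs, days):
    """The vanity metric: raw volume, blind to whether anything shipped."""
    return sum(r.tokens for r in runs) / days


# Two made-up weeks: one tightly scoped, one sprawling.
scoped_week = [AgentRun(tokens=3_000_000, cost_usd=9.0, prs_closed=3)]
sprawl_week = [AgentRun(tokens=80_000_000, cost_usd=240.0, prs_closed=4)]

for label, week in (("scoped", scoped_week), ("sprawl", sprawl_week)):
    print(label, tokens_per_day(week, 7), cost_per_closed_pr(week))
```

The sprawling week wins on tokens per day and loses badly on cost per closed PR, which is the point of switching metrics.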
Some of these are already saving me real money. All of them are saving me from the worse version of myself — the one with the devil on his shoulder, mistaking motion for output.
Where this lands for OnlyData
OnlyData faces this same shift. We use Claude Code on the AR pipeline, the enrichment workers, the Stu fleet, the dispatch queue. Our cost-per-canonical-company will, over the next year, become an actual line item rather than an asterisk inside a $200/mo subscription. That's fine. It's the right shape. We'd rather optimize against a real number than free-ride a buffet that was never going to last.
The data work itself benefits from the shift. Smaller, sharper prompts. Cheaper local models for the deterministic stuff. LLM calls only where reasoning genuinely earns its keep. The same discipline that saves money also produces cleaner data.
Lindsey's canyon is the headline. The pricing flip is the lever. The habits are the work. I think most of us — me, the maxer culture, the labs themselves — are about to chill out. I hope.