Pricing & incentives

The token-maxing canyon.

Subscription buffets trained a generation of token-maxers. The buffet is closing — Claude Code is out of the $20 plan, weekly caps are tightening, and rolling windows are shrinking. Here's the math behind why it had to, and the incentive flip we should actually want.

Cam Fortin · April 2026

That hum of productivity-maxxing is the ambient noise of modern professional life. With AI, we've turned up the volume to a level the human mind has never encountered. The gap between what these systems can output in an hour and what any person can output in an hour isn't a gap anymore — it's a canyon. — Lindsey Witmer Collins, on LinkedIn

Lindsey wrote that last week and it stuck with me. The canyon is real. I felt it the first time I left a Claude Code /loop running overnight and woke up to 6 hours of clean commits — work that would have eaten me for a week. The next thought, immediately, was: why isn't another loop running on something else?

That second thought is the token-maxer talking. I have a devil on my shoulder whispering "you are leaving value on the table — have agents do more, more, more." It is not a virtuous voice. It is the voice of someone whose buffet is about to close and who hasn't accepted it yet.

This post is about why the buffet is closing, what the math actually looks like, and why I think the incentive shift coming in 2026 is the thing that quiets that voice — for me, and for everyone else who has been quietly speed-running the canyon.

Figure 1 · The canyon
Roughly normalized "useful output per hour," human-baseline = 1. Scale is illustrative — not benchmarks.
[Chart: bars for Solo human (2019), Human + Copilot (2023), Human + Claude (2024), Solo agent (2025, 30×), and Multi-agent fleet (2026, 200×+). Annotations: human ceiling, canyon.]
The y-axis isn't lines of code — it's "things finished without a human in the loop." A solo agent that runs /loop overnight on a well-scoped task can close 20–50 tickets while you sleep. A fleet of them, scoped right, multiplies that. The canyon isn't speculative anymore.

The buffet years (2024–2025)

For about eighteen months, frontier AI was sold to power users in a way that almost nothing else in software is sold: flat fee, near-unlimited, the heavier you used it the better the deal got.

The Claude Pro plan at $20/mo bundled in Claude Code, Sonnet, and a generous chat allowance. The Max tiers ($100 and $200/mo) felt unlimited to anyone who wasn't running a server farm — and the people who were running a server farm got the best deal of all, because the marginal token cost to them was zero.

Inside that pricing structure, "token-maxing" wasn't dumb. It was rational. Every token you didn't burn was free money you left in Anthropic's pocket. The optimal strategy on a buffet is to eat, and the optimal strategy on a buffet that costs $200/mo and serves $5,000 of API value is to eat without thinking.

This produced a generation of habits I now recognize in myself:

None of these were crazy at $20/mo. All of them are insane at API rates. The buffet trained the behavior. Now the buffet is leaving.

What's actually changing in 2026

The signal isn't subtle. Over the last two quarters Anthropic has tightened subscription pricing in five visible ways at once:

Figure 2 · The buffet closes
A reconstructed timeline of subscription tightening. Dates are when changes hit power users, not when they were announced.
  • 2024 · Q3: Claude Code launches as an API-metered tool. Power users billed per token. Heavy days run $50–$300. Pricing matched cost, behavior matched price.
  • 2025 · Q1: Claude Code bundled into Pro ($20) and Max ($100/$200). Subscriptions feel unlimited. Token-maxer culture takes off: leaderboards, "left it running for 8 hours," screenshots of $40K-equivalent API spend.
  • 2025 · Q3: Weekly caps introduced on Max. First explicit ceiling. Heaviest users start hitting walls mid-week. Pricing language softens from "all you can use" to "fair use."
  • 2025 · Q4: 5-hour rolling window enforcement tightens. Burst usage is what the model can support; the rolling window is what the wallet can. The window goes from generous to genuinely binding for power users.
  • 2026 · Q1: Claude Code removed from the $20 Pro plan. The headline shift. The cheapest tier no longer subsidizes agent-coding workloads at all. If you want Claude Code, you're on Pro+/Max, or back on the API meter.
  • 2026 · ongoing: Convergence with API economics. Each quarter, the gap between "what a heavy subscriber gets" and "what the same usage would cost on the API" narrows. The $200 Max plan increasingly resembles a pre-paid API allowance with branding.
None of these moves are punitive — they're unit economics catching up to a product that, briefly, was priced more like a category-acquisition flywheel than a sustainable SaaS line. Inference still costs real money. Subscriptions can't outrun that forever.

The arbitrage, in one table

The reason the buffet has to close is that the gap between what a heavy user pays and what their tokens cost is genuinely large. Order-of-magnitude large. This is roughly what subscription arbitrage has looked like for a working developer at three usage levels:

Figure 3 · The arbitrage
Approximate API-equivalent cost of a working month at three usage profiles, vs the subscription price the same user paid. Token costs assume a Sonnet-weighted Claude Code workload (~$3 / $15 per MTok in/out).
| User profile | Daily tokens | API equivalent / mo | Sub paid / mo | Subsidy (API ÷ sub) |
|---|---|---|---|---|
| Casual (few prompts/day) | ~500K | $45 | $20 | 2.25× |
| Working dev (Claude Code daily) | ~10M | $900 | $100 | 9× |
| Token-maxer (loops, fleets, parallels) | ~80M | $7,200 | $200 | 36× |
| Whale (overnight fleets, screenshots on X) | ~300M | $27,000 | $200 | 135× |

[Chart: API-equivalent monthly spend for each profile (casual, working dev, token-maxer, whale) on a $0 to $30K axis, plotted against the $200 subscription ceiling.]
The shaded region is the subsidy. Anthropic eats it for the casual user (small enough to be fine). They eat it grudgingly for the working dev (still strategic). For the maxer and the whale, the subsidy is the entire economics of the customer — and there is no version of running a foundation lab where you eat that indefinitely.
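
The table's arithmetic can be reproduced in a few lines. This is a minimal sketch, assuming a 30-day month and a blended rate of about $3 per million tokens. Both assumptions are inferred from the table's figures rather than stated in it, and the function names are illustrative.

```python
# Assumptions (inferred from the table, not stated by it): a 30-day month and
# a blended price of ~$3 per million tokens across input and output.
BLENDED_USD_PER_MTOK = 3.0
DAYS_PER_MONTH = 30


def api_equivalent_per_month(daily_tokens: float) -> float:
    """Approximate monthly API cost for a given daily token volume."""
    return daily_tokens / 1_000_000 * BLENDED_USD_PER_MTOK * DAYS_PER_MONTH


def subsidy_multiple(daily_tokens: float, subscription_usd: float) -> float:
    """How many times the subscription price the API-equivalent cost comes to."""
    return api_equivalent_per_month(daily_tokens) / subscription_usd


# (daily tokens, subscription price in USD per month), taken from the table.
PROFILES = {
    "Casual": (500_000, 20),
    "Working dev": (10_000_000, 100),
    "Token-maxer": (80_000_000, 200),
    "Whale": (300_000_000, 200),
}

for name, (daily_tokens, sub_usd) in PROFILES.items():
    cost = api_equivalent_per_month(daily_tokens)
    multiple = subsidy_multiple(daily_tokens, sub_usd)
    print(f"{name:12s} ${cost:>9,.0f}/mo vs ${sub_usd}/mo -> {multiple:.2f}x")
```

Under those assumptions the multiples work out to 2.25×, 9×, 36×, and 135×. A workload with a different input/output mix or heavy caching would shift the blended rate, so treat the absolute dollar figures as rough.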

The incentive flip

When the unit you pay for is "a month," your dominant strategy is to use as many tokens as possible. When the unit you pay for is "a token," your dominant strategy is to use as few tokens as possible per unit of finished work.

Those two dominant strategies look nothing alike. They produce different commit graphs, different prompts, different tools, different team norms. The shift from one to the other is what's actually happening over the next 18 months — and almost nobody is talking about it as a behavioral story.
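
One way to see why the two strategies diverge is to look at the marginal cost of the next batch of tokens under each scheme. The sketch below assumes a $200 flat plan with no caps and a blended metered rate of $3 per million tokens. The rate and the function names are illustrative assumptions, not anyone's actual billing logic.

```python
FLAT_FEE_USD = 200.0         # flat monthly plan, caps ignored for simplicity
METERED_USD_PER_MTOK = 3.0   # assumed blended metered rate, for illustration


def monthly_bill(monthly_mtok: float, metered: bool) -> float:
    """Total bill for a month of usage, in millions of tokens (MTok)."""
    return monthly_mtok * METERED_USD_PER_MTOK if metered else FLAT_FEE_USD


def marginal_cost(monthly_mtok: float, extra_mtok: float, metered: bool) -> float:
    """What it costs to burn extra_mtok on top of current usage."""
    before = monthly_bill(monthly_mtok, metered)
    after = monthly_bill(monthly_mtok + extra_mtok, metered)
    return after - before


for usage in (10, 100, 2400):
    flat = marginal_cost(usage, 10, metered=False)
    metered = marginal_cost(usage, 10, metered=True)
    print(f"at {usage:>5} MTok/mo, +10 MTok costs ${flat:.2f} flat, ${metered:.2f} metered")
```

On the flat plan the next ten million tokens always cost nothing, so waste is free. On the meter they cost the same $30 whether or not they finish anything, which is why the only way to lower the bill is to finish the same work with fewer tokens.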

Figure 4 · The incentive flip
Same user, same job, same model — different pricing, different rational behavior.

Buffet pricing · 2024–25

"Tokens are free, my time isn't."
  • Spawn first, think later. Three subagents because parallel feels productive.
  • No compaction discipline. Keep every transcript long; context loss feels worse than waste.
  • Loop overnight. The cost of an idle loop is zero; the cost of waking up to nothing finished is real.
  • Big-model default. Always reach for the most capable model, regardless of task.
  • Re-run instead of cache. Cheaper to re-ask than to plumb cache control.
  • Tokens-per-day as bragging right. "I burned 200M tokens this week" = status.

API / metered pricing · 2026+

"Every token has a price tag I can see."
  • Scope before spawn. One agent on a clear task beats three agents on a vague one — and costs a third.
  • Compact aggressively. The transcript is a bill. Smaller bill = same answer.
  • Loops cost money. Loops have to justify themselves. Most idle loops die.
  • Right-size the model. Haiku for routing, Sonnet for build, Opus for hard reasoning.
  • Prompt caching is religion. 90% off the input price for hits — it adds up fast.
  • Output-per-dollar as bragging right. "I shipped this PR for 4¢" = status.
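
To put a number on the caching bullet, here is a rough sketch of input cost as a function of cache hit rate. It uses the $3 per million input tokens quoted in Figure 3 and the 90% discount on hits. It deliberately ignores any premium charged for writing to the cache, which is a simplifying assumption, and the function name is hypothetical.

```python
INPUT_USD_PER_MTOK = 3.0    # Sonnet-class input price quoted in Figure 3
CACHE_HIT_DISCOUNT = 0.90   # "90% off the input price for hits"


def input_cost_usd(input_mtok: float, cache_hit_rate: float) -> float:
    """Input-side cost for input_mtok million tokens at a given cache hit rate.

    Simplification: cache writes are billed like ordinary misses here.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    hit_mtok = input_mtok * cache_hit_rate
    miss_mtok = input_mtok - hit_mtok
    hit_price = INPUT_USD_PER_MTOK * (1.0 - CACHE_HIT_DISCOUNT)
    return hit_mtok * hit_price + miss_mtok * INPUT_USD_PER_MTOK


for rate in (0.0, 0.5, 0.8, 1.0):
    print(f"hit rate {rate:.0%}: ${input_cost_usd(100, rate):,.2f} per 100 MTok of input")
```

In this simplified model, 100 million input tokens cost $300 uncached and about $84 at an 80% hit rate. Output tokens are not discounted, so the saving applies only to the input side of the bill.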

Why this is actually good

I want to be clear: I don't want subscriptions to die. The $20 plan is one of the best deals in the history of software, and most users will never come close to its real cost. They shouldn't.

What I want is for the people running fleets — me included — to feel each token they spend. Not because spending tokens is bad, but because not feeling them bends behavior toward waste in a way that doesn't actually produce more useful output. Most of my "MORE MORE MORE" instinct is not value-creating. It's anxiety, dressed in agentic clothing, racing the canyon.

Lindsey's frame is the right one. The canyon between human and machine output is real, and the answer to it is not to keep redlining the engine to feel like we're keeping up. The answer is to aim better: pick the work that matters, scope it tightly, let the agent close it, stop. The pricing change is the thing that finally rewards that.

The thesis

Token-maxing is a phase, and pricing is the cure.

The buffet trained the behavior. The metered era will untrain it. The same person who once spawned a fleet to make a one-line change will, six months from now, ask one well-scoped question and be done. Not because they got more disciplined — because the price tag in front of them changed.

The habits I'm trying to build now

I don't want to wait for the subscription squeeze to force me into better habits. So I'm trying to act as if I'm already on the API meter, even on weeks when I'm not. The list, in case it's useful to anyone else:

  1. Scope before spawn. Write a one-paragraph task brief before any agent runs. If I can't write the brief, the agent shouldn't be running.
  2. Cache aggressively. Every system prompt and large doc gets marked for prompt caching (a cache_control breakpoint on the content block, rather than an HTTP header). The 90% discount on cache hits is the single largest token-economics lever for power users.
  3. Right-size the model. Haiku for routing and small classifiers, Sonnet for the build, Opus only when reasoning is the bottleneck.
  4. Compact, don't sprawl. If a transcript needs context from earlier, summarize it forward. Don't let the bill drag the whole conversation.
  5. Kill idle loops. If a /loop can't articulate what it's doing on this iteration, it shouldn't run on this iteration.
  6. Measure cost per closed PR, not tokens per day. The first metric pulls you toward usefulness. The second pulls you toward the canyon edge.
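
Habit 6 is easy to track once each agent run logs its token counts. The following is a minimal sketch, assuming the Sonnet-class prices from Figure 3 ($3 in, $15 out per million tokens). The AgentRun record and its fields are hypothetical, standing in for whatever your own logging captures.

```python
from dataclasses import dataclass
from typing import List, Optional

INPUT_USD_PER_MTOK = 3.0    # from Figure 3
OUTPUT_USD_PER_MTOK = 15.0  # from Figure 3


@dataclass
class AgentRun:
    """Hypothetical log record for one agent run."""
    input_tokens: int
    output_tokens: int
    closed_pr: bool  # did this run end in a merged or closed PR?


def run_cost_usd(run: AgentRun) -> float:
    """Metered cost of a single run."""
    return (run.input_tokens * INPUT_USD_PER_MTOK
            + run.output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def cost_per_closed_pr(runs: List[AgentRun]) -> Optional[float]:
    """Total spend across all runs divided by the number that closed a PR.

    Runs that closed nothing still count toward the spend, so wasted loops
    push the metric up. Returns None when no PR was closed.
    """
    closed = sum(1 for run in runs if run.closed_pr)
    if closed == 0:
        return None
    return sum(run_cost_usd(run) for run in runs) / closed


runs = [
    AgentRun(input_tokens=400_000, output_tokens=20_000, closed_pr=True),
    AgentRun(input_tokens=2_000_000, output_tokens=50_000, closed_pr=False),
    AgentRun(input_tokens=100_000, output_tokens=10_000, closed_pr=True),
]
print(f"cost per closed PR: ${cost_per_closed_pr(runs):.2f}")
```

Charging the unproductive run against the two productive ones is a design choice: it makes an idle overnight loop show up in the number instead of disappearing into a tokens-per-day total.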

Some of these are already saving me real money. All of them are saving me from the worse version of myself — the one with the devil on his shoulder, mistaking motion for output.

Where this lands for OnlyData

OnlyData is running into this same shift. We use Claude Code on the AR pipeline, the enrichment workers, the Stu fleet, the dispatch queue. Our cost-per-canonical-company will, over the next year, become an actual line item rather than an asterisk inside a $200/mo subscription. That's fine. It's the right shape. We'd rather optimize against a real number than free-ride a buffet that was never going to last.

The data work itself benefits from the shift. Smaller, sharper prompts. Cheaper local models for the deterministic stuff. LLM calls only where reasoning genuinely earns its keep. The same discipline that saves money also produces cleaner data.

Lindsey's canyon is the headline. The pricing flip is the lever. The habits are the work. I think most of us — me, the maxer culture, the labs themselves — are about to chill out. I hope.
