Software Has a Marginal Cost Again
Tokens put usage back on the P&L. Most teams still price like it’s 2016.

Monday night at 9:04 PM, five minutes after I sent over a partnership proposal, a fellow founder replied with one paragraph:
I like the concept. Admittedly, we don't have the money to pay for this, but I could see us offering it to clients as a premium option. How would you think about pricing for clients that wanted that?
That question used to be easy: pick a seat price, add a free trial, offer an annual discount. Optimize later.
I opened a notepad instead.
For the next 47 minutes, I worked through three different pricing shapes. Subscription. Usage. Outcome-based packaging. None of them looked anything like what I'd have proposed three years ago.
The 30-Year Free Lunch
Software became one of the best business models in history because the first copy is expensive, but the millionth copy is nearly free. Build the product once. Sell it forever.
The next user, next login, next page view, next database query, next seat, next workspace, next trial account. All of it had some cost, but at scale it was low enough to round down. That rounding error became an industry.
It’s why free trials worked. It’s why freemium worked. It’s why discounted upgrades worked. It’s why “ship now, figure out monetization later” became a real playbook instead of a bad joke.
Wharton drilled one equation into my head:
Marginal Revenue = Marginal Cost
That’s where the profit curve peaks.
For most internet-era software, marginal cost sat so close to zero that founders could mostly forget the equation after finals.
AI brings it back onto the notepad.
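Here's the notepad version. A toy model with linear demand and constant marginal cost (numbers invented for illustration) shows why the equation suddenly matters: profit peaks where marginal revenue equals marginal cost, and a real per-unit inference cost pulls the optimal quantity down.

```python
# Toy profit maximization: linear demand P = a - b*Q, constant marginal cost c.
# Marginal revenue for linear demand is a - 2*b*Q; setting MR = MC and
# solving for Q gives the profit-maximizing quantity.

def optimal_quantity(a: float, b: float, c: float) -> float:
    """Quantity where marginal revenue equals marginal cost."""
    return (a - c) / (2 * b)

# Internet-era software: marginal cost rounds to zero, sell deep into demand.
q_free_lunch = optimal_quantity(a=100, b=1, c=0)    # 50.0 units

# AI-era software: a real per-unit inference cost shifts the optimum left.
q_with_tokens = optimal_quantity(a=100, b=1, c=20)  # 40.0 units
```

The numbers are arbitrary; the point is structural. Any nonzero marginal cost moves the profit-maximizing point away from "serve everyone."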
AI Puts Cost Back in the Product
Serving a web page, running a database query, or handling a lightweight serverless request can cost fractions of a cent at scale. A real AI workflow is different.
Long context. Retrieval. Tool calls. Multiple model passes. Retries. Evals. Frontier models when the cheap model fails. Output tokens. Latency budgets. Caching decisions. Guardrails. A single cheap model call may be tiny. A real production workflow can cost pennies or more.
Sounds small until the product starts working. Every active user now carries real variable cost: actual inference spend, incurred every time the product does the intelligent thing users came for. The more your users use the product, the more they cost to serve. That is not a rounding error anymore.
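To see how fast pennies compound, here's a back-of-envelope sketch. All of the token counts, per-million-token prices, and usage rates below are hypothetical placeholders - plug in your provider's actual rates and your own traffic.

```python
# Back-of-envelope cost of one AI workflow run, and what it implies per user.
# Prices and volumes are illustrative assumptions, not any provider's rates.

def run_cost(input_tokens: int, output_tokens: int,
             price_in_per_m: float, price_out_per_m: float,
             retries: int = 0) -> float:
    """USD cost of one workflow run, including any retried passes."""
    passes = 1 + retries
    return passes * (input_tokens / 1e6 * price_in_per_m +
                     output_tokens / 1e6 * price_out_per_m)

# A "real" workflow: long context in, modest output.
per_run = run_cost(input_tokens=40_000, output_tokens=1_500,
                   price_in_per_m=3.00, price_out_per_m=15.00)
# 40k input at $3/M plus 1.5k output at $15/M is about $0.14 per run.

# An active user running this 5 times a day for a month:
monthly_per_user = per_run * 5 * 30  # roughly $21 per user per month
```

At a $20/month subscription price, that hypothetical user is already underwater before you pay for anything else. That's the shape of the problem, whatever your exact numbers are.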
What Breaks
Unlimited free usage gets harder. Freemium gets sharper. The old playbook of “give everything away until growth looks pretty” gets a lot less forgiving.
Free survives when it is capped, subsidized, or converting.
Consumer AI gets especially tricky. Businesses have budgets, owners, and outcomes. Consumers feel every paywall and hate metered anxiety. Knowing each click costs money, they use the product less - which is exactly what the business doesn't want. Consumer AI can still work through subscriptions, ads, hardware, usage limits, or on-device models.
But “frontier agents for everyone, free forever” is not a viable business model. The economics don’t make sense.
What Survives
Does usage get more valuable over time, or just more expensive?
Here’s the market map:

AI works best when it replaces labor, not when it decorates trivial workflows.
If the alternative is a database lookup, tokens look expensive.
If the alternative is a human doing repetitive judgment work, tokens look cheap.
That’s why workflows that replace labor are the clearest winners.
Code review. Support triage. Deal follow-up. Claims review. Title exception tracking. Account research. Internal reporting. Customer escalation handling.
If the baseline cost is salaried time, delay, missed revenue, or manual coordination, AI has room to work.
Customers are buying a deal record that stays current. A meeting summary they did not write. A follow-up that did not slip. A customer escalation that reached the right person before it became a churn risk.
Price the result. Manage the usage.
The customer should feel the outcome. You should feel the token math.
The best AI products will look weirdly boring under the hood: deterministic software everywhere possible, cheap models where good enough, frontier models only where judgment changes the result.
That architecture will beat “LLM on every click” on cost, speed, and often accuracy.
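That tiering can be sketched as a simple router. The model names, task fields, and difficulty threshold here are all invented for illustration - the point is the shape: deterministic code first, cheap models for routine cases, frontier models only where judgment changes the result.

```python
# Sketch of the "boring" tiered architecture. Fields, tiers, and the
# difficulty threshold are illustrative assumptions, not a real API.

def route(task: dict) -> str:
    """Pick the cheapest tier that can actually do the job."""
    if task["kind"] == "lookup":
        return "deterministic"   # plain database query: fast, ~free
    if task.get("difficulty", 0.0) < 0.7:
        return "cheap-model"     # routine classification or summarization
    return "frontier-model"      # judgment actually changes the result

# Usage: most traffic never touches the expensive tier.
tier = route({"kind": "review", "difficulty": 0.3})  # "cheap-model"
```

In production the routing signal is usually learned or rule-based per workflow, but the economics are the same: the expensive model handles the minority of calls where it earns its cost.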
What Comes Next
Token costs will keep falling. That does not save bad unit economics.
Stanford’s AI Index found that the cost of querying a system matching GPT-3.5-level quality fell more than 280× between late 2022 and late 2024. That's impressive. But cheap isn't the same as free.

Falling unit cost expands demand.
Jevons paradox hits hard here: when something gets cheaper to use, people don't pocket the savings - they use far more of it.
Cheaper tokens will create longer contexts, more agents, more parallel runs, more always-on automations, and workflows nobody can justify today.
Total spend will keep rising because the surface area of useful AI work will expand faster than unit costs fall.
For the last generation, software felt like magic because the expensive part happened once. Build it. Copy it. Sell it. AI changes that.
Now the expensive part can happen every time the product thinks. Cheap copies were software’s free lunch. Tokens put usage back on the P&L.
The founders who do the math first will outlast the ones still pricing like copies are free.
If you’ve run real production traffic through Claude, GPT, Gemini, or any frontier model and the bill surprised you, reply and let me know. I’m collecting the horror stories - and the counterexamples - for a follow-up.