AI Is Reshaping the Cost Structure of the Software Industry
Executive Summary
From Microsoft to Google, senior executives have increasingly centered their earnings discussions on token efficiency, inference costs, and overall system utilization.
This shift in language points to a deeper structural change. As software usage itself begins to incur meaningful costs, the long-held SaaS assumption that higher usage naturally leads to higher margins no longer holds universally. For software companies that lack scale, bargaining power over compute resources, or structural cost advantages, heavy users may instead become a source of persistent pressure.
Under this new structure, cloud platforms are gradually evolving into what can be described as Token Factories. They are systems designed to produce intelligence at controlled cost and stable efficiency. As a result, competitive advantage is moving away from features and models toward cost structure and efficiency management. This does not signal the end of the software industry. It does, however, suggest that the long-held belief in effortlessly scalable software economics is reaching its limits.
Introduction: A Break in the Software Industry Narrative
Recent earnings calls from Microsoft and Google have revealed a structural shift that is beginning to take shape, yet remains only partially absorbed by the market. What stands out is not any single financial figure, but the language used by senior leadership.
Microsoft has explicitly referred to a Cloud and Token Factory and framed its core optimization metric around tokens per watt per dollar. Google, in parallel, has repeatedly emphasized serving unit cost, token processing efficiency, and utilization improvements, even quantifying a seventy-eight percent reduction in Gemini serving unit costs. This is no longer the familiar narrative of the software industry. It reflects a form of efficiency language more commonly associated with manufacturing.
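To see why this kind of unit-cost language matters, a back-of-envelope calculation helps. The sketch below uses entirely hypothetical figures (only the seventy-eight percent reduction comes from the text above; the dollar amount and the fourfold usage growth are invented) to show how a large unit-cost decline can more than offset several-fold growth in tokens served:

```python
# Illustrative only: how a 78% serving-unit-cost reduction interacts with
# usage growth. All numbers are hypothetical, not actual company figures.

old_cost_per_m_tokens = 1.00  # hypothetical $ per million tokens served
new_cost_per_m_tokens = old_cost_per_m_tokens * (1 - 0.78)  # 78% cheaper

usage_growth = 4.0  # hypothetical 4x growth in tokens served

old_total = old_cost_per_m_tokens * 1.0           # baseline volume: 1M tokens
new_total = new_cost_per_m_tokens * usage_growth  # 4M tokens at the new rate

print(f"new unit cost: ${new_cost_per_m_tokens:.2f} per 1M tokens")
print(f"total serving cost vs baseline: {new_total / old_total:.2f}x")
```

Under these assumed numbers, quadrupling volume still leaves total serving cost below the baseline, which is why unit-cost curves, not headline capacity, dominate these earnings narratives.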
Taken together, these signals point to a deeper structural change. With AI embedded into software, the usage phase itself (the moment when queries are made, content is generated, and inference takes place) is beginning to resemble a form of production that carries real cost.
AI Blurs the Once-Clear Boundary Between Software and Manufacturing
Before the rise of AI, there was a well-defined structural boundary between the software industry and manufacturing.
The traditional software business followed a highly consistent economic model. Costs were largely concentrated in upfront research and development. Once a product was completed, the marginal cost of copying and distribution approached zero. How frequently or intensively customers used the product had little impact on the company’s cost structure. Under this model, heavy users typically translated into higher margins, which explains why the SaaS model was able to sustain high profitability over long periods of time.
Manufacturing followed a fundamentally different logic. After development was complete, each unit sold carried real production costs that scaled directly with shipment volume. Once a product was delivered, however, the usage phase rarely imposed additional costs on the producer.
These two models produced companies with very different value profiles, but neither was inherently superior. Each reflected a distinct cost structure and growth logic.
What AI changes is not simply that software becomes more capable. It begins to unsettle this long-standing boundary. When software functionality is combined with AI, every moment of use requires the real-time consumption of tokens in order to generate intelligence. Whether these tokens come from in-house infrastructure, cloud services, or third-party model providers, they ultimately correspond to real computation, energy, and capital expenditure.
This also helps explain why Microsoft has been explicit in stating that margin pressure stems primarily from expanding AI usage alongside rising infrastructure investment, rather than from pricing competition or weak demand. For AI-enabled software, cost of goods sold no longer exists only before delivery. It continues to accrue with every instance of use.
Consider the simple act of generating a single chart. In the past, this action carried virtually no marginal cost for a software provider. In an AI-enabled version, whether Copilot assists with data analysis or charts are generated automatically, each interaction requires the immediate consumption of tokens. As a result, a structural pattern that did not previously exist begins to emerge, as illustrated in Table 1.
Table 1. Cost Structure Differences Across Software, Manufacturing, and AI-Enabled Software
| Category | Cost to Deliver the Product to the Customer | Cost During Actual Use |
|---|---|---|
| Traditional software | Near zero | Zero |
| Manufacturing | High | Zero |
| AI-enabled software | Near zero | Proportional to usage |
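The pattern in Table 1 can be made concrete with a toy calculation. In the sketch below, every price and per-use cost is invented for illustration; it shows how a flat-priced subscription holds a constant gross margin under the traditional model, but sees that margin erode as usage grows once each use carries a token cost:

```python
# Toy model of Table 1: gross margin per customer as monthly usage rises.
# All prices and per-use costs here are hypothetical.

def gross_margin(price, delivery_cost, cost_per_use, uses):
    """Gross margin on a flat subscription with optional per-use COGS."""
    cogs = delivery_cost + cost_per_use * uses
    return (price - cogs) / price

PRICE = 30.0  # hypothetical flat monthly subscription, in dollars

for uses in (100, 1_000, 10_000):
    saas = gross_margin(PRICE, delivery_cost=0.0, cost_per_use=0.0, uses=uses)
    ai = gross_margin(PRICE, delivery_cost=0.0, cost_per_use=0.002, uses=uses)
    print(f"{uses:>6} uses -> traditional margin {saas:.0%}, "
          f"AI-enabled margin {ai:.0%}")
```

Under these assumed numbers, the traditional product keeps a full margin at any usage level, while the AI-enabled product's margin falls from roughly 99 percent at light usage to about a third at heavy usage. The direction of the effect, not the specific figures, is the point.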
Heavy Users Become a Structural Pressure on SaaS
In the SaaS era, heavy users typically signaled higher value and more stable revenue. Under the structure of AI software, however, this long-standing assumption is beginning to loosen. When enterprise customers make extensive use of AI features, software companies are no longer rewarded only with stronger engagement. They also incur ongoing and material token production costs. As usage frequency rises, cost pressure rises with it.
This helps explain why Microsoft has recently focused its earnings discussions on inference efficiency and token utilization, while Google has repeatedly emphasized declines in serving unit cost and framed system-level efficiency as a core requirement for AI commercialization. These signals do not point to one-off optimization efforts. They reflect a structural response.
The design logic of the software industry is gradually shifting away from a feature-centric approach toward one centered on inference efficiency. Once software functionality becomes directly tied to real production costs, product design must confront tradeoffs between performance and cost. This is a reality that manufacturing industries have long had to manage.
Within this structure, software companies are being pushed to adopt practices that once sat outside their core capabilities. Existing products require continuous cost compression. Features must be redesigned to reduce inference costs. Relationships with upstream compute providers are evolving from simple procurement toward longer-term negotiations and joint optimization. In some cases, software firms are even participating directly in model development and post-training processes, solely to lower unit usage costs.
If this structure holds, the logic of market competition will change accordingly. The companies facing the greatest risk are not necessarily those lacking AI capabilities, but those whose cost structures cannot adapt to this shift.
One group at risk includes companies that cannot absorb token costs internally and must pass them entirely on to customers. As AI usage increases, these products are quickly perceived as mispriced relative to the value they deliver, weakening customers’ willingness to pay.
A second group consists of SaaS firms that rely heavily on power users but lack scale or bargaining leverage with compute providers. In this structure, greater customer usage leads to higher inference costs, turning what was once an advantage of strong engagement into a growing source of pressure.
A third source of risk lies with software companies whose products are highly commoditized and whose cost structures offer no clear advantage. As competition shifts toward efficiency and unit economics, products without differentiation or structural strengths are likely to be marginalized.
By contrast, the companies more likely to endure are those with a different mix of capabilities. They can actively reduce or manage token production costs, operate platform-level usage scenarios that improve token utilization, or consistently deliver meaningfully higher value within the same token budget.
Conclusion: Token Factories Become the New Upstream Layer
Whether it is Microsoft’s emphasis on tokens per watt per dollar or Google’s repeated focus on serving unit cost, the signal is striking. When two companies with very different histories, cultures, and business models place efficiency, cost, and tokens at the center of their narratives, this is no longer a matter of isolated strategic choices. It points to the emergence of a new industry structure.
AI has not reduced the value of software. What it has done is make a long overlooked reality explicit. Producing intelligence is an ongoing activity, and production inevitably involves cost. As a result, software companies can no longer speak only in terms of features and user experience. They must also confront tradeoffs among efficiency, cost, and margins.
Within this structure, cloud platforms are gradually evolving into what can be described as Token Factories. They are infrastructures designed to produce intelligence continuously, with predictable efficiency and controllable costs. In this new upstream position, the true moat is no longer defined by the model alone, but by the ability to sustain a cost structure that can withstand scale.
This is not the end of the software industry. But it may mark the end of an era. The long-held belief that software, once built, can scale profit almost effortlessly is beginning to fade.
Note: AI tools were used both to refine clarity and flow in writing, and as part of the research methodology (semantic analysis). All interpretations and perspectives expressed are entirely my own.