Beyond the Traditional Hardware Framework: AI Infrastructure Is Forming a New System Cycle
Executive Summary
This AI infrastructure cycle may not be fully understood through the traditional framework of technology hardware cycles.
First, although the market keeps discussing an AI bubble, these early bubble warnings may actually make suppliers more disciplined about capital spending and delay the point at which supply becomes excessive.
Second, KV cache is changing the architecture of AI infrastructure. AI infrastructure is no longer only a question of supply and demand for individual hardware components. It has become a system-efficiency problem shaped by compute, memory, storage, networking, power, and cooling. Bottlenecks may move across different layers of the system.
Third, this cycle of AI demand does not come entirely from mature end demand. Early buildouts by platform companies, capability competition among model companies, experimental demand from AI startups, expansion supported by capital markets, and strategic and defensive demand are all contributing to the growth of AI infrastructure.
As a result, this AI infrastructure cycle may be neither as simple as a traditional hardware cycle nor merely a straightforward long-term growth story. It looks more like a complex cycle shaped by supply discipline, platform competition, system architecture, and the validation of end demand. What deserves attention is which layer of the system becomes imbalanced first, and which layer may become the next supporting layer.
Introduction
When the market talks about AI infrastructure, the discussion often centers on a few major questions.
- Will GPUs remain in short supply?
- Will HBM capacity be enough?
- Can advanced packaging keep up?
- Will cloud platforms continue to raise capital spending?
- Is this AI infrastructure cycle already approaching bubble territory?
These are all important questions. Yet in some ways, they still frame AI infrastructure through the lens of past technology cycles.
In a traditional technology cycle, the market usually looks for a familiar pattern. Demand rises quickly, supply fails to keep up, prices increase, profits improve, companies begin to expand capacity, and once new capacity comes online, supply and demand reverse. This pattern has appeared repeatedly in DRAM, NAND, LCDs, solar, and many component industries.
But this AI infrastructure cycle may form in a more complex way than past cycles. When demand sources, supply behavior, and system architecture are all changing, perhaps the more important question is not whether AI infrastructure will have a cycle, but what form this cycle will take.
Reversals in Traditional Cycles Usually Come From Collective Capacity Expansion
To understand why this AI infrastructure cycle is difficult to judge, it helps to return to the logic of traditional cyclical industries. In most cyclical industries, the real trigger for reversal is often not weakening demand, but too much supply expansion during the strongest part of the cycle.
- When products are in short supply, customers ask suppliers to add capacity.
- When prices rise, management teams see stronger margins and profits.
- When profits improve, boards and shareholders want companies to seize the opportunity.
- When competitors begin to expand capacity, other companies worry about losing market share.
As a result, even though each company understands cyclical risk, rational discipline can gradually be pushed aside at the peak of the cycle by external pressure, competitive pressure, and optimism. In the end, several leading companies begin to make similar decisions. They expand capacity one after another. They all believe demand will continue to grow. They all feel that if they do not expand, they may miss the opportunity.
The real problem is usually not one company expanding capacity. It is an entire industry making similar decisions around the same time. When this collectively expanded capacity comes online, the supply-demand relationship begins to change. Products that were once in short supply become less scarce. Tight pricing begins to loosen. Companies that were once highly favored by the market begin to face inventory, pricing, and margin pressure.
This is the most typical reversal pattern in traditional technology cycles. Therefore, if the market wants to judge whether AI infrastructure is about to enter a traditional oversupply cycle, the real question is not only whether demand remains strong. It is whether the supply side has already entered a phase of synchronized, visible, and large-scale capacity expansion.
From this perspective, the questions become more specific.
- Has GPU capacity already expanded out of control?
- Has HBM supply already moved into clear oversupply?
- Have DRAM and NAND fully restarted the kind of peak-cycle expansion seen in past upturns?
- Have SSD and HDD suppliers begun making large-scale capacity investments for AI demand?
- Has capacity for advanced process nodes and advanced packaging expanded beyond visible demand?
If these areas have all entered synchronized, visible, and large-scale expansion, then the risk of a traditional cycle reversal would naturally rise.
What is more subtle today is that although the market keeps discussing an AI bubble, the supply chain may not yet have entered the kind of broad optimism, broad expansion, and broad loss of discipline seen in past cycles. In the past, bubbles often formed in an atmosphere of widespread optimism. This time, bubble warnings have appeared almost from the beginning.
In other words, bubble warnings may not only weigh on market valuations. They may also make it harder for suppliers to convince themselves and their boards to pursue uncontrolled capacity expansion. This does not mean AI infrastructure has no bubble risk. It means the supply response in this cycle may not be as direct as in past cycles. What may deserve closer attention is whether these market discussions are changing suppliers’ capital spending discipline.
This is one reason why this AI infrastructure cycle is more complex.
KV Cache Turns AI Infrastructure Into a System-Efficiency Problem
If we understand this AI infrastructure cycle only through supply discipline, the picture is still incomplete. The architecture of AI infrastructure itself is changing. In the past, we were used to looking at hardware industries such as GPUs, memory, and storage separately. Although these industries all sit inside the data center, the market often still analyzes each of them through its own product supply and demand.
KV cache is a good example of why that separation is breaking down. During inference, a model keeps the attention keys and values it has already computed for earlier tokens, so that it does not have to recompute them for every new token. What makes KV cache especially interesting is that it is not data storage in the traditional sense. It is a form of data that sits between computation and storage.
- It can be preserved, allowing the model to avoid repeated computation.
- It can also be discarded, because it can be recomputed from the original input.
- It requires speed and capacity, but it may not need the same durability and fault-tolerance design as traditional enterprise data.
This makes KV cache a new kind of infrastructure demand.
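To make the scale of this demand concrete, the sketch below applies the standard sizing arithmetic for a transformer KV cache: one key vector and one value vector per token, per layer, per KV head. The model dimensions are hypothetical and chosen only for illustration. They are not taken from any specific model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical transformer dimensions, used only for illustration."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int  # 2 for FP16 or BF16


def kv_cache_bytes(shape: ModelShape, num_tokens: int) -> int:
    # Each token stores one key and one value vector (the factor of 2)
    # for every layer and every KV head.
    per_token = (2 * shape.num_layers * shape.num_kv_heads
                 * shape.head_dim * shape.bytes_per_value)
    return per_token * num_tokens


# Assumed shape: 80 layers, 8 KV heads, head dimension 128, 16-bit values.
example = ModelShape(num_layers=80, num_kv_heads=8, head_dim=128, bytes_per_value=2)

per_token = kv_cache_bytes(example, 1)
context_128k = kv_cache_bytes(example, 128 * 1024)

print(f"{per_token} bytes per token")                         # 327680 bytes per token
print(f"{context_128k / 2**30:.1f} GiB for 128K tokens")      # 40.0 GiB for 128K tokens
```

Under these assumptions, a single long-context session occupies tens of gigabytes. That is the same order of magnitude as the HBM on one accelerator, which is why the question of where this data lives becomes an infrastructure question.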
In AI inference, especially in long-context, multi-turn conversation, agentic workflow, and large-scale data retrieval scenarios, KV cache becomes increasingly important.
- If KV cache is preserved, it can reduce repeated GPU computation.
- If it is not preserved, more GPU compute is needed to recompute it.
- If HBM is too expensive or too limited, other memory and storage layers may need to help absorb the workload.
- If flash storage and SSDs can provide a good enough balance of speed, capacity, and cost, they may become a new middle tier in AI factories.
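The choice between preserving and recomputing can be sketched as a simple latency comparison. The example below is a minimal illustration under stated assumptions: the prefill throughput, the tier bandwidths, and the per-token cache size are all hypothetical numbers, and the comparison ignores the cost of GPU time, queueing, and power.

```python
def recompute_seconds(num_tokens: int, prefill_tokens_per_second: float) -> float:
    """Time for the GPU to rebuild the cache by reprocessing the input."""
    return num_tokens / prefill_tokens_per_second


def reload_seconds(cache_bytes: int, tier_bytes_per_second: float) -> float:
    """Time to read a preserved cache back from a memory or storage tier."""
    return cache_bytes / tier_bytes_per_second


def cheaper_path(num_tokens: int, bytes_per_token: int,
                 prefill_tokens_per_second: float,
                 tier_bytes_per_second: float) -> str:
    """Return "reload" or "recompute", whichever has lower latency."""
    reload_t = reload_seconds(num_tokens * bytes_per_token, tier_bytes_per_second)
    recompute_t = recompute_seconds(num_tokens, prefill_tokens_per_second)
    return "reload" if reload_t <= recompute_t else "recompute"


# Assumed workload: 32K tokens at 327,680 bytes per token (10 GiB of cache),
# with a GPU that can prefill 10,000 tokens per second.
TOKENS = 32 * 1024
BYTES_PER_TOKEN = 327_680
PREFILL_TPS = 10_000.0

fast_tier = cheaper_path(TOKENS, BYTES_PER_TOKEN, PREFILL_TPS, 5e9)  # 5 GB/s tier
slow_tier = cheaper_path(TOKENS, BYTES_PER_TOKEN, PREFILL_TPS, 2e9)  # 2 GB/s tier

print(fast_tier)  # reload
print(slow_tier)  # recompute
```

With the faster tier, reading the cache back takes about 2.1 seconds against roughly 3.3 seconds of recomputation, so preserving it wins. With the slower tier the result flips. The crossover depends entirely on the ratio of storage bandwidth to prefill throughput, which is the sense in which a storage layer can substitute for GPU compute.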
This changes the problem itself. The efficiency of AI infrastructure no longer depends only on how many GPUs are available. It also depends on how compute, memory, and the storage hierarchy are configured.
- When compute is insufficient, more storage can be used to reduce recomputation.
- When storage is insufficient, more compute can be used to make up for it.
- When high-bandwidth memory is too expensive, the system needs to decide which data should remain in HBM and which data can move to other layers.
- Flash storage, SSDs, and larger-capacity storage layers are no longer only about storing data. They are becoming part of the efficiency design of AI inference.
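The decision about which data stays in HBM and which moves to other layers can be illustrated with a simple greedy placement policy. This is a sketch, not a description of how any production inference system works: the tier names, capacities, cache entries, and the rule of ranking by reuse frequency are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CacheEntry:
    """A hypothetical KV cache belonging to one session or workload."""
    name: str
    size_gib: int
    reuses_per_hour: float


def place(entries: List[CacheEntry], tiers: List[Tuple[str, int]]) -> Dict[str, str]:
    """Assign each entry to the fastest tier with room, hottest entries first.

    `tiers` is a list of (tier_name, capacity_gib), ordered fastest to slowest.
    Entries that fit nowhere are marked "discard" and must be recomputed.
    """
    remaining = {tier_name: capacity for tier_name, capacity in tiers}
    placement: Dict[str, str] = {}
    for entry in sorted(entries, key=lambda e: e.reuses_per_hour, reverse=True):
        placement[entry.name] = "discard"
        for tier_name, _ in tiers:
            if entry.size_gib <= remaining[tier_name]:
                remaining[tier_name] -= entry.size_gib
                placement[entry.name] = tier_name
                break
    return placement


# Assumed tier capacities in GiB, fastest first.
tiers = [("HBM", 40), ("DRAM", 100), ("SSD", 500)]

entries = [
    CacheEntry("agent_session", size_gib=30, reuses_per_hour=60),
    CacheEntry("chat_session", size_gib=20, reuses_per_hour=20),
    CacheEntry("retrieval_context", size_gib=90, reuses_per_hour=5),
    CacheEntry("batch_job", size_gib=400, reuses_per_hour=1),
    CacheEntry("one_off_query", size_gib=100, reuses_per_hour=0.1),
]

placement = place(entries, tiers)
for name, tier in placement.items():
    print(f"{name}: {tier}")
```

In this example the most frequently reused cache stays in HBM, the next spills to DRAM, two larger and colder caches land on SSD, and the least reused one is discarded and left to recomputation. A real policy would also weigh bandwidth, latency targets, and cost per gigabyte, but even this simple version shows how the storage layers take part in the efficiency decision.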
This is why KV cache changes more than storage demand. It allows compute, memory, and storage, which were once treated separately, to become part of a system of substitution, support, and shifting bottlenecks.
In other words, compute, memory, and storage are no longer independent resources. They are becoming part of the same AI factory efficiency system. This means AI infrastructure is no longer only about the separate supply-demand cycles of GPUs, HBM, NAND, or SSDs. It has become a system-efficiency problem that spans compute, memory, storage, networking, power, and cooling.
This also makes it harder to judge the AI infrastructure cycle. The bottleneck does not always have to remain in GPUs. It may move to HBM, flash storage, SSDs, networking, power, or cooling. If the market looks only at whether one product is in short supply, it may miss the more important change. The real question is where the efficiency bottleneck of the entire AI factory appears, and at which layer it will be reconfigured.
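One simple way to picture a moving bottleneck is to treat the AI factory as a pipeline whose throughput is set by its tightest layer. The sketch below assumes that each layer's capacity can be expressed in the same unit, such as the number of requests per second it could sustain. The layer names and numbers are hypothetical.

```python
from typing import Dict, Tuple


def find_bottleneck(capacity_by_layer: Dict[str, float]) -> Tuple[str, float]:
    """Return the layer with the lowest sustainable throughput and its value."""
    layer = min(capacity_by_layer, key=lambda name: capacity_by_layer[name])
    return layer, capacity_by_layer[layer]


# Assumed sustainable throughput per layer, in requests per second.
before = {
    "gpu_compute": 900.0,
    "hbm": 1200.0,
    "ssd_kv_tier": 1500.0,
    "network": 2000.0,
    "power_and_cooling": 1100.0,
}

# Adding GPUs raises compute capacity but leaves every other layer unchanged.
after = dict(before, gpu_compute=1400.0)

print(find_bottleneck(before))  # ('gpu_compute', 900.0)
print(find_bottleneck(after))   # ('power_and_cooling', 1100.0)
```

Relieving the GPU constraint raises system throughput from 900 to 1,100, not to 1,400, because power and cooling become the new limit. Watching only the component that was scarce yesterday would miss that shift.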
Demand Sources Are Not Only Mature End Demand
Beyond supply behavior and system architecture, this AI infrastructure cycle has another important characteristic: its demand sources are also unusual.
In traditional technology cycles, demand is usually easier to understand as end-product demand.
- Consumers buy smartphones, and device brands place orders for components.
- Enterprises buy servers, and the supply chain receives orders.
- Display demand comes from TVs, laptops, and monitors.
- Memory demand comes from PCs, smartphones, servers, and other electronic products.
There may be inventory buildup and pull-in demand along the way, but in the end, demand can still be traced more easily to end products and end usage.
This cycle of AI infrastructure demand is not quite the same. A large portion comes from early buildouts by platform companies. Large cloud and platform companies are not waiting until end-user demand has fully matured before building AI capacity. They are acting as if they are in an uncertain but highly important transition period, in which the first priority is to make sure they are not left out.
- They worry that without enough compute, they will not be able to support model capabilities.
- They worry that without enough infrastructure, they may lose enterprise customers.
- They worry that without enough AI capacity, they may fall behind in developer ecosystems and platform entry points.
- They worry that if competitors build scale first, it will be difficult to catch up later.
As a result, platform demand has both strategic and defensive characteristics. It is not only about how many end customers are willing to pay today. It is also a form of insurance for future platform position.
Demand from AI startups has similar characteristics. Many AI startups need large amounts of compute to train models, test products, support inference, and serve customers. They also need to show technical capability and usage growth to retain the confidence of capital markets.
These demands are real, because they do consume GPUs, memory, storage, and cloud capacity. But they may not all have been validated by mature, stable, and predictable end-user cash flow.
In other words, current AI infrastructure demand is a mix of different types of demand.
- Some comes from real usage that already exists.
- Some comes from platform companies building ahead to avoid being left out.
- Some comes from model companies increasing compute to maintain capability competition.
- Some comes from experimental demand by AI startups supported by capital markets.
- Some comes from broader strategic and national security considerations.
But these cannot all be treated as mature end demand. This is one of the hardest parts of judging the AI infrastructure cycle. Demand is real, but it is not of a single kind. Demand is strong, but a meaningful portion still carries the characteristics of early buildout and strategic positioning. Demand will translate into actual orders, but whether it can eventually become stable usage and cash flow still needs to be observed.
So this AI infrastructure cycle is not simply a question of whether demand is real or whether demand is a bubble. More precisely, it is a mixed-demand structure. This mixed-demand structure can support infrastructure expansion, but it also makes it harder for the market to judge how mature end demand really is.
Conclusion
The traditional cyclical view reminds us that oversupply usually comes from collective capacity expansion. What is different this time is that early bubble warnings may be restraining suppliers from expanding out of control.
The architectural view changes the question further. As KV cache allows compute, memory, and storage to support one another, the bottleneck in AI infrastructure no longer has to remain in GPUs. It may move across the entire system.
Demand sources also make the cycle harder to judge. Platform companies and AI startups are creating real demand, but this demand does not all come from mature end demand. Part of it is usage demand. Part of it is strategic demand. Part of it is defensive demand. Part of it is experimental demand supported by capital markets.
As a result, this AI infrastructure cycle may be neither as simple as a traditional hardware cycle nor merely a straightforward long-term growth story. What deserves attention is not whether one product remains in short supply, but which layer develops a bottleneck first, which layer expands too much first, and whether this demand can move from early buildout to stable usage.
AI infrastructure may still have cycles. But the unit of this cycle may no longer be individual products such as GPUs, HBM, NAND, or SSDs. It may be a system-level cycle formed by compute, memory, storage, platform demand, and capital discipline.
This also means that judging this AI infrastructure cycle should not be only about whether one component is in short supply. It should be about which layer of the whole system becomes imbalanced first, and which layer may become the next supporting layer.