Microsoft AI CEO Mustafa Suleyman says the AI industry's next chapter will not be written by whoever builds the smartest model. It will be written by whoever can afford to run one at scale. And right now, that is a very short list. In a post on X, Suleyman laid out a sharp, economics-first thesis, arguing that inference compute scarcity, not model intelligence, will define winners and losers for the next two to three years. The companies with the margins to buy tokens pull ahead. Everyone else gets rationed out.

"For the next couple years at least, the entire AI industry is going to be defined by this fact: demand is going to wildly outstrip supply, and so what matters is which companies / products have margin to pay for tokens," he wrote. The products that can pay, he added, will improve fastest, because lower latency drives retention, retention generates data, and that data spins a flywheel of model improvement and adoption.
Why inference compute, not AI model training, is the real bottleneck in 2026
Suleyman's argument flips the dominant AI narrative. For years, the industry obsessed over training bigger foundation models. But the acute crisis in 2026 is on the serving side: running those models for millions of users in real time.

Inference workloads now consume roughly two-thirds of all AI compute spending, per Deloitte's 2026 TMT Predictions. GPU lead times have stretched to nearly a year. High-bandwidth memory from major suppliers is sold out through 2026. And of the 16 GW of global data-centre capacity slated for this year, only about 5 GW is actually under construction; the rest remains announcements on paper.
How Mustafa Suleyman's AI 'flywheel' gives high-margin products a compounding edge
This scarcity is where Suleyman's flywheel logic takes over. Products with fat gross margins, such as enterprise legal tools, healthcare SaaS, and Microsoft 365 Copilot, can absorb premium inference costs. That buys them lower latency. Lower latency keeps users coming back. Returning users generate rich, proprietary workflow data. That data fine-tunes and improves the models. Better models drive more adoption and revenue. Repeat, faster each cycle.

Suleyman has used this exact framing before: at the October 2024 IA Summit, he said the winners in vertical AI would be those that "nailed the fine-tuning loop" and got their data flywheel spinning. Microsoft's own numbers back it up: paid Copilot seats hit 15 million in Q2 FY2026, up 160% year-on-year, though still just 3.3% of the 450 million M365 commercial user base.
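The compounding claim in the flywheel argument can be made concrete with a toy simulation. Everything below is a hypothetical sketch, not data from the article: the retention rates, growth rates, and cost figures are invented solely to show how a product that can afford premium inference pulls away over a few quarters while one that cannot shrinks.

```python
def simulate(margin_per_user: float, token_cost: float, quarters: int = 8) -> list[int]:
    """Toy flywheel: margin buys low-latency inference, which lifts retention
    and growth, which improves the model, which raises serving cost slightly.
    All parameters are illustrative assumptions."""
    users = 1_000.0
    quality = 1.0  # stand-in for model quality; better models cost more to serve
    history = []
    for _ in range(quarters):
        # Can this product afford premium (low-latency) inference for its users?
        affordable = margin_per_user >= token_cost * quality
        retention = 0.95 if affordable else 0.80  # low latency keeps users around
        growth = 0.15 * quality if affordable else 0.05
        users = users * retention + users * growth
        if affordable:
            quality *= 1.05  # more usage -> more data -> slightly better model
        history.append(round(users))
    return history

high_margin = simulate(margin_per_user=10.0, token_cost=4.0)
low_margin = simulate(margin_per_user=2.0, token_cost=4.0)
print(high_margin)  # grows each quarter, faster each cycle
print(low_margin)   # shrinks: the flywheel never starts
```

The exact numbers are meaningless; the shape is the point. The high-margin product's growth rate itself increases over time, which is what "compounding edge" means here.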
Consumer AI apps and low-margin AI startups face a token rationing problem
The uncomfortable corollary is that consumer AI apps and cash-strapped startups face a squeeze. Without the margins to buy premium inference, they get slower responses, weaker retention, and a flywheel that never starts spinning.

Some in the thread pushed back, arguing that intelligence-per-dollar matters more, or that open-source and on-device models could crash inference costs entirely. But Suleyman's bet is clear and well-funded. With Microsoft pouring over $80 billion a year into AI infrastructure, he is banking on the idea that for the next couple of years, the business that can pay for tokens wins the intelligence race first.
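The squeeze is easy to see in back-of-envelope terms. The sketch below uses entirely hypothetical numbers (usage levels, token prices, and revenue per user are assumptions, not figures from the article) to show why an enterprise seat can bid for scarce premium inference while an ad-supported consumer app cannot.

```python
def monthly_serving_cost(queries_per_day: int, tokens_per_query: int,
                         price_per_million_tokens: float) -> float:
    """Rough monthly inference bill per user at a given token price."""
    tokens_per_month = queries_per_day * 30 * tokens_per_query
    return tokens_per_month / 1_000_000 * price_per_million_tokens

# Hypothetical enterprise seat: $30/month subscription, moderate usage.
enterprise_cost = monthly_serving_cost(20, 2_000, price_per_million_tokens=10.0)
# Hypothetical free consumer app earning ~$1/user/month in ads, same model.
consumer_cost = monthly_serving_cost(30, 2_000, price_per_million_tokens=10.0)

print(f"enterprise margin per seat: ${30 - enterprise_cost:.2f}")  # positive
print(f"consumer margin per user:  ${1 - consumer_cost:.2f}")      # deeply negative
```

Under these assumed numbers the enterprise seat clears $18 per month of margin to spend on better inference, while the consumer app loses $17 per user. Falling token prices change the figures, but not who wins a bidding war for constrained capacity.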
