San Francisco, January 26, 2026, 09:12 PST
- Microsoft unveiled Maia 200, its second-generation in-house AI inference chip, and new software tools to program it
- The first deployments start this week in Iowa, with Arizona next, the company said
- Microsoft is pitching Maia 200 for running OpenAI’s latest GPT-5.2 models and other services inside Azure
Microsoft on Monday unveiled Maia 200, the second generation of its in-house artificial intelligence chip, alongside software tools it said are meant to erode Nvidia’s advantage with developers. (Reuters)
The move matters now because the cost of running generative AI systems is rising fast, and cloud providers are trying to control both supply and pricing for the hardware that powers them. Nvidia still dominates AI computing, in part because many developers build around its CUDA software platform.
Microsoft is aiming Maia 200 at “inference”, the stage where a trained model produces answers, rather than training. Inference is the recurring, day-to-day cost of running chatbots and assistants because they generate output one token at a time; a token is a small chunk of text.
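As a rough illustration of why inference, not training, becomes the recurring bill, the sketch below shows the autoregressive loop a chatbot runs for every reply: each output token costs another full forward pass through the model. The `model.predict_next` and `eos_token` names are hypothetical placeholders, not any real API.

```python
# Illustrative sketch of autoregressive decoding (hypothetical API, not Microsoft's stack).
# Every generated token requires one more forward pass, so longer answers cost more compute.
def generate(model, prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = model.predict_next(tokens)   # one forward pass per new token
        tokens.append(next_token)
        if next_token == model.eos_token:         # stop at end-of-sequence
            break
    return tokens
```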
The company said Maia 200 will come online this week in a data center near Des Moines, Iowa, with a second site near Phoenix, Arizona, planned next. It is the follow-on to Maia 100, which Microsoft introduced in 2023.
In a blog post, Microsoft said Maia 200 is built on TSMC’s 3-nanometer process and is designed to cut the cost of “AI token generation.” It said the chip uses 216GB of HBM3e (high-bandwidth memory used to feed data to the processor) and 272MB of on-chip SRAM, a faster memory type that can help when many users hit a model at once. Microsoft also said Maia 200 has more than 140 billion transistors and delivers over 10 petaFLOPS in FP4 and over 5 petaFLOPS in FP8 — lower-precision math formats often used to speed up AI workloads. (The Official Microsoft Blog)
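To put those memory and precision figures in context, here is a back-of-the-envelope sketch of how lower-precision formats shrink the memory needed just to hold a model's weights. It uses a hypothetical 200-billion-parameter model; none of these sizes come from Microsoft's announcement.

```python
# Rough arithmetic: bytes per parameter at different precisions (illustrative, not Microsoft data).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}

def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Memory needed to store the weights alone, in gigabytes."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9

# Hypothetical 200-billion-parameter model:
for fmt in ("fp16", "fp8", "fp4"):
    print(fmt, round(weight_memory_gb(200e9, fmt)), "GB")   # fp16: 400, fp8: 200, fp4: 100
# At FP8 or FP4 the weights would fit within 216GB of HBM3e on a single chip;
# at FP16 they would not (and this ignores activations and the KV cache).
```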
Microsoft compared the new chip directly with rivals this time. It said Maia 200 delivers about three times the FP4 performance of Amazon’s third-generation Trainium and better FP8 performance than Google’s seventh-generation TPU, and claimed a 30% gain in performance per dollar versus the latest hardware in its fleet.
A big part of the announcement was software. Microsoft said it will offer a Maia software development kit that integrates with PyTorch, a widely used AI framework, and includes a Triton compiler and kernel library. Triton is an open-source tool with major contributions from OpenAI, and Microsoft positioned it as an alternative route to the kind of low-level optimizations developers often do with CUDA.
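For a sense of what that programming path looks like, below is the standard vector-add example from the open-source Triton project. It is generic Triton code, not Maia-specific; whether such kernels run unchanged on Maia 200 will depend on the backend Microsoft ships in its SDK.

```python
# Canonical Triton "vector add" kernel (open-source Triton; not specific to Maia 200).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                        # which block this program instance handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                        # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)                     # launch one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```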
Scott Guthrie, executive vice president of Microsoft’s Cloud and AI division, said Maia 200 can “run today’s largest models” with room to grow. Microsoft said it will use Maia 200 to host OpenAI’s GPT-5.2 model and other systems for Microsoft Foundry and Microsoft 365 Copilot. (The Verge)
The chip puts Microsoft more squarely in the same lane as Amazon and Google, which have been pushing their own in-house AI processors for cloud customers. Nvidia, meanwhile, is preparing its next “Vera Rubin” platform, and Microsoft’s chip uses an older generation of high-bandwidth memory than Nvidia has signaled for its upcoming parts.
But the hard part is rarely the silicon. Developers have years of code and tooling built around CUDA, and Microsoft will need to show that its Triton-based stack is stable, fast and straightforward to move existing workloads onto at scale. The rollout is also starting in a limited number of regions, so available capacity, not just performance claims, will be closely watched.
Microsoft said Maia 200 will support multiple models, including GPT-5.2, and that its Superintelligence team will use the chips for synthetic data generation and reinforcement learning as it works on its own next-generation models.