Microsoft announces new Maia 200 AI chip to power Azure cloud services
On January 26, 2026, Microsoft announced the Maia 200 AI inference accelerator, its second-generation custom silicon for Azure data centers. Built on TSMC's 3nm process with over 140 billion transistors, Maia 200 includes native FP4/FP8 tensor cores and delivers over 10 petaFLOPS at FP4 and more than 5 petaFLOPS at FP8.
The chip pairs 216 GB of HBM3e memory at 7 TB/s with 272 MB of on-chip SRAM and has a 750 W TDP. Microsoft says Maia 200 achieves three times the FP4 performance of AWS Trainium (3rd gen), exceeds Google's TPU v7 on FP8, and offers 30% better performance per dollar than Microsoft's current fleet. Deployments began in Azure US Central, with US West 3 next. An SDK preview is available for developers.
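As a rough roofline-style check (a back-of-envelope sketch using only the figures above, not a vendor benchmark), the ratio of peak FP4 compute to HBM bandwidth shows how much arithmetic a workload must do per byte fetched before the chip becomes compute-bound rather than memory-bound:

```python
# Back-of-envelope roofline balance point from the published figures.
# The numbers come from the announcement; the interpretation is illustrative.

peak_fp4_flops = 10e15   # >10 petaFLOPS at FP4 (lower bound from the article)
hbm_bandwidth = 7e12     # 7 TB/s HBM3e bandwidth, in bytes per second

# Arithmetic intensity (FLOPs per byte of HBM traffic) at which compute and
# memory bandwidth are equally limiting.
balance_point = peak_fp4_flops / hbm_bandwidth
print(f"Balance point: ~{balance_point:.0f} FLOPs per byte")  # ~1429 FLOPs/byte

# Single-token LLM decoding performs on the order of a few FLOPs per weight
# byte read, far below this point, so decode throughput is governed by the
# 7 TB/s of memory bandwidth rather than by the petaFLOPS figure.
```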
Maia 200 targets large-model inference, powering workloads such as OpenAI's GPT-5.2, Microsoft 365 Copilot, and synthetic-data generation. It uses Ethernet-based scaling with 2.8 TB/s bidirectional bandwidth per accelerator and supports clusters up to 6,144 units. That design reduces dependence on Nvidia fabrics and lowers total cost of ownership.
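To make the bandwidth point concrete, here is a minimal sketch of the weight-streaming limit on per-accelerator decode throughput. The model size and bytes-per-parameter are assumptions chosen for illustration; only the 216 GB capacity and 7 TB/s bandwidth come from the announcement.

```python
# Hypothetical example: how fast can one accelerator stream a model's weights?
# The model size is assumed; only memory capacity and bandwidth are from the article.

params = 200e9           # assumed 200B-parameter model (hypothetical)
bytes_per_param = 0.5    # FP4 weights: 4 bits = 0.5 bytes per parameter
hbm_capacity = 216e9     # 216 GB HBM3e per accelerator
hbm_bandwidth = 7e12     # 7 TB/s

weight_bytes = params * bytes_per_param  # 100 GB of weights
assert weight_bytes <= hbm_capacity, "model must fit in HBM for this estimate"

# Upper bound on decode rate if every generated token reads all weights once,
# ignoring KV-cache traffic, activations, and batching effects.
tokens_per_second = hbm_bandwidth / weight_bytes
print(f"Weight-streaming limit: ~{tokens_per_second:.0f} tokens/s per accelerator")
```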
Why It Matters
- Azure operators can deploy inference clusters with ~30% better performance per dollar and higher token throughput by using FP4/FP8-optimized accelerators.
- High-bandwidth memory (216 GB HBM3e at 7 TB/s) supports large LLMs like GPT-5.2 with steadier latency during peak traffic.
- Developers can use the Maia SDK preview to port PyTorch/Triton models and scale inference cost-effectively without proprietary networking (a hypothetical porting sketch follows this list).
- Ethernet-based scaling to 6,144 accelerators avoids custom fabrics, reducing infrastructure complexity and TCO.
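The Maia SDK's actual API is not described in the announcement, so the snippet below is only a hypothetical sketch of what porting an existing PyTorch model to a custom torch.compile backend typically looks like. The "maia" backend name is an assumption, and the code falls back to PyTorch's built-in inductor backend so it runs anywhere.

```python
import torch
import torch.nn as nn

# Hypothetical backend name; the real Maia SDK may register a different string.
PREFERRED_BACKEND = "maia"

# Use the custom backend only if an SDK has actually registered it with Dynamo;
# otherwise fall back to the stock "inductor" backend so the script still runs.
available = torch._dynamo.list_backends()
backend = PREFERRED_BACKEND if PREFERRED_BACKEND in available else "inductor"

model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096))
model.eval()

# torch.compile traces the model once and hands the graph to the chosen backend.
compiled = torch.compile(model, backend=backend)

with torch.no_grad():
    out = compiled(torch.randn(8, 4096))
print(backend, out.shape)
```

In practice, a vendor SDK would pair such a backend with quantization tooling for FP4/FP8 weights; those details are placeholders until the preview documentation is public.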