InferenceX by SemiAnalysis

Articles

Insights on AI inference benchmarking, GPU performance, and ML infrastructure.

April 23, 2026 · 7 min read

GB200 NVL72 vs B200 on Kimi K2.5: 3.1x from Wide EP vLLM

Rack-scale NVLink on the NVL72 lets Dynamo vLLM run Kimi K2.5 with wide EP up to Decode EP 16, raising peak throughput from 4,021 to 12,587 tok/s/GPU at 8k/1k NVFP4.

Tags: benchmark, gpu, inference, kimi, nvidia, gb200, b200, vllm, nvl72, wide-ep
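The headline 3.1x speedup can be checked directly from the two quoted per-GPU throughput figures; a minimal sketch (numbers taken from the article teaser, variable names are illustrative):

```python
# Verify the headline speedup from the quoted per-GPU throughput numbers.
b200_peak = 4_021       # tok/s/GPU, B200 baseline
nvl72_peak = 12_587     # tok/s/GPU, GB200 NVL72 at Decode EP 16

speedup = nvl72_peak / b200_peak
print(f"{speedup:.1f}x")  # → 3.1x
```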

Continuous open-source inference benchmarking. Real-world, reproducible, auditable performance data trusted by trillion-dollar AI infrastructure operators such as OpenAI, Meta, Oracle, and Microsoft.


If this data helps your work, consider starring us on GitHub or sharing with your network.

© 2026 semianalysis.com. All rights reserved.