InferenceX by SemiAnalysis

Articles

Insights on AI inference benchmarking, GPU performance, and ML infrastructure.

May 27, 2026·11 min read

GB300 NVL72 vs GB200 NVL72 Inference Performance & Perf per Dollar - on DeepSeek-V4-Pro 1.6T: Up to 2.83x Throughput

DSv4-Pro FP4 8K/1K, Dynamo+vLLM, disaggregated on both racks. GB300's 50% extra HBM (288 vs 192 GB/GPU) unlocks a wider prefill+decode recipe that GB200 can't fit, lifting middle-of-curve perf/$ by 2.31x despite a 20% per-GPU TCO premium.

benchmark gpu inference deepseek nvidia gb300 gb200 nvl72 vllm dynamo wide-ep disagg
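
The perf/$ figure in the summary above can be sanity-checked with a little arithmetic. The sketch below assumes perf/$ is simply per-GPU throughput divided by per-GPU TCO over the same period; the article itself may define it differently, and the helper names are hypothetical. Under that assumption, a 2.31x perf/$ gain at a 1.20x TCO ratio implies roughly a 2.77x throughput gain at that point on the curve. The headline 2.83x is an "up to" figure taken elsewhere on the curve, so the two need not match.

```python
# Hypothetical helpers; assumes perf/$ = per-GPU throughput / per-GPU TCO.

def perf_per_dollar_ratio(throughput_ratio: float, tco_ratio: float) -> float:
    """Relative perf/$ of system A vs. system B, given A/B throughput and A/B TCO."""
    return throughput_ratio / tco_ratio


def implied_throughput_ratio(perf_per_dollar: float, tco_ratio: float) -> float:
    """Invert the relation: the throughput ratio needed for a given perf/$ ratio."""
    return perf_per_dollar * tco_ratio


# Figures quoted in the summary.
hbm_ratio = 288 / 192   # GB300 vs. GB200 HBM per GPU, i.e. 50% extra
tco_ratio = 1.20        # 20% per-GPU TCO premium for GB300
perf_dollar_gain = 2.31 # middle-of-curve perf/$ gain

implied = implied_throughput_ratio(perf_dollar_gain, tco_ratio)
print(f"HBM ratio: {hbm_ratio:.2f}x")
print(f"Implied mid-curve throughput gain: {implied:.2f}x")
```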
May 23, 2026·10 min read

GB200 NVL72 vs B200 on DeepSeek R1 670B: Up to 4.4x Throughput per GPU at 125 tok/s/user

DeepSeek R1 FP4 1k/1k. NVL72's 72-GPU NVLink scale-up fabric lets decode run wide EP up to EP=32, where B200's 8-GPU NVLink island caps out at EP=8 over RoCEv2.

benchmark gpu inference deepseek nvidia gb200 b200 nvl72 trtllm dynamo wide-ep disagg
April 23, 2026·7 min read

GB200 NVL72 vs B200 on Kimi K2.5: 3.1x from Wide EP vLLM

Rack-scale NVLink on NVL72 lets Dynamo vLLM run Kimi K2.5 wide EP up to Decode EP 16, taking peak throughput from 4,021 to 12,587 tok/s/GPU on 8k/1k NVFP4.

benchmark gpu inference kimi nvidia gb200 b200 vllm nvl72 wide-ep
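
The 3.1x in the Kimi K2.5 headline follows directly from the two peak-throughput figures in the summary. A minimal check, assuming 4,021 tok/s/GPU is the B200 peak and 12,587 tok/s/GPU is the GB200 NVL72 peak (the summary does not label them explicitly):

```python
# Peak throughput figures quoted in the summary (tok/s/GPU).
# Assumption: the lower value is B200, the higher is GB200 NVL72.
b200_peak = 4021
gb200_nvl72_peak = 12587

speedup = gb200_nvl72_peak / b200_peak
print(f"Peak throughput gain: {speedup:.2f}x")
```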

Continuous open-source inference benchmarking. Real-world, reproducible, auditable performance data trusted by trillion-dollar AI infrastructure operators such as OpenAI, Meta, Oracle, and Microsoft.


© 2026 semianalysis.com. All rights reserved.