InferenceX by SemiAnalysis

Articles

Insights on AI inference benchmarking, GPU performance, and ML infrastructure.

May 26, 2026 · 14 min read

B200 NVFP4 vs H100 FP8 on MiniMax-M2.5: Up to 8.2x Better Performance per Dollar with vLLM

vLLM PR #36307 unlocks the trtllm-gen FP8 MoE kernel for MiniMax on B200. Combined with NVFP4, B200's performance per dollar relative to H100 FP8 scales from 4.0x at 22 tok/s/user to 8.2x at 110 tok/s/user on the 8K/1K (input/output) workload.

Tags: benchmark, gpu, inference, minimax, nvidia, b200, h100, vllm, fp4

Continuous open-source inference benchmarking. Real-world, reproducible, auditable performance data trusted by trillion-dollar AI infrastructure operators such as OpenAI, Meta, Oracle, and Microsoft.


© 2026 semianalysis.com. All rights reserved.