InferenceX by SemiAnalysis

Articles

Insights on AI inference benchmarking, GPU performance, and ML infrastructure.

May 2, 2026·5 min read

SGLang 0.5.6 on B200 DeepSeek R1 FP4: Up to 1.8x at Low Concurrency

Piecewise CUDA graphs for DeepSeek V3, a unified event loop, and JIT kernels push 8k/1k throughput from 508 to 907 tok/s/GPU on the same 16-GPU B200 pool

benchmark, inference, gpu, nvidia, b200, deepseek, sglang, fp4
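The headline "up to 1.8x" figure follows directly from the two quoted throughput numbers; a quick arithmetic check (variable names here are illustrative, not from the benchmark harness):

```python
# Sanity check of the headline speedup from the quoted 8k/1k throughput numbers.
baseline_tok_s_gpu = 508   # before the SGLang 0.5.6 changes, same 16-GPU B200 pool
improved_tok_s_gpu = 907   # with piecewise CUDA graphs, unified event loop, JIT kernels

speedup = improved_tok_s_gpu / baseline_tok_s_gpu
print(f"{speedup:.2f}x")   # ~1.79x, rounding to the quoted "up to 1.8x"
```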

Continuous open-source inference benchmarking: real-world, reproducible, auditable performance data trusted by trillion-dollar AI infrastructure operators such as OpenAI, Meta, Oracle, and Microsoft.


If this data helps your work, consider starring us on GitHub or sharing with your network.

© 2026 semianalysis.com. All rights reserved.