InferenceX by SemiAnalysis

Articles

Insights on AI inference benchmarking, GPU performance, and ML infrastructure.

April 22, 2026 · 7 min read

AMD MI355X Kimi K2.5 Inference: 7.7x Throughput, Up To 15x Interactivity in 25 Days on vLLM

vLLM PR #35850 Fixed AITER MLA Dispatch on MI355X CDNA4, Unlocking Kimi K2.5 Inference Performance at TP=8, Shipped in vLLM 0.18

Tags: benchmark, gpu, inference, kimi, amd, vllm, rocm, mi355x
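The headline result above involves serving Kimi K2.5 across eight MI355X GPUs with tensor parallelism on vLLM. A minimal launch sketch follows; the Hugging Face model id is an assumption (the article does not give one), and a ROCm build of vLLM 0.18 or later is assumed:

```shell
# Sketch only: serve Kimi K2.5 with tensor parallelism across 8 GPUs (TP=8),
# matching the configuration named in the article. The model id below is a
# placeholder assumption; substitute the actual Hugging Face repository.
vllm serve moonshotai/Kimi-K2.5 \
  --tensor-parallel-size 8
```

`--tensor-parallel-size 8` shards each layer's weights across all eight accelerators in the node, which is what "TP=8" in the subtitle refers to.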

Continuous open-source inference benchmarking: real-world, reproducible, auditable performance data trusted by trillion-dollar AI infrastructure operators such as OpenAI, Meta, Oracle, and Microsoft.


If this data helps your work, consider starring us on GitHub or sharing with your network.

© 2026 semianalysis.com. All rights reserved.