
Articles

Insights on AI inference benchmarking, GPU performance, and ML infrastructure.

New to the terminology? Browse the AI inference glossary.


May 25, 2026 · 7 min read

AMD MI355X Qwen3.5 397B-A17B Inference: Up to 19x Throughput per GPU in 3 Months on SGLang FP8

From v0.5.8 (Feb) → v0.5.10rc0 (Apr) → v0.5.12 (May), three AITER kernel landings on MI355X plus a TP=8 → TP=2/TP=4 retune push Qwen3.5 8k/1k peak from 1.3k to 6.4k tok/s/GPU and extend the curve out to 75 tok/s/user

Tags: benchmark, gpu, inference, qwen, amd, mi355x, sglang, rocm

Continuous open-source inference benchmarking. Real-world, reproducible, auditable performance data trusted by trillion-dollar AI infrastructure operators such as OpenAI, Meta, Oracle, and Microsoft.
