Articles

Insights on AI inference benchmarking, GPU performance, and ML infrastructure.

New to the terminology? Browse the AI inference glossary.

All amd announcement b200 b300 benchmark cann deepseek disagg dynamo fp4 gb200 gb300 glm5 gpu h100 h200 huawei inference kimi mi355x minimax nvfp4 nvidia nvl72 qwen rocm sglang trtllm vllm wide-ep

May 26, 2026·12 min read

B200 NVFP4 vs H200 FP8 on GLM-5: Up to 3.65x Better Performance per Dollar with SGLang MTP

Both SKUs run SGLang EAGLE MTP; the Blackwell generation lifts perf/$ by ~1.2x at the peak and the NVIDIA GLM-5-NVFP4 checkpoint on FlashInfer TRT-LLM sparse MLA stacks another ~2.4–3.0x on 8K/1K

benchmarkgpuinferenceglm5nvidiab200h200sglangfp4

May 25, 2026·8 min read

AMD MI355X GLM-5 Inference: Up to 40% Cheaper per Million Tokens than B200 on SGLang FP8

14 weeks after GLM-5 launched, AMD landed both MTP and non-MTP SGLang FP8 recipes on MI355X — fused MLA + FP8 KV cache via TileLang flips the single-node FP8 cost curve in AMD favor across most of the performance Pareto

benchmarkgpuinferenceglm5amdnvidiami355xb200sglangrocm