12 min read
B200 NVFP4 vs H200 FP8 on GLM-5: Up to 3.65x Better Performance per Dollar with SGLang MTP
Both SKUs run SGLang with EAGLE MTP. The Blackwell generation alone lifts perf/$ by ~1.2x at the peak, and the NVIDIA GLM-5-NVFP4 checkpoint on FlashInfer TRT-LLM sparse MLA stacks another ~2.4–3.0x on top of that for the 8K/1K workload.
benchmark, gpu, inference, glm5, nvidia, b200, h200, sglang, fp4
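As a quick sanity check on how the two factors in the subtitle compose, here is a minimal sketch. It assumes perf/$ means throughput divided by hourly GPU cost and that the gains multiply. The function names are hypothetical, and the inputs are the rounded figures quoted above, not raw measurements.

```python
def perf_per_dollar(tokens_per_sec: float, dollars_per_hour: float) -> float:
    """Tokens generated per dollar of GPU time (assumed definition of perf/$)."""
    return tokens_per_sec * 3600.0 / dollars_per_hour


def stacked_gain(*factors: float) -> float:
    """Combine independent speedup factors multiplicatively."""
    product = 1.0
    for factor in factors:
        product *= factor
    return product


# Rounded factors quoted in the subtitle.
generation_gain = 1.2                # Blackwell vs. Hopper at the peak
nvfp4_low, nvfp4_high = 2.4, 3.0     # NVFP4 checkpoint gain on 8K/1K

low = stacked_gain(generation_gain, nvfp4_low)
high = stacked_gain(generation_gain, nvfp4_high)
print(f"combined perf/$ gain: ~{low:.2f}x to ~{high:.2f}x")
```

With these rounded inputs the product spans roughly 2.9x to 3.6x. The 3.65x headline figure sits slightly above that upper bound, presumably because the quoted factors are themselves rounded.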