Articles

Insights on AI inference benchmarking, GPU performance, and ML infrastructure.

New to the terminology? Browse the AI inference glossary.


May 26, 2026 · 14 min read

B200 NVFP4 vs H100 FP8 on MiniMax-M2.5: Up to 8.2x Better Performance per Dollar with vLLM

vLLM PR #36307 unlocks the trtllm-gen FP8 MoE kernel for MiniMax on B200. Combined with NVFP4, perf/$ scales from 4.0x at 22 tok/s/user to 8.2x at 110 tok/s/user on the 8K/1K (input/output tokens) workload.

benchmark gpu inference minimax nvidia b200 h100 vllm fp4
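A perf/$ multiple like the ones above compares two systems at the same per-user interactivity target (tok/s/user). One common way to compute it is tokens generated per dollar of GPU time, divided by the baseline's figure. The sketch below assumes per-GPU throughput and hourly GPU price as the only inputs. The `Config` class and every number in it are hypothetical placeholders, not the article's measurements or pricing.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical description of one deployment at a fixed tok/s/user target.
    name: str
    tokens_per_sec_per_gpu: float  # aggregate output throughput per GPU (placeholder)
    dollars_per_gpu_hour: float    # assumed hourly price per GPU (placeholder)

    def tokens_per_dollar(self) -> float:
        # Tokens produced in one GPU-hour divided by the cost of that GPU-hour.
        return self.tokens_per_sec_per_gpu * 3600 / self.dollars_per_gpu_hour


def perf_per_dollar_ratio(new: Config, baseline: Config) -> float:
    # Both configs must be measured at the same interactivity point for the
    # ratio to be meaningful.
    return new.tokens_per_dollar() / baseline.tokens_per_dollar()


# Illustrative placeholder numbers only.
b200 = Config("B200 NVFP4", tokens_per_sec_per_gpu=4000.0, dollars_per_gpu_hour=5.0)
h100 = Config("H100 FP8", tokens_per_sec_per_gpu=500.0, dollars_per_gpu_hour=2.5)

print(f"{perf_per_dollar_ratio(b200, h100):.1f}x")  # prints "4.0x"
```

With these placeholders, a 2x higher hourly price is more than offset by 8x the per-GPU throughput, which nets a 4x perf/$ advantage. Repeating the calculation at each tok/s/user point gives a curve of multiples rather than a single number.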