7 min read
GB200 NVL72 vs B200 on Kimi K2.5: 3.1x from Wide EP vLLM
Rack-scale NVLink on NVL72 lets Dynamo vLLM run Kimi K2.5 with wide EP up to Decode EP 16, taking peak throughput from 4,021 to 12,587 tok/s/GPU on 8k/1k NVFP4.
benchmark gpu inference kimi nvidia gb200 b200 vllm nvl72 wide-ep