·13 min read
B200 NVFP4 vs H200 INT4 on Kimi K2.5/K2.6: Up to 2.95x Better Performance per Dollar
On vLLM 8K/1K the NVFP4 path on B200 is 2.71x–2.95x cheaper per million tokens than H200 INT4 across the entire 30–90 tok/s/user serving band, and 2.45x–2.74x cheaper than B200 INT4 on the same silicon. Both factors decompose cleanly into B200's HBM bandwidth, HBM capacity, and NVFP4 tensor cores
benchmarkgpuinferencekiminvidiab200h200vllmnvfp4