GPU comparison

B200 vs GB200 NVL72

Head-to-head AI inference benchmark comparison of the B200 and the GB200 NVL72, both NVIDIA Blackwell systems, covering latency, throughput, and cost across LLM workloads.

Values are interpolated from measured benchmark data at three target interactivity levels (tok/s/user).
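The page does not say how the interpolation is done. The following is a minimal sketch that assumes piecewise-linear interpolation between the two measured points bracketing a target interactivity. The function name `interpolate_throughput` and the sample sweep are hypothetical and are not taken from the benchmark data.

```python
from bisect import bisect_left


def interpolate_throughput(points, target):
    """Linearly interpolate throughput (tok/s/GPU) at a target interactivity.

    Assumption: `points` is a list of (interactivity_tok_s_user,
    throughput_tok_s_gpu) pairs sorted by interactivity ascending.
    Targets outside the measured range are rejected rather than extrapolated.
    """
    xs = [x for x, _ in points]
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target outside measured range; no extrapolation")
    i = bisect_left(xs, target)  # first index with xs[i] >= target
    if xs[i] == target:
        return points[i][1]  # exact measured point
    (x0, y0), (x1, y1) = points[i - 1], points[i]  # bracketing points
    t = (target - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


if __name__ == "__main__":
    # Hypothetical sweep: throughput falls as per-user interactivity rises.
    sweep = [(20, 3000.0), (50, 1500.0), (100, 600.0)]
    print(interpolate_throughput(sweep, 35))  # midway between 20 and 50
```

Linear interpolation is the simplest choice. Because throughput usually falls off nonlinearly as interactivity rises, a real tool might interpolate on a log scale or fit a curve instead.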

Metric	GPU	Interactivity target 1 (tok/s/user)	Interactivity target 2 (tok/s/user)	Interactivity target 3 (tok/s/user)
Throughput (tok/s/GPU)	B200	2507.5	1116.7	757.3
Throughput (tok/s/GPU)	GB200 NVL72	5789.6	3987.5	817.8
Cost ($/M tok)	B200	$0.232	$0.485	$0.715
Cost ($/M tok)	GB200 NVL72	$0.106	$0.154	$0.809
Throughput per power (tok/s/MW)	B200	1155525	514607	349006
Throughput per power (tok/s/MW)	GB200 NVL72	2756947	1898817	389416
Concurrency	B200	~154	~123	~24
Concurrency	GB200 NVL72	~649	~333	~27
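The comparison table above is easier to read as ratios. The sketch below copies its values and computes two things: the GB200 NVL72 advantage over B200 at each interactivity target, and the power per GPU implied by the throughput and tok/s/MW rows. That implied power is derived by arithmetic and is not a figure stated on the page. The helper names are hypothetical.

```python
# Values copied from the comparison table (interactivity targets 1, 2, 3).
b200 = {
    "throughput": [2507.5, 1116.7, 757.3],  # tok/s/GPU
    "cost": [0.232, 0.485, 0.715],  # $/M tok
    "tok_s_mw": [1155525, 514607, 349006],  # tok/s/MW
}
gb200_nvl72 = {
    "throughput": [5789.6, 3987.5, 817.8],
    "cost": [0.106, 0.154, 0.809],
    "tok_s_mw": [2756947, 1898817, 389416],
}


def ratios(baseline, other):
    """Per-target ratio other / baseline, rounded to two decimals."""
    return [round(y / x, 2) for x, y in zip(baseline, other)]


def implied_watts_per_gpu(throughput, tok_s_mw):
    """tok/s/MW = (tok/s/GPU) / (MW per GPU), so W per GPU = 1e6 * tok/s/GPU / tok/s/MW."""
    return [round(1e6 * t / p) for t, p in zip(throughput, tok_s_mw)]


if __name__ == "__main__":
    print("Throughput, GB200 NVL72 / B200:",
          ratios(b200["throughput"], gb200_nvl72["throughput"]))
    print("Cost, B200 / GB200 NVL72:",
          ratios(gb200_nvl72["cost"], b200["cost"]))
    print("Implied W per GPU (B200):",
          implied_watts_per_gpu(b200["throughput"], b200["tok_s_mw"]))
    print("Implied W per GPU (GB200 NVL72):",
          implied_watts_per_gpu(gb200_nvl72["throughput"], gb200_nvl72["tok_s_mw"]))
```

Per the table, the throughput advantage is about 2.31x, 3.57x, and 1.08x across the three targets. At the third target the B200/GB200 NVL72 cost ratio drops below 1, meaning the GB200 NVL72 is the more expensive option per token there. The tok/s/MW rows are consistent with a fixed power figure of about 2,170 W per B200 GPU and about 2,100 W per GB200 NVL72 GPU.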

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.

Chart: selectable by model, ISL / OSL (input / output sequence length), precision, y-axis metric, and GPU config.