GPU comparison

B200 vs GB300 NVL72

Head-to-head AI inference benchmark comparison of B200 (NVIDIA Blackwell) and GB300 NVL72 (NVIDIA Blackwell), covering latency, throughput, and cost across LLM workloads. The chart controls below switch models, sequence lengths, precisions, and metrics, using the same interactions as the main inference chart.

The table values are interpolated from real benchmark data. Edit the target interactivity values to compare the two systems at different operating points.
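
The page does not say which interpolation scheme it uses. As a minimal sketch, assuming plain linear interpolation between neighbouring measured points, a metric at a target interactivity could be estimated as below. The function name `interpolate` and the sample points are hypothetical placeholders, not values from the benchmark.

```python
from bisect import bisect_left


def interpolate(points, target):
    """Linearly interpolate y at x=target from (x, y) points sorted by x.

    Assumption: linear interpolation; the page's actual method is not stated.
    """
    xs = [x for x, _ in points]
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target lies outside the measured range")
    i = bisect_left(xs, target)
    if xs[i] == target:
        # The target coincides with a measured point.
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (target - x0) / (x1 - x0)


# Placeholder measurements, NOT real benchmark data:
# (interactivity in tok/s/user, throughput in tok/s/gpu)
measured = [(10.0, 1000.0), (50.0, 400.0), (100.0, 100.0)]

print(interpolate(measured, 30.0))  # halfway between the first two points
```

The sketch refuses to extrapolate beyond the measured range, since a value outside it would not be backed by any benchmark run.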

Metric	Interactivity (tok/s/user)	Interactivity (tok/s/user)	Interactivity (tok/s/user)
Throughput (tok/s/gpu)	B200: 1072.9 / GB300 NVL72: 5561.8	B200: 368.4 / GB300 NVL72: 1016.6	B200: 160.2 / GB300 NVL72: 121.9
Cost ($/M tok)	B200: $0.505 / GB300 NVL72: $0.132	B200: $1.476 / GB300 NVL72: $0.706	B200: $3.382 / GB300 NVL72: $6.176
Throughput per power (tok/s/MW)	B200: 494412 / GB300 NVL72: 2648482	B200: 169759 / GB300 NVL72: 484103	B200: 73803 / GB300 NVL72: 58057
Concurrency	B200: ~88 / GB300 NVL72: ~1023	B200: ~11 / GB300 NVL72: ~216	B200: ~21 / GB300 NVL72: ~20
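
To make the table easier to read, the sketch below computes the GB300 NVL72 to B200 throughput ratio for each column, plus the GPU-hour price implied by each throughput and cost pair. It assumes that cost per million tokens equals an hourly GPU price divided by the tokens produced per GPU-hour; the page does not state its cost model, so `implied_hourly_rate` is a hypothetical helper rather than the site's formula.

```python
# Values copied from the table above; list positions are the three
# interactivity columns, left to right.
throughput = {  # tok/s/gpu
    "B200": [1072.9, 368.4, 160.2],
    "GB300 NVL72": [5561.8, 1016.6, 121.9],
}
cost = {  # $/M tok
    "B200": [0.505, 1.476, 3.382],
    "GB300 NVL72": [0.132, 0.706, 6.176],
}


def implied_hourly_rate(tok_per_s, usd_per_mtok):
    """GPU-hour price implied by a throughput/cost pair.

    ASSUMPTION: cost = hourly price / tokens generated per GPU-hour.
    """
    tokens_per_hour = tok_per_s * 3600
    return usd_per_mtok * tokens_per_hour / 1_000_000


for col in range(3):
    ratio = throughput["GB300 NVL72"][col] / throughput["B200"][col]
    rates = ", ".join(
        f"{gpu} ~${implied_hourly_rate(throughput[gpu][col], cost[gpu][col]):.2f}/GPU-hr"
        for gpu in throughput
    )
    print(f"column {col + 1}: GB300 NVL72 / B200 throughput = {ratio:.2f}x; {rates}")
```

Under that assumption, the B200 figures imply roughly $1.95 per GPU-hour in every column, and the GB300 NVL72 figures imply roughly $2.6 to $2.7. The throughput ratio falls below 1 in the third column, where B200 is also the cheaper option per token.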

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.

Chart controls: Model, ISL / OSL, Precision, Y-Axis Metric, GPU Config.