GPU comparison

B200 vs GB300 NVL72

Head-to-head AI inference benchmark comparison of B200 (NVIDIA Blackwell) and GB300 NVL72 (NVIDIA Blackwell), covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch models, sequences, precisions, and metrics; they work the same way as on the main inference chart.

Interpolated from real benchmark data. Edit target interactivity values below to compare at different operating points.
Each column is one target interactivity (tok/s/user) operating point, in the order shown on the page.

Metric                  GPU            Point 1    Point 2    Point 3
Throughput (tok/s/gpu)  B200            1072.9      368.4      160.2
                        GB300 NVL72     5561.8     1016.6      121.9
Cost ($/M tok)          B200            $0.505     $1.476     $3.382
                        GB300 NVL72     $0.132     $0.706     $6.176
tok/s/MW                B200            494412     169759      73803
                        GB300 NVL72    2648482     484103      58057
Concurrency             B200               ~88        ~11        ~21
                        GB300 NVL72      ~1023       ~216        ~20
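
To make the table easier to read at a glance, the sketch below computes the GB300 NVL72 to B200 ratio for throughput and cost at each of the three operating points, using only the figures above. It also back-solves an implied hourly price per GPU under an assumed cost model, $/M tok = ($/GPU-hour) / (tok/s/gpu × 3600) × 10^6. That model is an assumption for illustration and is not stated by the page. The names `ratios` and `implied_hourly_cost` are hypothetical helpers.

```python
# Figures copied from the comparison table; one entry per operating point.
throughput = {  # tok/s/gpu
    "B200": [1072.9, 368.4, 160.2],
    "GB300 NVL72": [5561.8, 1016.6, 121.9],
}
cost = {  # dollars per million tokens
    "B200": [0.505, 1.476, 3.382],
    "GB300 NVL72": [0.132, 0.706, 6.176],
}


def ratios(metric, num="GB300 NVL72", den="B200"):
    """Element-wise ratio num/den for one metric, rounded to 2 decimals."""
    return [round(a / b, 2) for a, b in zip(metric[num], metric[den])]


def implied_hourly_cost(gpu):
    """Back-solve $/GPU-hour, ASSUMING cost = hourly price / tokens per hour.

    $/M tok = ($/GPU-hour) / (tok/s/gpu * 3600) * 1e6
    => $/GPU-hour = ($/M tok) * (tok/s/gpu) * 3600 / 1e6
    """
    return [round(c * t * 3600 / 1e6, 2)
            for c, t in zip(cost[gpu], throughput[gpu])]


throughput_ratio = ratios(throughput)  # > 1 means GB300 NVL72 is faster per GPU
cost_ratio = ratios(cost)              # < 1 means GB300 NVL72 is cheaper per token

print("throughput ratio:", throughput_ratio)
print("cost ratio:      ", cost_ratio)
for gpu in throughput:
    print(gpu, "implied $/GPU-hour:", implied_hourly_cost(gpu))
```

On these figures the GB300 NVL72 delivers roughly five times the per-GPU throughput of the B200 at the first operating point, while at the third point the ratio falls below 1 and the B200 is both faster per GPU and cheaper per token. If the assumed cost model holds, the implied hourly price is close to constant across operating points for each GPU, which suggests cost is derived from throughput and a fixed hourly rate.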

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.