GPU comparison

B200 vs GB200 NVL72

Head-to-head AI inference benchmark comparison of the B200 and the GB200 NVL72 (both NVIDIA Blackwell), covering latency, throughput, and cost across LLM workloads. The chart controls switch models, sequences, precisions, and metrics, with the same interactions as the main inference chart.

Values are interpolated from real benchmark data at three target interactivity operating points (tok/s/user). The targets can be edited to compare at different operating points.
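As a rough illustration of what interpolating at a target interactivity can look like, the sketch below linearly interpolates throughput between the two measured points that bracket the target. The page does not state which interpolation method it uses, so linear interpolation is an assumption here, and the function name `interpolate_at` and the sample points are hypothetical placeholders, not benchmark data.

```python
from bisect import bisect_left


def interpolate_at(points, target):
    """Linearly interpolate a metric at a target interactivity.

    points: iterable of (interactivity, metric) pairs from measurements.
    target: interactivity (tok/s/user) at which to estimate the metric.
    Raises ValueError if the target lies outside the measured range,
    so the function never extrapolates.
    """
    pts = sorted(points)
    xs = [x for x, _ in pts]
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target outside measured range")
    i = bisect_left(xs, target)
    if xs[i] == target:
        # Exact hit on a measured point: return it unchanged.
        return float(pts[i][1])
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    t = (target - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


# Hypothetical (interactivity, throughput) pairs, for illustration only.
SAMPLE_POINTS = [(10, 1000), (20, 800), (40, 400)]
ESTIMATE = interpolate_at(SAMPLE_POINTS, 30)
print(ESTIMATE)
```

Refusing to extrapolate is a deliberate choice in this sketch: an operating point outside the measured range has no benchmark data on either side to support an estimate.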
Columns 1 to 3 correspond to the three target interactivity values (tok/s/user).

Metric | System | 1 | 2 | 3
Throughput (tok/s/gpu) | B200 | 2507.5 | 1116.7 | 757.3
Throughput (tok/s/gpu) | GB200 NVL72 | 5789.6 | 3987.5 | 817.8
Cost ($/M tok) | B200 | $0.232 | $0.485 | $0.715
Cost ($/M tok) | GB200 NVL72 | $0.106 | $0.154 | $0.809
tok/s/MW | B200 | 1155525 | 514607 | 349006
tok/s/MW | GB200 NVL72 | 2756947 | 1898817 | 389416
Concurrency | B200 | ~154 | ~123 | ~24
Concurrency | GB200 NVL72 | ~649 | ~333 | ~27
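To make the comparison easier to read, the figures above can be turned into GB200 NVL72 to B200 ratios per metric and per operating point. The following is a minimal sketch using only the values listed above; the names `TABLE`, `ratios`, and `RATIOS` are illustrative, and the three list positions stand for the three interactivity columns in the order shown.

```python
# Values copied from the comparison above; list index = interactivity column.
TABLE = {
    "Throughput (tok/s/gpu)": {
        "B200": [2507.5, 1116.7, 757.3],
        "GB200 NVL72": [5789.6, 3987.5, 817.8],
    },
    "Cost ($/M tok)": {
        "B200": [0.232, 0.485, 0.715],
        "GB200 NVL72": [0.106, 0.154, 0.809],
    },
    "tok/s/MW": {
        "B200": [1155525, 514607, 349006],
        "GB200 NVL72": [2756947, 1898817, 389416],
    },
}


def ratios(table, numerator="GB200 NVL72", denominator="B200"):
    """Return numerator/denominator for each metric and each column."""
    return {
        metric: [n / d for n, d in zip(rows[numerator], rows[denominator])]
        for metric, rows in table.items()
    }


RATIOS = ratios(TABLE)
for metric, values in RATIOS.items():
    print(metric, [round(v, 2) for v in values])
```

For throughput and tok/s/MW a ratio above 1 favors the GB200 NVL72, while for cost a ratio below 1 does. In the third column the listed GB200 NVL72 cost ($0.809) is higher than the B200 cost ($0.715), so the cost ratio there is above 1.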

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.