GPU comparison
B200 vs GB200 NVL72
A head-to-head AI inference benchmark comparison of the B200 and the GB200 NVL72 (both NVIDIA Blackwell), covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch models, sequences, precisions, and metrics; they work the same way as on the main inference chart.
The values in the table are interpolated from real benchmark data. Edit the target interactivity values below to compare the two GPUs at different operating points.
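The page does not say which interpolation method it uses. As a minimal sketch, assuming plain linear interpolation between measured (interactivity, throughput) points, the lookup for a target interactivity could look like this. The function name `interpolate` and the sample points are hypothetical and are not taken from the benchmark data.

```python
from bisect import bisect_left


def interpolate(points, target):
    """Linearly interpolate y at x=target from (x, y) pairs sorted by x.

    Assumption: linear interpolation; the page's actual method is not stated.
    """
    xs = [x for x, _ in points]
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target is outside the measured range")
    i = bisect_left(xs, target)  # first index with xs[i] >= target
    if xs[i] == target:
        return points[i][1]      # exact hit on a measured point
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (target - x0) / (x1 - x0)


# Hypothetical measurements: (interactivity in tok/s/user, throughput in tok/s/gpu).
sample_points = [(20.0, 3000.0), (40.0, 2000.0), (80.0, 800.0)]
print(interpolate(sample_points, 30.0))
```

Refusing to extrapolate outside the measured range is a design choice in this sketch: a target beyond the last benchmark point raises an error rather than returning a guessed value.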
| Metric | GPU | Interactivity (tok/s/user) | Interactivity (tok/s/user) | Interactivity (tok/s/user) |
|---|---|---|---|---|
| Throughput (tok/s/gpu) | B200 | 2507.5 | 1116.7 | 757.3 |
| Throughput (tok/s/gpu) | GB200 NVL72 | 5789.6 | 3987.5 | 817.8 |
| Cost ($/M tok) | B200 | $0.232 | $0.485 | $0.715 |
| Cost ($/M tok) | GB200 NVL72 | $0.106 | $0.154 | $0.809 |
| tok/s/MW | B200 | 1155525 | 514607 | 349006 |
| tok/s/MW | GB200 NVL72 | 2756947 | 1898817 | 389416 |
| Concurrency | B200 | ~154 | ~123 | ~24 |
| Concurrency | GB200 NVL72 | ~649 | ~333 | ~27 |
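To read the table as relative differences rather than absolute numbers, the per-column ratios can be computed directly. The sketch below copies the throughput and cost values from the table above; the variable and function names are illustrative only.

```python
# Values copied from the comparison table, one entry per interactivity column.
throughput = {  # tok/s/gpu
    "B200": [2507.5, 1116.7, 757.3],
    "GB200 NVL72": [5789.6, 3987.5, 817.8],
}
cost_per_mtok = {  # $/M tok
    "B200": [0.232, 0.485, 0.715],
    "GB200 NVL72": [0.106, 0.154, 0.809],
}


def ratios(numerator, denominator):
    """Element-wise ratio of two equally long lists."""
    return [n / d for n, d in zip(numerator, denominator)]


# GB200 NVL72 relative to B200: above 1 means higher throughput or higher cost.
throughput_gain = ratios(throughput["GB200 NVL72"], throughput["B200"])
cost_ratio = ratios(cost_per_mtok["GB200 NVL72"], cost_per_mtok["B200"])

for col, (t, c) in enumerate(zip(throughput_gain, cost_ratio), start=1):
    print(f"column {col}: throughput x{t:.2f}, cost x{c:.2f}")
```

In the first two columns the GB200 NVL72 delivers more throughput per GPU at a lower cost per million tokens. In the third column its throughput is still slightly higher, but its cost per million tokens is higher than the B200's.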
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.