GPU comparison
B200 vs GB300 NVL72
Head-to-head AI inference benchmark comparison of the B200 (NVIDIA Blackwell) and the GB300 NVL72 (NVIDIA Blackwell), covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch models, sequence lengths, precisions, and metrics; they work the same way as on the main inference chart.
The values below are interpolated from real benchmark data. Edit the target interactivity values to compare the two systems at different operating points.
| Metric | Interactivity (tok/s/user) | Interactivity (tok/s/user) | Interactivity (tok/s/user) |
|---|---|---|---|
| Throughput (tok/s/gpu) | B200: 1072.9; GB300 NVL72: 5561.8 | B200: 368.4; GB300 NVL72: 1016.6 | B200: 160.2; GB300 NVL72: 121.9 |
| Cost ($/M tok) | B200: $0.505; GB300 NVL72: $0.132 | B200: $1.476; GB300 NVL72: $0.706 | B200: $3.382; GB300 NVL72: $6.176 |
| tok/s/MW | B200: 494412; GB300 NVL72: 2648482 | B200: 169759; GB300 NVL72: 484103 | B200: 73803; GB300 NVL72: 58057 |
| Concurrency | B200: ~88; GB300 NVL72: ~1023 | B200: ~11; GB300 NVL72: ~216 | B200: ~21; GB300 NVL72: ~20 |
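The rows of the table are related by simple arithmetic, which a short sketch can make explicit. The page does not state its formulas, so the following assumes that cost per million tokens is an hourly GPU price divided by tokens generated per hour, and that tok/s/MW is per-GPU throughput divided by per-GPU power draw. Under those assumptions, the first interactivity column implies roughly the following hourly price and power per GPU. The function names are hypothetical and the results are back-calculated estimates, not published figures.

```python
# Values copied from the first interactivity column of the table above.
ROWS = {
    "B200": {"tok_s_gpu": 1072.9, "usd_per_mtok": 0.505, "tok_s_mw": 494412},
    "GB300 NVL72": {"tok_s_gpu": 5561.8, "usd_per_mtok": 0.132, "tok_s_mw": 2648482},
}


def implied_usd_per_gpu_hour(tok_s_gpu: float, usd_per_mtok: float) -> float:
    """Back out an hourly GPU price, ASSUMING cost = price / tokens per hour."""
    tokens_per_hour = tok_s_gpu * 3600
    return usd_per_mtok * tokens_per_hour / 1e6


def implied_kw_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Back out power per GPU in kW, ASSUMING tok/s/MW = throughput / power."""
    gpus_per_mw = tok_s_mw / tok_s_gpu
    return 1000 / gpus_per_mw


for name, row in ROWS.items():
    price = implied_usd_per_gpu_hour(row["tok_s_gpu"], row["usd_per_mtok"])
    power = implied_kw_per_gpu(row["tok_s_gpu"], row["tok_s_mw"])
    print(f"{name}: ~${price:.2f}/GPU-hour, ~{power:.2f} kW/GPU")
```

Running the same calculation on the other two columns gives slightly different implied prices, which is expected if each metric is interpolated separately rather than derived from the interpolated throughput.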
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.