GPU comparison
GB200 NVL72 vs GB300 NVL72
Head-to-head AI inference benchmark comparison of GB200 NVL72 (NVIDIA Blackwell) and GB300 NVL72 (NVIDIA Blackwell), covering latency, throughput, and cost across LLM workloads.
The figures below are interpolated from real benchmark data. Each column corresponds to one target interactivity value (tok/s/user), that is, one operating point at which the two systems are compared.
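As a rough illustration of what "interpolated" can mean here, the sketch below estimates throughput at a target interactivity from a handful of measured points. The page does not state which interpolation method it uses, so linear interpolation is an assumption, and the function name `interpolate_throughput` and the sample points are hypothetical placeholders rather than benchmark results.

```python
from bisect import bisect_left


def interpolate_throughput(points, target):
    """Linearly interpolate throughput (tok/s/gpu) at a target interactivity.

    points: iterable of (interactivity_tok_s_user, throughput_tok_s_gpu) pairs.
    Linear interpolation is an assumption; the source page does not name its method.
    """
    pts = sorted(points)
    xs = [x for x, _ in pts]
    # Refuse to extrapolate beyond the measured range.
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target interactivity is outside the measured range")
    i = bisect_left(xs, target)
    if xs[i] == target:
        return pts[i][1]  # exact measurement available
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    t = (target - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


# Hypothetical measurements, for illustration only (not taken from the benchmark).
sample_points = [(10.0, 1000.0), (20.0, 600.0), (40.0, 200.0)]
estimate = interpolate_throughput(sample_points, 30.0)
```

Comparing both systems at the same target interactivity, rather than at whatever points happened to be measured, is what makes the per-column numbers directly comparable.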
| Metric | System | Interactivity target 1 (tok/s/user) | Interactivity target 2 (tok/s/user) | Interactivity target 3 (tok/s/user) |
|---|---|---|---|---|
| Throughput (tok/s/gpu) | GB200 NVL72 | 7753.2 | 1360.9 | 290.7 |
| Throughput (tok/s/gpu) | GB300 NVL72 | 10421.6 | 2644.5 | 281.6 |
| Cost ($/M tok) | GB200 NVL72 | $0.079 | $0.447 | $2.110 |
| Cost ($/M tok) | GB300 NVL72 | $0.070 | $0.277 | $2.617 |
| tok/s/MW | GB200 NVL72 | 3691993 | 648067 | 138416 |
| tok/s/MW | GB300 NVL72 | 4962665 | 1259271 | 134096 |
| Concurrency | GB200 NVL72 | ~2246 | ~226 | ~27 |
| Concurrency | GB300 NVL72 | ~1780 | ~406 | ~24 |
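To make the table easier to read across columns, the sketch below computes the GB300-to-GB200 throughput ratio at each operating point and backs out two implied constants. It assumes that cost per million tokens is an hourly GPU price divided by tokens produced per GPU-hour, and that tok/s/MW is tok/s/gpu divided by a per-GPU power draw. Neither formula is stated on the page, so the implied price and power are consistency checks under those assumptions, not published figures. All helper names are hypothetical.

```python
# Values copied from the table; list positions are the three operating points.
SYSTEMS = ("GB200 NVL72", "GB300 NVL72")
throughput = {  # tok/s/gpu
    "GB200 NVL72": [7753.2, 1360.9, 290.7],
    "GB300 NVL72": [10421.6, 2644.5, 281.6],
}
cost_per_mtok = {  # $ per million tokens
    "GB200 NVL72": [0.079, 0.447, 2.110],
    "GB300 NVL72": [0.070, 0.277, 2.617],
}
tok_per_s_per_mw = {
    "GB200 NVL72": [3691993, 648067, 138416],
    "GB300 NVL72": [4962665, 1259271, 134096],
}


def ratio(metric, base="GB200 NVL72", other="GB300 NVL72"):
    """Per-column ratio other/base for one metric."""
    return [o / b for b, o in zip(metric[base], metric[other])]


def implied_gpu_hour_price(tput, cost):
    """Assumed model: $/GPU-hour = (million tokens per GPU-hour) * ($ per million tokens)."""
    return tput * 3600 / 1e6 * cost


def implied_kw_per_gpu(tput, per_mw):
    """Assumed model: MW per GPU = (tok/s/gpu) / (tok/s/MW); converted to kW."""
    return tput / per_mw * 1000


throughput_ratio = ratio(throughput)
gpu_hour_price = {
    s: [implied_gpu_hour_price(t, c) for t, c in zip(throughput[s], cost_per_mtok[s])]
    for s in SYSTEMS
}
kw_per_gpu = {
    s: [implied_kw_per_gpu(t, m) for t, m in zip(throughput[s], tok_per_s_per_mw[s])]
    for s in SYSTEMS
}

for s in SYSTEMS:
    print(s, [round(p, 2) for p in gpu_hour_price[s]], [round(k, 2) for k in kw_per_gpu[s]])
print("GB300/GB200 throughput ratio:", [round(r, 2) for r in throughput_ratio])
```

Under these assumptions the throughput ratio is about 1.34 and 1.94 in the first two columns and about 0.97 in the third, so the GB300 NVL72 advantage disappears at the third operating point. The implied hourly price comes out higher for GB300 NVL72 (roughly $2.6 to $2.7 per GPU-hour versus about $2.2), which would explain why its cost per million tokens is higher in the third column even though throughput is nearly equal.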
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.