GPU comparison

GB200 NVL72 vs GB300 NVL72

Head-to-head AI inference benchmark comparison of GB200 NVL72 (NVIDIA Blackwell) and GB300 NVL72 (NVIDIA Blackwell Ultra), covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch models, sequences, precisions, and metrics; they work the same way as on the main inference chart.

The values below are interpolated from real benchmark data at each target interactivity. Changing a target interactivity value compares the two systems at a different operating point.
Columns are the three target interactivity (tok/s/user) operating points.

| Metric | System | Operating point 1 | Operating point 2 | Operating point 3 |
|---|---|---|---|---|
| Throughput (tok/s/gpu) | GB200 NVL72 | 7753.2 | 1360.9 | 290.7 |
| Throughput (tok/s/gpu) | GB300 NVL72 | 10421.6 | 2644.5 | 281.6 |
| Cost ($/M tok) | GB200 NVL72 | $0.079 | $0.447 | $2.110 |
| Cost ($/M tok) | GB300 NVL72 | $0.070 | $0.277 | $2.617 |
| tok/s/MW | GB200 NVL72 | 3691993 | 648067 | 138416 |
| tok/s/MW | GB300 NVL72 | 4962665 | 1259271 | 134096 |
| Concurrency | GB200 NVL72 | ~2246 | ~226 | ~27 |
| Concurrency | GB300 NVL72 | ~1780 | ~406 | ~24 |
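
The cost and tok/s/MW figures above appear to follow directly from per-GPU throughput. The sketch below shows one way to reproduce them. It rests on assumptions the page does not state: the per-GPU-hour prices (about $2.20 for GB200 NVL72 and $2.64 for GB300 NVL72) and the 2.1 kW per-GPU power figure are back-solved from the numbers above, and the function and constant names are hypothetical.

```python
# Assumed constants, back-solved from the published figures above.
# They are NOT stated by the source and are approximate.
ASSUMED_GPU_HOUR_USD = {
    "GB200 NVL72": 2.20,
    "GB300 NVL72": 2.64,
}
ASSUMED_WATTS_PER_GPU = 2100.0  # all-in power attributed to one GPU


def cost_per_million_tokens(tok_s_gpu: float, gpu_hour_usd: float) -> float:
    """Dollars per million tokens for one GPU at a given throughput."""
    tokens_per_gpu_hour = tok_s_gpu * 3600.0
    return gpu_hour_usd / tokens_per_gpu_hour * 1_000_000.0


def tokens_per_second_per_mw(
    tok_s_gpu: float, watts_per_gpu: float = ASSUMED_WATTS_PER_GPU
) -> float:
    """Tokens per second delivered per megawatt of provisioned power."""
    gpus_per_megawatt = 1_000_000.0 / watts_per_gpu
    return tok_s_gpu * gpus_per_megawatt
```

Calling `cost_per_million_tokens(7753.2, ASSUMED_GPU_HOUR_USD["GB200 NVL72"])` should land within about one percent of the listed cost for that operating point. The small residual is expected because both the published values and the back-solved constants are rounded.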

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.