GPU comparison

GB200 NVL72 vs GB300 NVL72

Head-to-head AI inference benchmark comparison of GB200 NVL72 (NVIDIA Blackwell) and GB300 NVL72 (NVIDIA Blackwell Ultra), covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch models, sequence lengths, precisions, and metrics; they behave the same way as on the main inference chart.

Values are interpolated from measured benchmark data. Each column corresponds to one target interactivity (tok/s/user); edit the targets to compare the two systems at different operating points.
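As a rough illustration of what interpolating to a target interactivity could look like, here is a minimal sketch using piecewise-linear interpolation between measured points. The page does not state which interpolation method it uses, so the linear scheme, the function name `interpolate`, and the sample curve are all assumptions for illustration; the sample points are placeholders, not benchmark results.

```python
from bisect import bisect_left


def interpolate(points, target):
    """Linearly interpolate y at x = target from (x, y) points sorted by x.

    Returns None when target lies outside the measured range, so the
    sketch never extrapolates beyond the data.
    """
    if not points:
        return None
    xs = [x for x, _ in points]
    if target < xs[0] or target > xs[-1]:
        return None
    i = bisect_left(xs, target)
    if xs[i] == target:
        return points[i][1]  # exact hit on a measured point
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    t = (target - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


# Placeholder curve: (interactivity tok/s/user, throughput tok/s/gpu).
# These numbers are made up for the example and are NOT benchmark data.
curve = [(10.0, 1000.0), (20.0, 600.0), (40.0, 200.0)]
print(interpolate(curve, 30.0))  # 400.0
```

Returning None outside the measured range is a deliberate choice in this sketch: a target interactivity that no benchmark run reached has no supported value to report.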

Metric	System	Interactivity (tok/s/user)	Interactivity (tok/s/user)	Interactivity (tok/s/user)
Throughput (tok/s/gpu)	GB200 NVL72	7753.2	1360.9	290.7
Throughput (tok/s/gpu)	GB300 NVL72	10421.6	2644.5	281.6
Cost ($/M tok)	GB200 NVL72	$0.079	$0.447	$2.110
Cost ($/M tok)	GB300 NVL72	$0.070	$0.277	$2.617
Throughput per megawatt (tok/s/MW)	GB200 NVL72	3691993	648067	138416
Throughput per megawatt (tok/s/MW)	GB300 NVL72	4962665	1259271	134096
Concurrency	GB200 NVL72	~2246	~226	~27
Concurrency	GB300 NVL72	~1780	~406	~24
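The table is easier to read as ratios. The sketch below copies the table values and computes the GB300 NVL72 to GB200 NVL72 ratio per column, then backs out two quantities the page does not state: an implied hourly price per GPU and an implied power draw per GPU. Both rest on assumed formulas (cost per million tokens = hourly GPU price divided by millions of tokens per GPU-hour; tok/s/MW = tok/s/gpu divided by MW per GPU), so treat the results as a consistency check, not as published figures. All function names are hypothetical.

```python
# Values copied from the table above, one entry per interactivity column,
# left to right.
GB200, GB300 = "GB200 NVL72", "GB300 NVL72"

throughput = {GB200: [7753.2, 1360.9, 290.7], GB300: [10421.6, 2644.5, 281.6]}  # tok/s/gpu
cost = {GB200: [0.079, 0.447, 2.110], GB300: [0.070, 0.277, 2.617]}  # $/M tok
tok_per_mw = {GB200: [3691993, 648067, 138416], GB300: [4962665, 1259271, 134096]}


def ratio(metric):
    """GB300 / GB200 ratio for each column of a metric."""
    return [round(b / a, 2) for a, b in zip(metric[GB200], metric[GB300])]


def implied_gpu_hour_price(system):
    """ASSUMPTION: $/M tok = hourly price / (tok/s/gpu * 3600 / 1e6)."""
    return [round(c * t * 3600 / 1e6, 2)
            for c, t in zip(cost[system], throughput[system])]


def implied_kw_per_gpu(system):
    """ASSUMPTION: tok/s/MW = tok/s/gpu / (MW per GPU)."""
    return [round(t / m * 1000, 2)
            for t, m in zip(throughput[system], tok_per_mw[system])]


print(ratio(throughput))  # [1.34, 1.94, 0.97]
print(ratio(cost))        # [0.89, 0.62, 1.24]
print(implied_gpu_hour_price(GB200), implied_gpu_hour_price(GB300))
print(implied_kw_per_gpu(GB200), implied_kw_per_gpu(GB300))
```

Read this way, GB300 NVL72 delivers roughly 1.3x to 1.9x the per-GPU throughput in the first two columns at lower cost per token, while in the third column it is slightly slower and about 24% more expensive per token. Under the assumed formulas, the tok/s/MW rows imply the same power figure per GPU for both systems, and the cost rows imply a higher hourly price for GB300 NVL72.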

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.

Chart controls: Model, ISL / OSL (input / output sequence length), Precision, Y-Axis Metric, GPU Config.