GPU comparison

B300 vs GB200 NVL72

Head-to-head AI inference benchmark comparison of the B300 and the GB200 NVL72, both NVIDIA Blackwell systems, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch models, sequence lengths, precisions, and metrics; they work the same way as on the main inference chart.

Values are interpolated from real benchmark data. Edit the target interactivity values below to compare the two systems at different operating points.
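The page does not say which interpolation method it uses. As a rough illustration only, the sketch below assumes simple piecewise-linear interpolation between measured (interactivity, throughput) points. The function name `interpolate` and the sample points are hypothetical and are not taken from the benchmark data.

```python
from bisect import bisect_left


def interpolate(points, target):
    """Linearly interpolate y at x=target from (x, y) points sorted by x."""
    xs = [x for x, _ in points]
    if not xs[0] <= target <= xs[-1]:
        # Stay inside the measured range rather than extrapolate.
        raise ValueError("target outside measured range")
    i = bisect_left(xs, target)
    if xs[i] == target:
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (target - x0) / (x1 - x0)


# Hypothetical (interactivity tok/s/user, throughput tok/s/gpu) samples,
# invented for illustration; NOT values from the page.
measured = [(20.0, 9000.0), (50.0, 4000.0), (100.0, 1500.0)]
print(interpolate(measured, 35.0))  # 6500.0
```

Refusing to extrapolate is a deliberate choice in this sketch: a target interactivity outside the measured range has no benchmark points on one side, so any value returned there would be a guess.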

Metric	GPU	Interactivity (tok/s/user)	Interactivity (tok/s/user)	Interactivity (tok/s/user)
Throughput (tok/s/gpu)	B300	8611.9	1855.1	1064.2
Throughput (tok/s/gpu)	GB200 NVL72	11779.6	4506.3	1028.9
Cost ($/M tok)	B300	$0.075	$0.351	$0.604
Cost ($/M tok)	GB200 NVL72	$0.052	$0.135	$0.601
tok/s/MW	B300	3968598	854865	490420
tok/s/MW	GB200 NVL72	5609342	2145869	489935
Concurrency	B300	~222	~46	~34
Concurrency	GB200 NVL72	~943	~203	~19
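To make the table easier to read across columns, the sketch below computes the GB200 NVL72 to B300 throughput ratio at each of the three operating points, using the values exactly as listed. It also backs out an implied price per GPU-hour, under the assumption (not stated on the page) that cost per million tokens equals an hourly GPU price divided by tokens generated per hour. The function names are illustrative.

```python
# Values copied from the comparison table; list positions are the three
# interactivity columns, left to right.
throughput = {  # tok/s/gpu
    "B300": [8611.9, 1855.1, 1064.2],
    "GB200 NVL72": [11779.6, 4506.3, 1028.9],
}
cost = {  # $ per million tokens
    "B300": [0.075, 0.351, 0.604],
    "GB200 NVL72": [0.052, 0.135, 0.601],
}


def throughput_ratio(a, b):
    """Per-column ratio of a's throughput to b's."""
    return [x / y for x, y in zip(throughput[a], throughput[b])]


def implied_gpu_hour_cost(gpu):
    """Implied $/GPU-hour, ASSUMING cost = hourly price / tokens per hour."""
    return [c * t * 3600 / 1e6 for c, t in zip(cost[gpu], throughput[gpu])]


ratios = throughput_ratio("GB200 NVL72", "B300")
print([round(r, 2) for r in ratios])
for gpu in throughput:
    print(gpu, [round(v, 2) for v in implied_gpu_hour_cost(gpu)])
```

With the table's numbers, the ratios come out to roughly 1.37, 2.43, and 0.97, so the GB200 NVL72 leads in the first two columns and the B300 is slightly ahead in the third. Under the stated pricing assumption, the implied hourly figures are nearly constant across columns for each system (about $2.3 for the B300 and about $2.2 for the GB200 NVL72), which is consistent with that assumption but does not confirm it, since the listed costs are rounded to three decimals.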

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.

Chart controls: Model, ISL / OSL, Precision, Y-Axis Metric, GPU Config.