GPU comparison

B300 vs GB200 NVL72

Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and GB200 NVL72 (NVIDIA Blackwell), covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch models, sequences, precisions, and metrics; they work the same way as on the main inference chart.

Values are interpolated from real benchmark data. Edit the target interactivity values below to compare at different operating points.
Metric                   GPU           Interactivity   Interactivity   Interactivity
                                       (tok/s/user)    (tok/s/user)    (tok/s/user)
Throughput (tok/s/gpu)   B300          8611.9          1855.1          1064.2
                         GB200 NVL72   11779.6         4506.3          1028.9
Cost ($/M tok)           B300          $0.075          $0.351          $0.604
                         GB200 NVL72   $0.052          $0.135          $0.601
tok/s/MW                 B300          3968598         854865          490420
                         GB200 NVL72   5609342         2145869         489935
Concurrency              B300          ~222            ~46             ~34
                         GB200 NVL72   ~943            ~203            ~19
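
The page says these figures are interpolated from benchmark data at a chosen target interactivity, but it does not state the method. As a rough sketch only, the Python below assumes plain linear interpolation between the two nearest measured points, and assumes cost per million tokens is derived from a per-GPU hourly price. The function names, the sample points, and the hourly price are all hypothetical and are not taken from the page.

```python
from bisect import bisect_left


def interpolate_at(points, target):
    """Linearly interpolate a metric at `target` interactivity.

    `points` is a list of (interactivity, metric) pairs from measured runs.
    Linear interpolation is an assumption; the page does not name its method.
    """
    pts = sorted(points)
    xs = [x for x, _ in pts]
    if not xs[0] <= target <= xs[-1]:
        # Refuse to extrapolate beyond the measured range.
        raise ValueError("target is outside the measured range")
    i = bisect_left(xs, target)
    if xs[i] == target:
        return pts[i][1]
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    t = (target - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


def cost_per_million_tokens(throughput_tok_s_gpu, gpu_price_per_hour):
    """Cost in $/M tok, assuming a flat hourly price per GPU (hypothetical)."""
    tokens_per_hour = throughput_tok_s_gpu * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000


# Hypothetical measured points: (interactivity tok/s/user, throughput tok/s/gpu).
measured = [(10.0, 1000.0), (20.0, 500.0)]
throughput = interpolate_at(measured, 15.0)
cost = cost_per_million_tokens(1000.0, 3.60)
```

With these made-up inputs, the interpolated throughput sits halfway between the two measured points, and the cost helper shows why cost rises as per-GPU throughput falls at higher interactivity targets.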

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.