DeepSeek V4 Pro 1.6T · GPU comparison

DeepSeek V4 Pro 1.6T — B300 vs GB300 NVL72

Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and GB300 NVL72 (NVIDIA Blackwell) on DeepSeek V4 Pro 1.6T. Latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics — same interactions as the main inference chart.

B300 / GB300 NVL72 on DeepSeek V4 Pro 1.6T at 72 tok/s/user: 2388 / 9384 tok/s/GPU, $0.27 / $0.08 per million tokens. GB300 NVL72 is 250% cheaper per token; GB300 NVL72 delivers 293% more tok/s/GPU.

Around the middle of the 15–244 tok/s/user interactivity band, at 130 tok/s/user on DeepSeek V4 Pro 1.6T: B300 runs 1228 tok/s/GPU at $0.53/M tokens, GB300 NVL72 runs 3474 at $0.22/M. GB300 NVL72 is 143% cheaper per token; GB300 NVL72 delivers 183% more tok/s/GPU.

Setting 187 tok/s/user as the target on DeepSeek V4 Pro 1.6T, B300 produces 542 tok/s/GPU ($1.23 per million tokens) and GB300 NVL72 produces 502 ($1.44). B300 is 17% cheaper per token; B300 delivers 8% more tok/s/GPU. (Numbers reflect the default 8k/1k · fp4 selection for this URL — table and chart below update if you change sequence, precision, or model in the controls.)

View performance-per-dollar view →

Interpolated from real benchmark data. Edit target interactivity values below to compare at different operating points.

Metric	Interactivity (tok/s/user)	Interactivity (tok/s/user)	Interactivity (tok/s/user)
Throughput (tok/s/gpu)	B300:2387.9GB300 NVL72:9384.4	B300:1227.9GB300 NVL72:3473.9	B300:541.8GB300 NVL72:501.8
Cost ($/M tok)	B300:$0.275GB300 NVL72:$0.078	B300:$0.526GB300 NVL72:$0.216	B300:$1.228GB300 NVL72:$1.436
tok/s/MW	B300:1100395GB300 NVL72:4468745	B300:565849GB300 NVL72:1654219	B300:249686GB300 NVL72:238976
Concurrency	B300:~21GB300 NVL72:~1163	B300:~5GB300 NVL72:~288	B300:~1GB300 NVL72:~13

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.

Model

Scenario

Precision

Y-Axis Metric

GPU Config

Quick Filters

Vendor:

Aggregation:

Spec Decoding: