DeepSeek V4 Pro 1.6T — B300 vs GB300 NVL72

A head-to-head AI inference benchmark comparison of B300 and GB300 NVL72 (both NVIDIA Blackwell) on DeepSeek V4 Pro 1.6T, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; they work the same way as on the main inference chart.

At 53 tok/s/user on DeepSeek V4 Pro 1.6T, B300 delivers 1866 tok/s/GPU at $0.35 per million tokens and GB300 NVL72 delivers 7062 tok/s/GPU at $0.10. GB300 NVL72 costs about 70% less per token (B300 costs 235% more) and delivers 278% more tok/s/GPU.

Around the middle of the 13–173 tok/s/user interactivity band, at 93 tok/s/user on DeepSeek V4 Pro 1.6T, B300 runs 1051 tok/s/GPU at $0.62/M tokens and GB300 NVL72 runs 2209 tok/s/GPU at $0.33/M. GB300 NVL72 costs about 47% less per token (B300 costs 87% more) and delivers 110% more tok/s/GPU.

With 133 tok/s/user as the target on DeepSeek V4 Pro 1.6T, B300 produces 572 tok/s/GPU ($1.13 per million tokens) and GB300 NVL72 produces 750 tok/s/GPU ($0.97). GB300 NVL72 costs about 14% less per token (B300 costs 16% more) and delivers 31% more tok/s/GPU. (Numbers reflect the default 8k/1k · fp4 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
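A price gap can be stated two ways, depending on which GPU is the baseline: how much less the cheaper one costs, or how much more the expensive one costs. The two percentages differ, and only the first can be called "cheaper". A minimal sketch of the arithmetic, using the rounded 53 tok/s/user figures from this page (the helper names `pct_less` and `pct_more` are illustrative, not from the source):

```python
def pct_less(a: float, b: float) -> float:
    """Percent by which a is lower than the baseline b."""
    return (1 - a / b) * 100


def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds the baseline b."""
    return (a / b - 1) * 100


# Rounded values at 53 tok/s/user, taken from this page
b300_cost, gb300_cost = 0.348, 0.104      # $/M tokens
b300_tput, gb300_tput = 1866.2, 7061.7    # tok/s/GPU

savings = pct_less(gb300_cost, b300_cost)   # GB300 NVL72 cost, B300 as baseline
premium = pct_more(b300_cost, gb300_cost)   # B300 cost, GB300 NVL72 as baseline
speedup = pct_more(gb300_tput, b300_tput)   # GB300 NVL72 throughput, B300 as baseline

print(f"GB300 NVL72 costs {savings:.0f}% less per token")
print(f"B300 costs {premium:.0f}% more per token")
print(f"GB300 NVL72 delivers {speedup:.0f}% more tok/s/GPU")
```

On these inputs the savings come to about 70%, while the 235% figure is the B300 premium over GB300 NVL72.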


Interpolated from real benchmark data. Edit target interactivity values below to compare at different operating points.
Metric                  GPU           53 tok/s/user   93 tok/s/user   133 tok/s/user
Throughput (tok/s/GPU)  B300          1866.2          1050.7          571.7
                        GB300 NVL72   7061.7          2209.0          750.1
Cost ($/M tok)          B300          $0.348          $0.619          $1.125
                        GB300 NVL72   $0.104          $0.330          $0.971
tok/s/MW                B300          859991          484198          263453
                        GB300 NVL72   3362729         1051895         357203
Concurrency             B300          ~18             ~5              ~2
                        GB300 NVL72   ~768            ~233            ~35
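
The cost and tok/s/MW rows appear to track the throughput row by a fixed per-GPU factor. The sketch below back-calculates those factors under two assumptions that this page does not state: that cost per million tokens is a per-GPU hourly price divided by tokens generated per hour, and that tok/s/MW is per-GPU throughput divided by per-GPU power draw. The resulting hourly rates and power figures are therefore implied values, not published ones, and the function names are illustrative.

```python
# Rounded table values at 53, 93, and 133 tok/s/user:
# (throughput tok/s/GPU, cost $/M tok, tok/s/MW)
rows = {
    "B300": [
        (1866.2, 0.348, 859991),
        (1050.7, 0.619, 484198),
        (571.7, 1.125, 263453),
    ],
    "GB300 NVL72": [
        (7061.7, 0.104, 3362729),
        (2209.0, 0.330, 1051895),
        (750.1, 0.971, 357203),
    ],
}


def implied_hourly_rate(tput: float, cost_per_m: float) -> float:
    # ASSUMPTION: $/M tok = ($/GPU-hour) / (tok/s/GPU * 3600) * 1e6
    return cost_per_m * tput * 3600 / 1e6


def implied_kw_per_gpu(tput: float, tok_s_per_mw: float) -> float:
    # ASSUMPTION: tok/s/MW = (tok/s/GPU) / (MW drawn per GPU)
    return tput / tok_s_per_mw * 1000


implied = {
    gpu: [(implied_hourly_rate(t, c), implied_kw_per_gpu(t, m)) for t, c, m in points]
    for gpu, points in rows.items()
}

for gpu, values in implied.items():
    for rate, kw in values:
        print(f"{gpu}: ~${rate:.2f}/GPU-hour, ~{kw:.2f} kW/GPU")
```

On these rounded inputs the implied values cluster at roughly $2.32 to $2.34 per GPU-hour and 2.17 kW per GPU for B300, and $2.62 to $2.64 per GPU-hour and 2.10 kW per GPU for GB300 NVL72. That is consistent with the assumed formulas but does not confirm them.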

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.