DeepSeek V4 Pro 1.6T — B300 vs GB300 NVL72
Head-to-head AI inference benchmark comparison of B300 and GB300 NVL72 (both NVIDIA Blackwell) on DeepSeek V4 Pro 1.6T, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; the interactions are the same as on the main inference chart.
B300 / GB300 NVL72 on DeepSeek V4 Pro 1.6T at 53 tok/s/user: 1866 / 7062 tok/s/GPU, $0.35 / $0.10 per million tokens. GB300 NVL72 is about 70% cheaper per token (equivalently, B300 costs 235% more) and delivers 278% more tok/s/GPU.
Around the middle of the 13–173 tok/s/user interactivity band, at 93 tok/s/user on DeepSeek V4 Pro 1.6T: B300 runs 1051 tok/s/GPU at $0.62/M tokens, GB300 NVL72 runs 2209 tok/s/GPU at $0.33/M. GB300 NVL72 is about 47% cheaper per token (equivalently, B300 costs 87% more) and delivers 110% more tok/s/GPU.
Setting 133 tok/s/user as the target on DeepSeek V4 Pro 1.6T, B300 produces 572 tok/s/GPU ($1.13 per million tokens) and GB300 NVL72 produces 750 tok/s/GPU ($0.97). GB300 NVL72 is about 14% cheaper per token (equivalently, B300 costs 16% more) and delivers 31% more tok/s/GPU. (Numbers reflect the default 8k/1k · fp4 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
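The percentage comparisons above depend on which system is used as the baseline. The following is a minimal sketch, not the site's own calculation, that recomputes them from the rounded table values under both conventions: "X% more" is relative to the smaller value, "X% cheaper" is relative to the more expensive one. The names `POINTS`, `pct_more`, `pct_less`, and `compare` are hypothetical, and because the inputs are rounded, results may differ from the page's figures by a percentage point.

```python
# Rounded values copied from the comparison table:
# interactivity (tok/s/user) -> system -> (tok/s/GPU, $ per million tokens)
POINTS = {
    53: {"B300": (1866.2, 0.348), "GB300 NVL72": (7061.7, 0.104)},
    93: {"B300": (1050.7, 0.619), "GB300 NVL72": (2209.0, 0.330)},
    133: {"B300": (571.7, 1.125), "GB300 NVL72": (750.1, 0.971)},
}


def pct_more(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


def pct_less(a, b):
    """How much smaller a is than b, as a percentage of b."""
    return (1.0 - a / b) * 100.0


def compare(interactivity):
    """Compare GB300 NVL72 against B300 at one interactivity level."""
    b_tput, b_cost = POINTS[interactivity]["B300"]
    g_tput, g_cost = POINTS[interactivity]["GB300 NVL72"]
    return {
        "gb300_more_throughput_pct": pct_more(g_tput, b_tput),
        "b300_costs_more_pct": pct_more(b_cost, g_cost),
        "gb300_cheaper_pct": pct_less(g_cost, b_cost),
    }


if __name__ == "__main__":
    for level in POINTS:
        result = compare(level)
        print(
            f"{level} tok/s/user: "
            f"GB300 NVL72 +{result['gb300_more_throughput_pct']:.0f}% tok/s/GPU, "
            f"B300 costs {result['b300_costs_more_pct']:.0f}% more, "
            f"GB300 NVL72 is {result['gb300_cheaper_pct']:.0f}% cheaper"
        )
```

A "cheaper" percentage can never exceed 100%, which is why the two cost conventions are reported separately here.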
| Metric | Interactivity: 53 tok/s/user | Interactivity: 93 tok/s/user | Interactivity: 133 tok/s/user |
|---|---|---|---|
| Throughput (tok/s/GPU) | B300: 1866.2; GB300 NVL72: 7061.7 | B300: 1050.7; GB300 NVL72: 2209.0 | B300: 571.7; GB300 NVL72: 750.1 |
| Cost ($/M tok) | B300: $0.348; GB300 NVL72: $0.104 | B300: $0.619; GB300 NVL72: $0.330 | B300: $1.125; GB300 NVL72: $0.971 |
| tok/s/MW | B300: 859991; GB300 NVL72: 3362729 | B300: 484198; GB300 NVL72: 1051895 | B300: 263453; GB300 NVL72: 357203 |
| Concurrency | B300: ~18; GB300 NVL72: ~768 | B300: ~5; GB300 NVL72: ~233 | B300: ~2; GB300 NVL72: ~35 |
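The cost and tok/s/MW rows can be cross-checked against the throughput row. The sketch below back-computes an implied hourly price and power draw per GPU, assuming (this is not stated on the page) that cost per million tokens equals an hourly GPU price divided by tokens generated per hour, and that tok/s/MW equals tok/s/GPU divided by megawatts per GPU. The names `ROWS`, `implied_gpu_hour_price`, and `implied_kw_per_gpu` are hypothetical.

```python
# Rounded values copied from the table:
# (system, tok/s/user) -> (tok/s/GPU, $ per million tokens, tok/s/MW)
ROWS = {
    ("B300", 53): (1866.2, 0.348, 859991),
    ("B300", 93): (1050.7, 0.619, 484198),
    ("B300", 133): (571.7, 1.125, 263453),
    ("GB300 NVL72", 53): (7061.7, 0.104, 3362729),
    ("GB300 NVL72", 93): (2209.0, 0.330, 1051895),
    ("GB300 NVL72", 133): (750.1, 0.971, 357203),
}


def implied_gpu_hour_price(tok_s_per_gpu, cost_per_m_tok):
    """Assumed model: $/M tok = ($/GPU-hour) / (tokens per GPU-hour) * 1e6."""
    tokens_per_hour = tok_s_per_gpu * 3600
    return cost_per_m_tok * tokens_per_hour / 1e6


def implied_kw_per_gpu(tok_s_per_gpu, tok_s_per_mw):
    """Assumed model: tok/s/MW = (tok/s/GPU) / (MW per GPU)."""
    return tok_s_per_gpu / tok_s_per_mw * 1000.0


if __name__ == "__main__":
    for (system, level), (tput, cost, per_mw) in ROWS.items():
        price = implied_gpu_hour_price(tput, cost)
        power = implied_kw_per_gpu(tput, per_mw)
        print(f"{system} @ {level} tok/s/user: ~${price:.2f}/GPU-hour, ~{power:.2f} kW/GPU")
```

Under these assumptions the implied figures stay nearly constant across the three interactivity levels for each system, which suggests the cost and tok/s/MW rows are fixed multiples of the throughput row rather than independent measurements.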
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.