DeepSeek V4 Pro 1.6T — B300 vs GB300 NVL72
Head-to-head AI inference benchmark comparison of B300 and GB300 NVL72 (both NVIDIA Blackwell) on DeepSeek V4 Pro 1.6T, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; they behave the same as on the main inference chart.
At 54 tok/s/user on DeepSeek V4 Pro 1.6T, B300 delivers 1841 tok/s/GPU at $0.35 per million tokens and GB300 NVL72 delivers 11018 tok/s/GPU at $0.07. B300 costs 430% more per token (about 5.3x), so GB300 NVL72 is roughly 81% cheaper per token, and it delivers 499% more tok/s/GPU.
Near the middle of the 13–177 tok/s/user interactivity band, at 95 tok/s/user, B300 runs 1019 tok/s/GPU at $0.64 per million tokens and GB300 NVL72 runs 6320 tok/s/GPU at $0.12. B300 costs 446% more per token (about 5.5x), so GB300 NVL72 is roughly 82% cheaper per token, and it delivers 520% more tok/s/GPU.
With 136 tok/s/user as the target, B300 produces 554 tok/s/GPU ($1.17 per million tokens) and GB300 NVL72 produces 3092 tok/s/GPU ($0.24). B300 costs 389% more per token (about 4.9x), so GB300 NVL72 is roughly 80% cheaper per token, and it delivers 458% more tok/s/GPU. (Numbers reflect the default 8k/1k · fp4 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
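The relative figures quoted for each interactivity point can be reproduced from the throughput and cost values. The following is a minimal sketch, assuming "X% more" means (a/b − 1)·100 and that a saving is expressed as (1 − a/b)·100; the function names `pct_more` and `pct_less` are illustrative and not part of the benchmark tooling. Because the inputs are the rounded values shown on this page, results can differ from the page's own percentages by a few points.

```python
def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, e.g. pct_more(6, 1) == 500.0."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Percent by which a is below b; cannot exceed 100 for positive inputs."""
    return (1.0 - a / b) * 100.0


# (B300, GB300 NVL72) values as published for the 8k/1k, fp4 selection.
points = {
    54: {"tput": (1840.7, 11018.4), "cost": (0.353, 0.067)},
    95: {"tput": (1018.6, 6319.6), "cost": (0.640, 0.117)},
    136: {"tput": (553.7, 3091.9), "cost": (1.170, 0.239)},
}

for interactivity, p in points.items():
    b300_tput, gb300_tput = p["tput"]
    b300_cost, gb300_cost = p["cost"]
    print(
        f"{interactivity} tok/s/user: "
        f"GB300 NVL72 has {pct_more(gb300_tput, b300_tput):.0f}% more tok/s/GPU, "
        f"B300 costs {pct_more(b300_cost, gb300_cost):.0f}% more per token, "
        f"GB300 NVL72 is {pct_less(gb300_cost, b300_cost):.0f}% cheaper per token"
    )
```

Keeping the two directions separate avoids the common confusion between "costs N% more" (which can exceed 100%) and "is N% cheaper" (which cannot).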
| Metric | System | 54 tok/s/user | 95 tok/s/user | 136 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B300 | 1840.7 | 1018.6 | 553.7 |
| Throughput (tok/s/GPU) | GB300 NVL72 | 11018.4 | 6319.6 | 3091.9 |
| Cost ($/M tok) | B300 | $0.353 | $0.640 | $1.170 |
| Cost ($/M tok) | GB300 NVL72 | $0.067 | $0.117 | $0.239 |
| tok/s/MW | B300 | 848244 | 469421 | 255156 |
| tok/s/MW | GB300 NVL72 | 5246871 | 3009317 | 1472349 |
| Concurrency | B300 | ~17 | ~5 | ~2 |
| Concurrency | GB300 NVL72 | ~2440 | ~353 | ~256 |
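The cost and energy rows can be related back to the throughput row with a little arithmetic. The sketch below rests on two assumptions that this page does not state: that cost per million tokens is an hourly per-GPU price divided by hourly token output, and that tok/s/MW is per-GPU throughput divided by per-GPU power draw. Under those assumptions it backs out an implied hourly price and an implied power figure per GPU; the helper names are hypothetical, and the results are only what the table's own numbers imply, not published specifications.

```python
def implied_gpu_hour_cost(cost_per_mtok: float, tok_s_gpu: float) -> float:
    """Implied $/GPU-hour, assuming cost = hourly price / tokens per hour."""
    tokens_per_hour = tok_s_gpu * 3600.0
    return cost_per_mtok * tokens_per_hour / 1e6


def implied_kw_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Implied kW per GPU, assuming tok/s/MW = tok/s/GPU / (MW per GPU)."""
    return tok_s_gpu / tok_s_mw * 1000.0


# Rows at 54 tok/s/user: (tok/s/GPU, $/M tok, tok/s/MW).
rows = {
    "B300": (1840.7, 0.353, 848244),
    "GB300 NVL72": (11018.4, 0.067, 5246871),
}

for system, (tput, cost, tput_per_mw) in rows.items():
    print(
        f"{system}: ~${implied_gpu_hour_cost(cost, tput):.2f}/GPU-hour, "
        f"~{implied_kw_per_gpu(tput, tput_per_mw):.2f} kW/GPU (implied)"
    )
```

Running the same helpers on the 95 and 136 tok/s/user columns is a quick consistency check: if the assumptions hold, the implied per-GPU price and power should stay roughly constant across columns for a given system.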
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.