DeepSeek V4 Pro 1.6T — B300 vs GB200 NVL72
Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and GB200 NVL72 (NVIDIA Blackwell) on DeepSeek V4 Pro 1.6T, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; the interactions match those of the main inference chart.
Near the low end of the 6–152 tok/s/user interactivity band, at 43 tok/s/user on DeepSeek V4 Pro 1.6T: B300 runs 2131 tok/s/GPU at $0.30/M tokens, GB200 NVL72 runs 2401 tok/s/GPU at $0.26/M. GB200 NVL72 is 16% cheaper per token (B300 costs 19% more) and delivers 13% more tok/s/GPU.
Setting 79 tok/s/user as the target on DeepSeek V4 Pro 1.6T, B300 produces 1272 tok/s/GPU ($0.51 per million tokens) and GB200 NVL72 produces 583 tok/s/GPU ($1.05). B300 is 51% cheaper per token (GB200 NVL72 costs 106% more) and delivers 118% more tok/s/GPU.
At 116 tok/s/user interactivity on DeepSeek V4 Pro 1.6T, B300 delivers 656 tok/s/GPU at $1.00 per million tokens; GB200 NVL72 delivers 195 tok/s/GPU at $3.37. B300 is 70% cheaper per token (GB200 NVL72 costs 237% more) and delivers 236% more tok/s/GPU at this point. (Numbers reflect the default 8k/1k · fp4 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
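The cost comparisons above depend on which system serves as the baseline: "costs X% more" divides by the cheaper price, while "X% cheaper" divides by the more expensive one and can never exceed 100%. The following is a minimal Python sketch of both calculations using the three-decimal cost figures from the table; the function names `pct_more` and `pct_less` are illustrative and not part of any benchmark tooling.

```python
def pct_more(higher: float, lower: float) -> float:
    """Percent by which `higher` exceeds `lower` (baseline: the lower value)."""
    return (higher / lower - 1.0) * 100.0


def pct_less(lower: float, higher: float) -> float:
    """Percent by which `lower` falls below `higher` (baseline: the higher value)."""
    return (1.0 - lower / higher) * 100.0


# interactivity (tok/s/user) -> (B300 $/M tok, GB200 NVL72 $/M tok), copied from the table
COST = {43: (0.305, 0.257), 79: (0.511, 1.051), 116: (0.999, 3.368)}

for target, (b300, gb200) in COST.items():
    cheaper, dearer = sorted((b300, gb200))
    winner = "B300" if b300 < gb200 else "GB200 NVL72"
    print(
        f"{target} tok/s/user: {winner} is {pct_less(cheaper, dearer):.0f}% cheaper; "
        f"the other system costs {pct_more(dearer, cheaper):.0f}% more"
    )
```

The same `pct_more` function applies to throughput, where "X% more tok/s/GPU" is measured against the slower system.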
| Metric | System | 43 tok/s/user | 79 tok/s/user | 116 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B300 | 2130.6 | 1272.0 | 655.8 |
| Throughput (tok/s/GPU) | GB200 NVL72 | 2401.3 | 582.9 | 195.4 |
| Cost ($/M tok) | B300 | $0.305 | $0.511 | $0.999 |
| Cost ($/M tok) | GB200 NVL72 | $0.257 | $1.051 | $3.368 |
| tok/s/MW | B300 | 981833 | 586191 | 302216 |
| tok/s/MW | GB200 NVL72 | 1143485 | 277571 | 93050 |
| Concurrency | B300 | ~28 | ~8 | ~4 |
| Concurrency | GB200 NVL72 | ~125 | ~47 | ~7 |
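
The tok/s/MW row can be cross-checked against the throughput row. Dividing tok/s/MW by tok/s/GPU gives an implied number of GPUs per megawatt, which should stay constant for each system if the page applies a fixed per-GPU power figure. That is an assumption, since the page does not state its power methodology. A minimal Python sketch of the check, with `gpus_per_mw` as an illustrative helper name:

```python
# (tok/s/GPU, tok/s/MW) pairs copied from the table at 43, 79, and 116 tok/s/user
TABLE = {
    "B300": [(2130.6, 981833), (1272.0, 586191), (655.8, 302216)],
    "GB200 NVL72": [(2401.3, 1143485), (582.9, 277571), (195.4, 93050)],
}


def gpus_per_mw(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Implied GPUs per megawatt, assuming tok/s/MW = tok/s/GPU * GPUs per MW."""
    return tok_s_mw / tok_s_gpu


for system, rows in TABLE.items():
    ratios = [gpus_per_mw(gpu, mw) for gpu, mw in rows]
    # 1 MW = 1000 kW, so the implied power per GPU in kW is 1000 / (GPUs per MW)
    kw_per_gpu = [1000.0 / r for r in ratios]
    print(system, [f"{r:.1f}" for r in ratios], [f"{k:.2f} kW" for k in kw_per_gpu])
```

Under this assumption the ratio stays close to 461 GPUs per MW for B300 and 476 for GB200 NVL72 at all three operating points, or roughly 2.17 kW and 2.10 kW per GPU. These are implied figures derived from the table, not values the page reports.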
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.