DeepSeek R1 — B300 vs GB200 NVL72
Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and GB200 NVL72 (NVIDIA Blackwell) on DeepSeek R1, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; they work the same way as on the main inference chart.
At 90 tok/s/user on DeepSeek R1, B300 delivers 8490 tok/s/GPU at $0.08 per million tokens and GB200 NVL72 delivers 11719 tok/s/GPU at $0.05. GB200 NVL72 delivers 38% more tok/s/GPU. B300 costs about 45% more per token, which makes GB200 NVL72 roughly 32% cheaper.
Around the middle of the 21–299 tok/s/user interactivity band, at 160 tok/s/user on DeepSeek R1, B300 runs 1837 tok/s/GPU at $0.35 per million tokens and GB200 NVL72 runs 4398 tok/s/GPU at $0.14. GB200 NVL72 delivers 139% more tok/s/GPU. B300 costs about 156% more per token, which makes GB200 NVL72 about 61% cheaper.
At a target of 230 tok/s/user on DeepSeek R1, B300 produces 1064 tok/s/GPU and GB200 NVL72 produces 1029 tok/s/GPU, both at $0.60 per million tokens. Cost per token is essentially tied, and B300 delivers 3% more tok/s/GPU. (Numbers reflect the default 8k/1k · fp4 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
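The percentage comparisons depend on which system is taken as the baseline: "X% more" divides by the smaller value, while "X% cheaper" divides by the larger one, so a cost gap can never be more than 100% cheaper. The sketch below recomputes both forms from the throughput and cost figures listed in the table. The helper names `pct_more` and `pct_less` are illustrative and not part of any benchmark tooling.

```python
def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, with b as the baseline."""
    return (a - b) / b * 100.0


def pct_less(a: float, b: float) -> float:
    """Percent by which a falls below b, with b as the baseline."""
    return (b - a) / b * 100.0


# (B300, GB200 NVL72) values per interactivity target, copied from the table.
points = {
    90: {"tput": (8490.5, 11719.4), "cost": (0.076, 0.052)},
    160: {"tput": (1836.6, 4397.9), "cost": (0.354, 0.139)},
    230: {"tput": (1064.2, 1028.9), "cost": (0.604, 0.601)},
}

for target, p in points.items():
    b300_tput, gb200_tput = p["tput"]
    b300_cost, gb200_cost = p["cost"]
    print(
        f"{target} tok/s/user: "
        f"GB200 NVL72 throughput vs B300 {pct_more(gb200_tput, b300_tput):+.1f}%, "
        f"B300 cost vs GB200 NVL72 {pct_more(b300_cost, gb200_cost):+.1f}%, "
        f"GB200 NVL72 cost below B300 by {pct_less(gb200_cost, b300_cost):.1f}%"
    )
```

Because the table rounds costs to three decimals, the recomputed percentages can differ by a point or so from figures derived from unrounded data.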
| Metric | 90 tok/s/user | 160 tok/s/user | 230 tok/s/user |
|---|---|---|---|
| Throughput (tok/s/GPU) | B300: 8490.5; GB200 NVL72: 11719.4 | B300: 1836.6; GB200 NVL72: 4397.9 | B300: 1064.2; GB200 NVL72: 1028.9 |
| Cost ($/M tok) | B300: $0.076; GB200 NVL72: $0.052 | B300: $0.354; GB200 NVL72: $0.139 | B300: $0.604; GB200 NVL72: $0.601 |
| tok/s/MW | B300: 3912654; GB200 NVL72: 5580669 | B300: 846363; GB200 NVL72: 2094256 | B300: 490420; GB200 NVL72: 489935 |
| Concurrency | B300: ~214; GB200 NVL72: ~935 | B300: ~46; GB200 NVL72: ~194 | B300: ~34; GB200 NVL72: ~19 |
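The cost and tok/s/MW rows can be related back to the throughput row under two assumptions that the page does not state: that cost per million tokens is an hourly per-GPU price divided by tokens generated per hour, and that tok/s/MW is per-GPU throughput divided by per-GPU power draw. The following is a minimal sketch of that back-calculation. The function names are hypothetical, and the implied price and power are derived values, not published figures.

```python
def implied_gpu_hour_price(cost_per_m_tok: float, tok_s_gpu: float) -> float:
    """Hourly per-GPU price implied by $/M tokens, ASSUMING
    cost = hourly price / tokens generated per hour."""
    tokens_per_hour = tok_s_gpu * 3600.0
    return cost_per_m_tok * tokens_per_hour / 1e6


def implied_kw_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Per-GPU power in kW implied by tok/s/MW, ASSUMING
    tok/s/MW = tok/s/GPU divided by MW drawn per GPU."""
    return tok_s_gpu / tok_s_mw * 1000.0


# Table rows at the 90 tok/s/user target: (throughput, $/M tok, tok/s/MW).
rows = {
    "B300": (8490.5, 0.076, 3912654),
    "GB200 NVL72": (11719.4, 0.052, 5580669),
}

for name, (tput, cost, per_mw) in rows.items():
    price = implied_gpu_hour_price(cost, tput)
    power = implied_kw_per_gpu(tput, per_mw)
    print(f"{name}: ~${price:.2f} per GPU-hour, ~{power:.2f} kW per GPU")
```

If the assumptions hold, the implied price and power should come out nearly constant across the three interactivity targets for each system, which is a quick consistency check on the table.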
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.