DeepSeek R1 — B300 vs GB200 NVL72

Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and GB200 NVL72 (NVIDIA Blackwell) on DeepSeek R1, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; they behave the same as on the main inference chart.

At 90 tok/s/user on DeepSeek R1, B300 delivers 8490 tok/s/GPU at $0.076 per million tokens and GB200 NVL72 delivers 11719 tok/s/GPU at $0.052 per million tokens. GB200 NVL72 is roughly 32% cheaper per token (equivalently, B300 costs roughly 45% more) and delivers 38% more tok/s/GPU.

Around the middle of the 21–299 tok/s/user interactivity band, at 160 tok/s/user on DeepSeek R1, B300 runs 1837 tok/s/GPU at $0.35/M tokens and GB200 NVL72 runs 4398 tok/s/GPU at $0.14/M tokens. GB200 NVL72 is roughly 61% cheaper per token (equivalently, B300 costs roughly 156% more) and delivers 139% more tok/s/GPU.

With 230 tok/s/user as the target on DeepSeek R1, B300 produces 1064 tok/s/GPU ($0.60 per million tokens) and GB200 NVL72 produces 1029 tok/s/GPU ($0.60 per million tokens). Cost per token is essentially tied; B300 delivers 3% more tok/s/GPU. (Numbers reflect the default 8k/1k · fp4 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
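The percentage comparisons can be reproduced from the throughput and cost figures in the table. The sketch below is one way to do that arithmetic; the helper names `pct_more` and `pct_cheaper` are illustrative and not part of the benchmark tooling. "Cheaper" is measured against the more expensive option, so it can never exceed 100%.

```python
def pct_more(value: float, baseline: float) -> float:
    """Percent by which `value` exceeds `baseline`."""
    return (value / baseline - 1.0) * 100.0


def pct_cheaper(cost: float, baseline_cost: float) -> float:
    """Percent by which `cost` is below `baseline_cost`."""
    return (1.0 - cost / baseline_cost) * 100.0


# (tok/s/GPU, $/M tok) per operating point, copied from the table values.
points = {
    90: {"B300": (8490.5, 0.076), "GB200 NVL72": (11719.4, 0.052)},
    160: {"B300": (1836.6, 0.354), "GB200 NVL72": (4397.9, 0.139)},
}

for interactivity, row in points.items():
    b300_tput, b300_cost = row["B300"]
    gb_tput, gb_cost = row["GB200 NVL72"]
    print(
        f"{interactivity} tok/s/user: GB200 NVL72 delivers "
        f"{pct_more(gb_tput, b300_tput):.0f}% more tok/s/GPU and is "
        f"{pct_cheaper(gb_cost, b300_cost):.0f}% cheaper per token"
    )
```

Note that "X% cheaper" and "costs X% more" are different ratios of the same two prices, which is why the two figures differ for the same pair of costs.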

Interpolated from real benchmark data. Edit target interactivity values below to compare at different operating points.
| Metric | GPU | 90 tok/s/user | 160 tok/s/user | 230 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B300 | 8490.5 | 1836.6 | 1064.2 |
| Throughput (tok/s/GPU) | GB200 NVL72 | 11719.4 | 4397.9 | 1028.9 |
| Cost ($/M tok) | B300 | $0.076 | $0.354 | $0.604 |
| Cost ($/M tok) | GB200 NVL72 | $0.052 | $0.139 | $0.601 |
| tok/s/MW | B300 | 3912654 | 846363 | 490420 |
| tok/s/MW | GB200 NVL72 | 5580669 | 2094256 | 489935 |
| Concurrency | B300 | ~214 | ~46 | ~34 |
| Concurrency | GB200 NVL72 | ~935 | ~194 | ~19 |
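The table values are described as interpolated from benchmark data at the chosen target interactivity. The page does not say which interpolation method is used, so the sketch below assumes simple piecewise-linear interpolation between measured points. The function name `interpolate` and the sample measurements are hypothetical and are not real benchmark results.

```python
from bisect import bisect_left


def interpolate(points: list[tuple[float, float]], target: float) -> float:
    """Piecewise-linear interpolation over (x, y) points sorted by x.

    Refuses to extrapolate outside the measured range.
    """
    xs = [x for x, _ in points]
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target is outside the measured range")
    i = bisect_left(xs, target)
    if xs[i] == target:
        return points[i][1]  # exact measurement, nothing to interpolate
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (target - x0) / (x1 - x0)


# Hypothetical (tok/s/user, tok/s/GPU) measurements, for illustration only.
measured = [(50.0, 1000.0), (100.0, 600.0), (200.0, 200.0)]

print(interpolate(measured, 150.0))  # 400.0
```

Rejecting targets outside the measured range mirrors the idea of a bounded interactivity band: values beyond the first or last measurement would be extrapolation rather than interpolation.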

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.