
DeepSeek V4 Pro 1.6T — GB200 NVL72 vs GB300 NVL72

Head-to-head AI inference benchmark comparison of GB200 NVL72 and GB300 NVL72 (both NVIDIA Blackwell) running DeepSeek V4 Pro 1.6T, covering latency, throughput, and cost across LLM workloads.

At 48 tok/s/user interactivity on DeepSeek V4 Pro 1.6T, GB200 NVL72 delivers 1962 tok/s/GPU at $0.31 per million tokens, while GB300 NVL72 delivers 7790 tok/s/GPU at $0.09. At this point GB300 NVL72 delivers 297% more tok/s/GPU, and GB200 NVL72 costs 233% more per token (GB300 NVL72 is about 70% cheaper).

At 83 tok/s/user, GB200 NVL72 posts 435 tok/s/GPU for $1.40 per million tokens; GB300 NVL72 posts 2830 tok/s/GPU for $0.26. GB300 NVL72 delivers 551% more tok/s/GPU, and GB200 NVL72 costs 442% more per token (GB300 NVL72 is about 82% cheaper).

At 118 tok/s/user, GB200 NVL72 reaches 192 tok/s/GPU and GB300 NVL72 reaches 1160, with per-million-token costs of $3.43 and $0.64 respectively. GB300 NVL72 delivers 505% more tok/s/GPU, and GB200 NVL72 costs 439% more per token (GB300 NVL72 is about 81% cheaper). All figures reflect the default 8k/1k sequence and fp4 precision configuration; other sequence, precision, or model selections produce different numbers.
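Percentage comparisons are easy to misstate when costs and throughputs are mixed: "X% more" divides by the smaller value, while "X% cheaper" divides by the larger cost and can never exceed 100%. The minimal Python sketch below recomputes both from the rounded table values; the helper names are illustrative only.

```python
def pct_more(a: float, b: float) -> float:
    """Percentage by which a exceeds b: (a / b - 1) * 100."""
    return (a / b - 1.0) * 100.0


def pct_cheaper(cheap: float, expensive: float) -> float:
    """Percentage saved by paying `cheap` instead of `expensive`."""
    return (1.0 - cheap / expensive) * 100.0


# (tok/s/user, GB200 tok/s/GPU, GB300 tok/s/GPU, GB200 $/M tok, GB300 $/M tok),
# copied from the table on this page.
POINTS = [
    (48, 1962.3, 7789.8, 0.314, 0.094),
    (83, 434.9, 2829.8, 1.401, 0.258),
    (118, 191.7, 1159.9, 3.428, 0.636),
]


def summarize(points):
    """Return rounded percentage comparisons for each operating point."""
    rows = []
    for tps_user, tput_200, tput_300, cost_200, cost_300 in points:
        rows.append({
            "tok/s/user": tps_user,
            # how much more throughput GB300 NVL72 delivers per GPU
            "gb300_more_throughput_pct": round(pct_more(tput_300, tput_200)),
            # how much more GB200 NVL72 costs per token
            "gb200_more_cost_pct": round(pct_more(cost_200, cost_300)),
            # how much GB300 NVL72 saves per token relative to GB200 NVL72
            "gb300_cheaper_pct": round(pct_cheaper(cost_300, cost_200)),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(POINTS):
        print(row)
```

With the three-decimal costs from the table, the cost premium comes out at 234% and 443% for the first two points rather than 233% and 442%, presumably because the page computes from unrounded values; the throughput figures match exactly.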


Values are interpolated from measured benchmark data.
Metric                           48 tok/s/user   83 tok/s/user   118 tok/s/user
Throughput (tok/s/GPU)
  GB200 NVL72                    1962.3          434.9           191.7
  GB300 NVL72                    7789.8          2829.8          1159.9
Cost ($ per million tokens)
  GB200 NVL72                    $0.314          $1.401          $3.428
  GB300 NVL72                    $0.094          $0.258          $0.636
Throughput per MW (tok/s/MW)
  GB200 NVL72                    934,435         207,086         91,263
  GB300 NVL72                    3,709,451       1,347,547       552,333
Concurrency
  GB200 NVL72                    ~111            ~29             ~7
  GB300 NVL72                    ~1171           ~256            ~58
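The page says these values are interpolated from measured data but does not publish the method or the raw measurements. The sketch below assumes simple piecewise-linear interpolation between measured (interactivity, throughput) points, plus the unit conversions that relate tok/s/GPU to the cost and tok/s/MW rows. All function names, the hourly GPU price, and the per-GPU power input are hypothetical, not values from this page.

```python
from bisect import bisect_left


def interpolate(points, x):
    """Linearly interpolate y at x from (x, y) points (assumed method).

    x must lie within the measured range; extrapolation is refused.
    """
    pts = sorted(points)
    xs = [p[0] for p in pts]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("target outside measured range")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return pts[i][1]
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def cost_per_million_tokens(tok_s_per_gpu, usd_per_gpu_hour):
    """$/M tokens, given a hypothetical hourly price per GPU."""
    tokens_per_hour = tok_s_per_gpu * 3600
    return usd_per_gpu_hour / tokens_per_hour * 1e6


def tokens_per_s_per_mw(tok_s_per_gpu, kw_per_gpu):
    """tok/s per MW: one MW (1000 kW) powers 1000 / kw_per_gpu GPUs."""
    return tok_s_per_gpu * 1000.0 / kw_per_gpu


def implied_kw_per_gpu(tok_s_per_gpu, tok_s_per_mw):
    """Back out per-GPU power (kW) from a tok/s/GPU and tok/s/MW pair."""
    return tok_s_per_gpu / tok_s_per_mw * 1000.0


# (tok/s/GPU, tok/s/MW) pairs copied from the table above
TABLE = {
    "GB200 NVL72": [(1962.3, 934435), (434.9, 207086), (191.7, 91263)],
    "GB300 NVL72": [(7789.8, 3709451), (2829.8, 1347547), (1159.9, 552333)],
}

if __name__ == "__main__":
    for system, pairs in TABLE.items():
        print(system, [round(implied_kw_per_gpu(t, m), 3) for t, m in pairs])
```

Dividing each tok/s/GPU value by its tok/s/MW value gives about 2.1 kW per GPU for both systems at all three points (for example, 1962.3 / 934,435 ≈ 0.0021 MW), so the power row appears to scale throughput by a fixed per-GPU power figure.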
