DeepSeek V4 Pro 1.6T — GB200 NVL72 vs GB300 NVL72

Head-to-head AI inference benchmark comparison of GB200 NVL72 and GB300 NVL72 (both NVIDIA Blackwell) on DeepSeek V4 Pro 1.6T, covering latency, throughput, and cost across LLM workloads.

At 62 tok/s/user interactivity on DeepSeek V4 Pro 1.6T, GB200 NVL72 delivers 9247 tok/s/GPU at $0.066 per million tokens; GB300 NVL72 delivers 9940 tok/s/GPU at $0.074. GB300 NVL72 delivers 7% more tok/s/GPU at this point but costs 12% more per token.

At 109 tok/s/user, GB200 NVL72 posts 6270 tok/s/GPU at $0.098 per million tokens; GB300 NVL72 posts 6207 tok/s/GPU at $0.119. GB200 NVL72 delivers 1% more tok/s/GPU, and GB300 NVL72 costs 22% more per token.

At 157 tok/s/user, GB200 NVL72 hits 523 tok/s/GPU and GB300 NVL72 hits 965, with per-million-token costs of $1.176 and $0.764 respectively. GB300 NVL72 delivers 85% more tok/s/GPU, and GB200 NVL72 costs 54% more per token. (Numbers reflect the default 8k/1k · fp4 selection.)
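The percentage gaps quoted above depend on which system is taken as the baseline. The sketch below recomputes them from the table values at 157 tok/s/user; the helper name `pct_more` is an illustrative assumption, not something defined by the benchmark page. The same cost gap reads as about 54% when measured against the cheaper system and about 35% when measured against the more expensive one.

```python
def pct_more(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100.0

# Table values at 157 tok/s/user (8k/1k, fp4).
gb200_cost, gb300_cost = 1.176, 0.764   # $/M tok
gb200_tput, gb300_tput = 522.7, 964.6   # tok/s/GPU

# Cost premium of GB200, using GB300 (the cheaper system) as the baseline.
extra_cost = pct_more(gb200_cost, gb300_cost)
# Cost saving of GB300, using GB200 (the pricier system) as the baseline.
saving = (1 - gb300_cost / gb200_cost) * 100.0
# Throughput advantage of GB300 over GB200.
extra_tput = pct_more(gb300_tput, gb200_tput)

print(f"GB200 costs {extra_cost:.0f}% more per token")
print(f"GB300 is {saving:.0f}% cheaper per token")
print(f"GB300 delivers {extra_tput:.0f}% more tok/s/GPU")
```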

The values below are interpolated from real benchmark data at each target interactivity level.
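The page does not state which interpolation scheme it uses. As a rough illustration only, the sketch below assumes piecewise-linear interpolation and reuses the three GB200 NVL72 operating points reported on this page (62, 109, and 157 tok/s/user) as anchors. Those points are themselves interpolated, and real throughput curves are unlikely to be linear between them, so the result is indicative at best. The function name `interpolate` is hypothetical.

```python
from bisect import bisect_right

def interpolate(x: float, xs: list[float], ys: list[float]) -> float:
    """Piecewise-linear interpolation; xs must be sorted ascending and distinct."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("target lies outside the measured range")
    # Index of the right-hand anchor of the segment containing x.
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# GB200 NVL72 anchors taken from this page (8k/1k, fp4).
interactivity = [62.0, 109.0, 157.0]   # tok/s/user
throughput = [9246.6, 6270.4, 522.7]   # tok/s/GPU

# Estimate throughput halfway between the first two anchors.
estimate = interpolate(85.5, interactivity, throughput)
print(f"~{estimate:.1f} tok/s/GPU at 85.5 tok/s/user (linear assumption)")
```

Refusing to extrapolate outside the anchor range is a deliberate choice here: throughput falls off steeply at high interactivity, so extending the last segment would be misleading.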

Metric	GPU	62 tok/s/user	109 tok/s/user	157 tok/s/user
Throughput (tok/s/GPU)	GB200 NVL72	9246.6	6270.4	522.7
Throughput (tok/s/GPU)	GB300 NVL72	9939.9	6207.4	964.6
Cost ($/M tok)	GB200 NVL72	$0.066	$0.098	$1.176
Cost ($/M tok)	GB300 NVL72	$0.074	$0.119	$0.764
tok/s/MW	GB200 NVL72	4403149	2985888	248888
tok/s/MW	GB300 NVL72	4733294	2955890	459337
Concurrency	GB200 NVL72	~12312	~1934	~41
Concurrency	GB300 NVL72	~1043	~613	~31
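The rows of the table are related to one another, which allows a quick consistency check. The sketch below assumes, without confirmation from the page, that cost per million tokens is an hourly GPU price divided by tokens generated per hour, and that tok/s/MW is per-GPU throughput divided by per-GPU power. Under those assumptions the table implies roughly $2.20 per GPU-hour for GB200 NVL72, roughly $2.65 for GB300 NVL72, and about 2.1 kW per GPU for both. The function names are hypothetical.

```python
# (throughput tok/s/GPU, cost $/M tok, tok/s/MW) at 62, 109, 157 tok/s/user
ROWS = {
    "GB200 NVL72": [(9246.6, 0.066, 4403149),
                    (6270.4, 0.098, 2985888),
                    (522.7, 1.176, 248888)],
    "GB300 NVL72": [(9939.9, 0.074, 4733294),
                    (6207.4, 0.119, 2955890),
                    (964.6, 0.764, 459337)],
}

def implied_gpu_hour_price(tput: float, cost_per_m: float) -> float:
    """$/GPU-hour, assuming cost = hourly price / tokens per hour."""
    return cost_per_m * tput * 3600 / 1e6

def implied_kw_per_gpu(tput: float, tok_s_per_mw: float) -> float:
    """kW per GPU, assuming tok/s/MW = throughput / power per GPU."""
    return tput / tok_s_per_mw * 1000

implied = {}
for gpu, rows in ROWS.items():
    prices = [implied_gpu_hour_price(t, c) for t, c, _ in rows]
    power = [implied_kw_per_gpu(t, m) for t, _, m in rows]
    implied[gpu] = (prices, power)
    print(gpu,
          [f"${p:.2f}/GPU-h" for p in prices],
          [f"{k:.2f} kW" for k in power])
```

The small spread in the implied prices across columns comes from the table's three-decimal rounding of the cost values.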

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.
