DeepSeek V4 Pro 1.6T — GB200 NVL72 vs MI355X
A head-to-head AI inference benchmark comparison of GB200 NVL72 (NVIDIA Blackwell) and MI355X (AMD CDNA 4) on DeepSeek V4 Pro 1.6T, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; they work the same way as on the main inference chart.
At 19 tok/s/user on DeepSeek V4 Pro 1.6T, GB200 NVL72 reaches 8006 tok/s/GPU and MI355X reaches 901. Costs per million tokens are $0.08 and $0.46 respectively. MI355X costs 497% more per token (equivalently, GB200 NVL72 is about 83% cheaper), and GB200 NVL72 delivers 788% more tok/s/GPU.
At 31 tok/s/user on DeepSeek V4 Pro 1.6T, GB200 NVL72 and MI355X deliver 4419 and 619 tok/s/GPU at $0.14 and $0.65 per million tokens, respectively. MI355X costs 377% more per token (equivalently, GB200 NVL72 is about 79% cheaper), and GB200 NVL72 delivers 614% more tok/s/GPU.
Toward the upper edge of the 6–57 tok/s/user interactivity band, at 44 tok/s/user on DeepSeek V4 Pro 1.6T, GB200 NVL72 runs 2304 tok/s/GPU at $0.27/M tokens and MI355X runs 292 at $1.40/M. MI355X costs 421% more per token (equivalently, GB200 NVL72 is about 81% cheaper), and GB200 NVL72 delivers 690% more tok/s/GPU. (Numbers reflect the default 8k/1k · fp4 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
| Metric | Hardware | 19 tok/s/user | 31 tok/s/user | 44 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | GB200 NVL72 | 8005.5 | 4418.7 | 2303.6 |
| Throughput (tok/s/GPU) | MI355X | 901.2 | 618.9 | 291.6 |
| Cost ($/M tok) | GB200 NVL72 | $0.077 | $0.137 | $0.269 |
| Cost ($/M tok) | MI355X | $0.458 | $0.655 | $1.398 |
| tok/s/MW | GB200 NVL72 | 3812149 | 2104152 | 1096940 |
| tok/s/MW | MI355X | 340079 | 233552 | 110051 |
| Concurrency | GB200 NVL72 | ~4037 | ~310 | ~122 |
| Concurrency | MI355X | ~57 | ~20 | ~6 |
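
The percentage comparisons quoted above can be reproduced from the table values. The sketch below uses the 19 tok/s/user figures (8005.5 vs 901.2 tok/s/GPU, $0.077 vs $0.458 per million tokens). The helper names `percent_more` and `percent_less` are illustrative, not part of any benchmark tooling, and the sketch assumes the page's percentages are plain ratios of the two values.

```python
# Values copied from the table above (19 tok/s/user column).
throughput = {"GB200 NVL72": 8005.5, "MI355X": 901.2}  # tok/s/GPU
cost = {"GB200 NVL72": 0.077, "MI355X": 0.458}         # $ per million tokens


def percent_more(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


def percent_less(a: float, b: float) -> float:
    """How much smaller a is than b, as a percentage of b."""
    return (1.0 - a / b) * 100.0


# Throughput advantage of GB200 NVL72 over MI355X.
throughput_gain = percent_more(throughput["GB200 NVL72"], throughput["MI355X"])
# Extra cost of MI355X relative to GB200 NVL72 (can exceed 100%).
cost_premium = percent_more(cost["MI355X"], cost["GB200 NVL72"])
# Saving of GB200 NVL72 relative to MI355X (always below 100%).
cost_saving = percent_less(cost["GB200 NVL72"], cost["MI355X"])

print(f"GB200 NVL72 throughput advantage: {throughput_gain:.0f}% more tok/s/GPU")
print(f"MI355X cost premium: {cost_premium:.0f}% more per token")
print(f"GB200 NVL72 cost saving: {cost_saving:.0f}% less per token")
```

With the rounded table costs, the premium works out to about 495%, slightly below the 497% quoted. The difference is presumably due to the page computing from unrounded costs. The two cost figures describe the same gap: a "percent more" value is measured against the cheaper system and can exceed 100%, while a "percent less" value is measured against the more expensive one and cannot.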
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.