DeepSeek R1 — GB300 NVL72 vs MI355X
Head-to-head AI inference benchmark comparison of GB300 NVL72 (NVIDIA Blackwell) and MI355X (AMD CDNA 4) on DeepSeek R1, covering latency, throughput, and cost across LLM workloads.
At 72 tok/s/user on DeepSeek R1, GB300 NVL72 reaches 11,871 tok/s/GPU versus 1,905 for MI355X, at $0.06 and $0.22 per million tokens respectively. GB300 NVL72 delivers 523% more tok/s/GPU, and MI355X costs 253% more per token (GB300 NVL72 is about 72% cheaper).
At 129 tok/s/user on DeepSeek R1, GB300 NVL72 reaches 4,691 tok/s/GPU versus 511 for MI355X, at $0.16 and $0.82 per million tokens respectively. GB300 NVL72 delivers 818% more tok/s/GPU, and MI355X costs 422% more per token (GB300 NVL72 is about 81% cheaper).
Toward the upper edge of the 15–242 tok/s/user interactivity band, at 186 tok/s/user on DeepSeek R1, GB300 NVL72 runs 895 tok/s/GPU at $0.87/M tokens and MI355X runs 187 tok/s/GPU at $2.21/M. GB300 NVL72 delivers 380% more tok/s/GPU, and MI355X costs 154% more per token (GB300 NVL72 is about 61% cheaper). All figures on this page, including the table below, are for the default 1k/1k sequence and FP4 precision selection.
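The comparisons above mix two directions of percentage change. "X% more" divides the larger value by the smaller, while "X% cheaper" has to be measured against the more expensive option and therefore can never exceed 100%. The minimal Python sketch below recomputes both directions from the rounded table values; the `POINTS` dictionary and the helper names are illustrative only and are not part of the benchmark page. Because the prose percentages presumably come from unrounded data, the last digit can differ slightly.

```python
# Rounded values copied from the comparison table (1k/1k, FP4 selection).
# Key: interactivity in tok/s/user; value: (tok/s/GPU, $ per million tokens).
POINTS = {
    72: {"gb300": (11871.2, 0.062), "mi355x": (1904.5, 0.219)},
    129: {"gb300": (4691.3, 0.157), "mi355x": (511.0, 0.822)},
    186: {"gb300": (895.2, 0.867), "mi355x": (186.6, 2.205)},
}


def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, in percent: (a / b - 1) * 100."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """How much smaller a is than b, in percent: (1 - a / b) * 100, always < 100."""
    return (1.0 - a / b) * 100.0


def compare(interactivity: int) -> dict:
    """Throughput and cost deltas between the two GPUs at one interactivity level."""
    gb_tput, gb_cost = POINTS[interactivity]["gb300"]
    mi_tput, mi_cost = POINTS[interactivity]["mi355x"]
    return {
        "throughput_more_pct": pct_more(gb_tput, mi_tput),  # GB300 vs MI355X
        "mi355x_cost_more_pct": pct_more(mi_cost, gb_cost),  # MI355X premium
        "gb300_cost_less_pct": pct_less(gb_cost, mi_cost),  # GB300 saving
    }


if __name__ == "__main__":
    for level in sorted(POINTS):
        r = compare(level)
        print(
            f"{level} tok/s/user: +{r['throughput_more_pct']:.0f}% tok/s/GPU, "
            f"MI355X +{r['mi355x_cost_more_pct']:.0f}% cost, "
            f"GB300 {r['gb300_cost_less_pct']:.0f}% cheaper"
        )
```

At 129 tok/s/user the rounded inputs give roughly 424% higher cost for MI355X rather than the 422% quoted in the prose; in both cases the saving on the GB300 NVL72 side is about 81%.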
| Metric | GPU | 72 tok/s/user | 129 tok/s/user | 186 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | GB300 NVL72 | 11,871.2 | 4,691.3 | 895.2 |
| Throughput (tok/s/GPU) | MI355X | 1,904.5 | 511.0 | 186.6 |
| Cost ($/M tokens) | GB300 NVL72 | $0.062 | $0.157 | $0.867 |
| Cost ($/M tokens) | MI355X | $0.219 | $0.822 | $2.205 |
| Throughput per MW (tok/s/MW) | GB300 NVL72 | 5,652,964 | 2,233,962 | 426,299 |
| Throughput per MW (tok/s/MW) | MI355X | 718,693 | 192,821 | 70,406 |
| Concurrency | GB300 NVL72 | ~2,185 | ~932 | ~127 |
| Concurrency | MI355X | ~249 | ~61 | ~4 |
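The page does not say how tok/s/MW is derived. Assuming it is simply tok/s/GPU multiplied by the number of GPUs that fit in a one-megawatt power budget, each GPU's two throughput columns imply a fixed per-GPU power figure. The Python sketch below back-solves that figure; `implied_watts_per_gpu` is a hypothetical helper, and its result is only what the table implies under this assumption, not a measured or vendor-rated power draw.

```python
# (tok/s/GPU, tok/s/MW) pairs copied from the table, one per interactivity
# level (72, 129, 186 tok/s/user), default 1k/1k FP4 selection.
ROWS = {
    "GB300 NVL72": [(11871.2, 5652964), (4691.3, 2233962), (895.2, 426299)],
    "MI355X": [(1904.5, 718693), (511.0, 192821), (186.6, 70406)],
}

WATTS_PER_MW = 1_000_000


def implied_watts_per_gpu(tok_s_per_gpu: float, tok_s_per_mw: float) -> float:
    """Back-solve per-GPU power in watts.

    ASSUMPTION (hypothetical helper): tok/s/MW = tok/s/GPU * GPUs-per-MW.
    The benchmark page does not document its power model.
    """
    gpus_per_mw = tok_s_per_mw / tok_s_per_gpu
    return WATTS_PER_MW / gpus_per_mw


if __name__ == "__main__":
    for gpu, pairs in ROWS.items():
        watts = [implied_watts_per_gpu(t, m) for t, m in pairs]
        print(gpu, [round(w) for w in watts])
```

With the table's values this yields about 2,100 W per GPU for GB300 NVL72 and about 2,650 W for MI355X at all three interactivity levels, which is consistent with a constant per-GPU power assumption. Whether that figure includes host, networking, or cooling overhead is not stated.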
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.