MiniMax M3 428B — GB200 NVL72 vs MI355X
Head-to-head AI inference benchmark comparison of GB200 NVL72 (NVIDIA Blackwell) and MI355X (AMD CDNA 4) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; the interactions are the same as on the main inference chart.
With 52 tok/s/user as the target on MiniMax M3 428B, GB200 NVL72 produces 1598 tok/s/GPU ($0.38 per million tokens) and MI355X produces 2784 tok/s/GPU ($0.15). GB200 NVL72 costs 157% more per token, which makes MI355X about 61% cheaper; MI355X also delivers 74% more tok/s/GPU.
At 87 tok/s/user interactivity on MiniMax M3 428B, GB200 NVL72 delivers 638 tok/s/GPU at $1.00 per million tokens; MI355X delivers 1636 tok/s/GPU at $0.25. GB200 NVL72 costs 296% more per token, which makes MI355X about 75% cheaper; MI355X also delivers 156% more tok/s/GPU at this point.
GB200 NVL72 posts 211 tok/s/GPU for $2.86 per million tokens at 122 tok/s/user on MiniMax M3 428B; MI355X posts 1048 tok/s/GPU for $0.39. GB200 NVL72 costs 633% more per token, which makes MI355X about 86% cheaper; MI355X also delivers 397% more tok/s/GPU. (Numbers reflect the default 1k/1k · fp8 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
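The percentages quoted above can be rederived from the table values. The sketch below is illustrative only: the function and key names are hypothetical, and the inputs are the rounded table figures for the default 1k/1k · fp8 selection. It separates "costs X% more" (which can exceed 100%) from "is Y% cheaper" (which cannot), since the two are easy to confuse. Because the table values are rounded, the recomputed cost premiums can differ by about a point from the figures in the prose.

```python
# Rounded table values: interactivity -> GPU -> (tok/s/GPU, $ per million tokens).
POINTS = {
    52: {"GB200 NVL72": (1598.2, 0.382), "MI355X": (2783.6, 0.149)},
    87: {"GB200 NVL72": (638.3, 0.996), "MI355X": (1636.2, 0.251)},
    122: {"GB200 NVL72": (210.9, 2.862), "MI355X": (1048.2, 0.391)},
}


def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b (unbounded above)."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Percent by which a falls below b (at most 100 for positive values)."""
    return (1.0 - a / b) * 100.0


def compare(interactivity: int) -> dict:
    """Relative throughput and cost figures at one interactivity level."""
    gb_tput, gb_cost = POINTS[interactivity]["GB200 NVL72"]
    mi_tput, mi_cost = POINTS[interactivity]["MI355X"]
    return {
        "mi355x_throughput_gain_pct": pct_more(mi_tput, gb_tput),
        "gb200_cost_premium_pct": pct_more(gb_cost, mi_cost),
        "mi355x_cost_saving_pct": pct_less(mi_cost, gb_cost),
    }


if __name__ == "__main__":
    for level in POINTS:
        result = compare(level)
        summary = ", ".join(f"{k}={v:.1f}" for k, v in result.items())
        print(f"{level} tok/s/user: {summary}")
```

The cost saving is always `1 - 1 / (1 + premium)`, so a 157% premium corresponds to a saving of roughly 61%, not 157%.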
| Metric | GPU | 52 tok/s/user | 87 tok/s/user | 122 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | GB200 NVL72 | 1598.2 | 638.3 | 210.9 |
| Throughput (tok/s/GPU) | MI355X | 2783.6 | 1636.2 | 1048.2 |
| Cost ($/M tok) | GB200 NVL72 | $0.382 | $0.996 | $2.862 |
| Cost ($/M tok) | MI355X | $0.149 | $0.251 | $0.391 |
| tok/s/MW | GB200 NVL72 | 761024 | 303973 | 100447 |
| tok/s/MW | MI355X | 1050417 | 617433 | 395557 |
| Concurrency | GB200 NVL72 | ~180 | ~74 | ~18 |
| Concurrency | MI355X | ~112 | ~41 | ~19 |
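As a sanity check on the tok/s/MW row, dividing per-GPU throughput by tok/s/MW gives an implied power budget per GPU. The sketch below assumes that tok/s/MW is simply per-GPU throughput divided by a per-GPU power figure. The source does not state this, nor what that power figure includes (host, cooling, networking), so treat the result as an inference from the table rather than a published specification. The names are hypothetical.

```python
# Table values: GPU -> list of (tok/s/GPU, tok/s/MW) at 52, 87, and 122 tok/s/user.
ROWS = {
    "GB200 NVL72": [(1598.2, 761024), (638.3, 303973), (210.9, 100447)],
    "MI355X": [(2783.6, 1050417), (1636.2, 617433), (1048.2, 395557)],
}


def implied_kw_per_gpu(tok_s_per_gpu: float, tok_s_per_mw: float) -> float:
    """Implied power per GPU in kW, assuming tok/s/MW = (tok/s/GPU) / (MW/GPU)."""
    megawatts_per_gpu = tok_s_per_gpu / tok_s_per_mw
    return megawatts_per_gpu * 1000.0


if __name__ == "__main__":
    for gpu, rows in ROWS.items():
        estimates = [implied_kw_per_gpu(tput, per_mw) for tput, per_mw in rows]
        print(gpu, [f"{kw:.2f} kW" for kw in estimates])
```

Under that assumption the three interactivity levels agree with one another for each GPU (about 2.10 kW for GB200 NVL72 and about 2.65 kW for MI355X), which suggests the tok/s/MW row uses a fixed per-GPU power figure rather than measured draw at each operating point.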
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.