MiniMax M3 428B — GB300 NVL72 vs MI325X
Head-to-head AI inference benchmark comparison of GB300 NVL72 (NVIDIA Blackwell) and MI325X (AMD CDNA 3) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads.
At 41 tok/s/user on MiniMax M3 428B, GB300 NVL72 posts 1674 tok/s/GPU at $0.40 per million tokens, while MI325X posts 747 tok/s/GPU at $0.48. GB300 NVL72 is about 15% cheaper per token (MI325X costs about 18% more) and delivers 124% more tok/s/GPU.
At 75 tok/s/user on MiniMax M3 428B, GB300 NVL72 hits 466 tok/s/GPU and MI325X hits 302, at $1.55 and $1.19 per million tokens, respectively. MI325X is about 24% cheaper per token (GB300 NVL72 costs about 31% more), while GB300 NVL72 delivers 54% more tok/s/GPU.
At 109 tok/s/user on MiniMax M3 428B, GB300 NVL72 and MI325X deliver 185 and 169 tok/s/GPU at $3.79 and $2.08 per million tokens, respectively. MI325X is about 45% cheaper per token (GB300 NVL72 costs about 82% more), while GB300 NVL72 delivers 10% more tok/s/GPU. (All figures are for the 1k/1k sequence setting at fp8 precision.)
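"X% cheaper" and "X% more" are easy to conflate, because the two percentages use different baselines. The sketch below recomputes the comparisons from the table values, assuming "cheaper by" is measured against the more expensive platform's cost and "more throughput" is measured against the slower platform's throughput. The helper names (`pct_more`, `pct_cheaper`, `summarize`) are hypothetical and only for illustration.

```python
# Sketch: recompute the relative cost and throughput comparisons from the table.
# Assumption: "cheaper by X%" = (expensive - cheap) / expensive,
#             "X% more"       = (higher - lower) / lower.

def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, relative to b."""
    return (a / b - 1.0) * 100.0

def pct_cheaper(cheap: float, expensive: float) -> float:
    """Percent saved by paying `cheap` instead of `expensive`, relative to `expensive`."""
    return (1.0 - cheap / expensive) * 100.0

# (tok/s/user, GB300 tok/s/GPU, MI325X tok/s/GPU, GB300 $/M tok, MI325X $/M tok)
POINTS = [
    (41, 1674.2, 747.0, 0.404, 0.476),
    (75, 465.7, 301.9, 1.552, 1.186),
    (109, 185.3, 169.2, 3.791, 2.079),
]

def summarize(points):
    rows = []
    for user_rate, gb_tps, mi_tps, gb_cost, mi_cost in points:
        cheaper = "GB300 NVL72" if gb_cost < mi_cost else "MI325X"
        lo, hi = sorted((gb_cost, mi_cost))
        rows.append({
            "tok_s_user": user_rate,
            # GB300 NVL72 throughput advantage over MI325X
            "throughput_gain_pct": round(pct_more(gb_tps, mi_tps)),
            "cheaper": cheaper,
            # savings relative to the pricier platform
            "cheaper_by_pct": round(pct_cheaper(lo, hi)),
            # premium paid on the pricier platform, relative to the cheaper one
            "dearer_costs_more_pct": round(pct_more(hi, lo)),
        })
    return rows

for row in summarize(POINTS):
    print(row)
```

The same cost gap reads as 15% or 18% at 41 tok/s/user depending on which platform is the baseline, so quoting both avoids ambiguity.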
| Interactivity (tok/s/user) | GPU | Throughput (tok/s/GPU) | Cost ($/M tok) | tok/s/MW | Concurrency |
|---|---|---|---|---|---|
| 41 | GB300 NVL72 | 1674.2 | $0.404 | 797,242 | ~243 |
| 41 | MI325X | 747.0 | $0.476 | 342,676 | ~80 |
| 75 | GB300 NVL72 | 465.7 | $1.552 | 221,750 | ~22 |
| 75 | MI325X | 301.9 | $1.186 | 138,498 | ~18 |
| 109 | GB300 NVL72 | 185.3 | $3.791 | 88,227 | ~5 |
| 109 | MI325X | 169.2 | $2.079 | 77,612 | ~6 |
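The tok/s/MW column can be related to tok/s/GPU if we assume it is computed by dividing per-GPU throughput by a fixed all-in power figure per GPU. The page does not state that figure, so the sketch below infers it under that assumption; `TABLE`, `implied_watts_per_gpu`, and `implied_power` are hypothetical names.

```python
# Sketch: infer the per-GPU power implied by the tok/s/MW column.
# Assumption (not stated on the page): tok/s/MW = (tok/s/GPU) / (MW per GPU),
# with one fixed power figure per GPU for each platform.

TABLE = {
    # platform: [(tok/s/GPU, tok/s/MW) at 41, 75, 109 tok/s/user]
    "GB300 NVL72": [(1674.2, 797_242), (465.7, 221_750), (185.3, 88_227)],
    "MI325X": [(747.0, 342_676), (301.9, 138_498), (169.2, 77_612)],
}

def implied_watts_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """MW per GPU = (tok/s/GPU) / (tok/s/MW); scale to watts."""
    return tok_s_gpu / tok_s_mw * 1e6

def implied_power(table):
    """Implied watts per GPU for every (platform, interactivity) point."""
    return {
        platform: [implied_watts_per_gpu(g, m) for g, m in points]
        for platform, points in table.items()
    }

for platform, watts in implied_power(TABLE).items():
    print(platform, [round(w) for w in watts])
```

Every point lands within about 1 W of 2,100 W for GB300 NVL72 and 2,180 W for MI325X, which is consistent with a constant per-GPU power budget behind the tok/s/MW figures. Treat these as inferred values, not published specifications.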