MiniMax M3 428B — GB300 NVL72 vs MI325X

Head-to-head AI inference benchmark comparison of GB300 NVL72 (NVIDIA Blackwell) and MI325X (AMD CDNA 3) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads.

At 41 tok/s/user on MiniMax M3 428B, GB300 NVL72 posts 1674 tok/s/GPU at $0.40 per million tokens; MI325X posts 747 tok/s/GPU at $0.48. GB300 NVL72 delivers 124% more tok/s/GPU and is about 15% cheaper per token (equivalently, MI325X costs 18% more).

At 75 tok/s/user on MiniMax M3 428B, GB300 NVL72 reaches 466 tok/s/GPU and MI325X reaches 302, at $1.55 and $1.19 per million tokens respectively. GB300 NVL72 delivers 54% more tok/s/GPU, but MI325X is about 24% cheaper per token (equivalently, GB300 NVL72 costs 31% more).

At 109 tok/s/user on MiniMax M3 428B, GB300 NVL72 and MI325X reach 185 and 169 tok/s/GPU, at $3.79 and $2.08 per million tokens respectively. GB300 NVL72 delivers 10% more tok/s/GPU, but MI325X is about 45% cheaper per token (equivalently, GB300 NVL72 costs 82% more). (Numbers reflect the default 1k/1k · fp8 selection for this URL; the table below changes with the sequence, precision, or model selected in the page controls.)
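A percentage gap depends on which platform is taken as the baseline, so "X% cheaper" and "costs X% more" give different numbers for the same pair of prices. The following is a minimal sketch of that arithmetic using the throughput and cost figures from the comparison table. The names `POINTS`, `pct_more`, `pct_less`, and `summarize` are illustrative and not part of the benchmark site.

```python
# Throughput (tok/s/GPU) and cost ($/M tok) copied from the comparison table,
# keyed by target interactivity (tok/s/user).
POINTS = {
    41: {"GB300 NVL72": (1674.2, 0.404), "MI325X": (747.0, 0.476)},
    75: {"GB300 NVL72": (465.7, 1.552), "MI325X": (301.9, 1.186)},
    109: {"GB300 NVL72": (185.3, 3.791), "MI325X": (169.2, 2.079)},
}


def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, with b as the baseline."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Percent by which a falls below b, with b as the baseline."""
    return (1.0 - a / b) * 100.0


def summarize(row: dict) -> dict:
    """Compare the two platforms at one operating point."""
    (gb_tput, gb_cost), (mi_tput, mi_cost) = row["GB300 NVL72"], row["MI325X"]
    # Sort by cost so the cheaper platform comes first.
    cheap, dear = sorted(
        [("GB300 NVL72", gb_cost), ("MI325X", mi_cost)], key=lambda p: p[1]
    )
    return {
        "gb300_more_throughput_pct": pct_more(gb_tput, mi_tput),
        "cheaper_platform": cheap[0],
        "cheaper_by_pct": pct_less(cheap[1], dear[1]),
        "dearer_costs_more_pct": pct_more(dear[1], cheap[1]),
    }


for interactivity, row in POINTS.items():
    s = summarize(row)
    print(
        f"{interactivity} tok/s/user: "
        f"GB300 NVL72 +{s['gb300_more_throughput_pct']:.0f}% tok/s/GPU; "
        f"{s['cheaper_platform']} is {s['cheaper_by_pct']:.0f}% cheaper "
        f"(the other costs {s['dearer_costs_more_pct']:.0f}% more)"
    )
```

The two cost percentages diverge as the price gap widens: at 109 tok/s/user the more expensive platform costs 82% more, while the cheaper one is about 45% cheaper.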

Values in the table below are interpolated from real benchmark data at each target interactivity.
Target interactivity (tok/s/user): 41, 75, 109

Metric                   Platform       41          75          109
Throughput (tok/s/GPU)   GB300 NVL72    1674.2      465.7       185.3
                         MI325X         747.0       301.9       169.2
Cost ($/M tok)           GB300 NVL72    $0.404      $1.552      $3.791
                         MI325X         $0.476      $1.186      $2.079
tok/s/MW                 GB300 NVL72    797242      221750      88227
                         MI325X         342676      138498      77612
Concurrency              GB300 NVL72    ~243        ~22         ~5
                         MI325X         ~80         ~18         ~6
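
The page does not say how tok/s/MW is derived. One plausible reading, offered here only as an assumption, is that it equals per-GPU throughput multiplied by a fixed number of GPUs per megawatt. The sketch below checks that reading against the table: if it holds, the ratio of tok/s/MW to tok/s/GPU should be nearly constant for each platform. The names `TABLE`, `gpus_per_mw`, and `implied` are illustrative.

```python
# (tok/s/GPU, tok/s/MW) pairs copied from the table, in column order
# (41, 75, 109 tok/s/user).
TABLE = {
    "GB300 NVL72": [(1674.2, 797242), (465.7, 221750), (185.3, 88227)],
    "MI325X": [(747.0, 342676), (301.9, 138498), (169.2, 77612)],
}


def gpus_per_mw(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Implied GPUs per megawatt, ASSUMING tok/s/MW = tok/s/GPU * GPUs/MW."""
    return tok_s_mw / tok_s_gpu


def implied(platform: str) -> tuple:
    """Return (mean GPUs/MW, spread across columns, implied watts per GPU)."""
    ratios = [gpus_per_mw(g, m) for g, m in TABLE[platform]]
    mean = sum(ratios) / len(ratios)
    spread = max(ratios) - min(ratios)
    watts_per_gpu = 1_000_000 / mean  # 1 MW shared across the implied GPU count
    return mean, spread, watts_per_gpu


for name in TABLE:
    mean, spread, watts = implied(name)
    print(f"{name}: ~{mean:.1f} GPUs/MW (spread {spread:.2f}), ~{watts:.0f} W per GPU")
```

Under this assumption the ratio is stable across all three columns, at roughly 476 GPUs per MW for GB300 NVL72 and 459 for MI325X, which corresponds to a power budget of about 2.1 kW and 2.2 kW per GPU. What that budget includes (for example host, networking, or cooling overhead) is not stated on the page.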

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.