MiniMax M3 428B — MI325X vs MI355X

Head-to-head AI inference benchmark comparison of the MI325X (AMD CDNA 3) and MI355X (AMD CDNA 4) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads. The chart controls below switch sequences, precisions, and metrics, with the same interactions as the main inference chart.

At 52 tok/s/user on MiniMax M3 428B, MI325X posts 557 tok/s/GPU at $0.64 per million tokens; MI355X posts 2671 tok/s/GPU at $0.15. MI355X costs about 76% less per token (MI325X is about 4.2x the price) and delivers 380% more tok/s/GPU.

At 96 tok/s/user on MiniMax M3 428B, MI325X hits 206 tok/s/GPU and MI355X hits 1486. Per-million-token costs land at $1.72 and $0.28 respectively. MI355X costs about 84% less per token (MI325X is about 6.2x the price) and delivers 622% more tok/s/GPU.

At 139 tok/s/user on MiniMax M3 428B, MI325X and MI355X post 115 and 867 tok/s/GPU at $3.09 and $0.47 per million tokens respectively. MI355X costs about 85% less per token (MI325X is about 6.5x the price) and delivers 656% more tok/s/GPU. These numbers reflect the default 1k/1k, fp8 selection for this URL; the table and chart below update when the sequence, precision, or model is changed in the controls.
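The percentage comparisons above follow directly from the throughput and cost figures. A minimal sketch of that arithmetic, using the unrounded values from the table below, is shown here. The function names (`pct_more`, `pct_less`, `summarize`) are illustrative and not part of the benchmark site. Note that a lower price can be at most 100% cheaper, so the cost gap is also given as a price ratio.

```python
def pct_more(candidate: float, baseline: float) -> float:
    """Percent by which candidate exceeds baseline."""
    return (candidate / baseline - 1.0) * 100.0


def pct_less(candidate: float, baseline: float) -> float:
    """Percent by which candidate falls below baseline (never above 100 for positive values)."""
    return (1.0 - candidate / baseline) * 100.0


# (tok/s/user, MI325X tok/s/GPU, MI355X tok/s/GPU, MI325X $/M tok, MI355X $/M tok)
ROWS = [
    (52, 556.9, 2670.6, 0.640, 0.154),
    (96, 205.8, 1485.8, 1.724, 0.277),
    (139, 114.8, 867.4, 3.092, 0.474),
]


def summarize(rows=ROWS):
    """Return rounded MI355X-vs-MI325X comparisons for each operating point."""
    results = []
    for interactivity, tput_325, tput_355, cost_325, cost_355 in rows:
        results.append({
            "tok_s_user": interactivity,
            "more_throughput_pct": round(pct_more(tput_355, tput_325)),
            "cheaper_pct": round(pct_less(cost_355, cost_325)),
            "price_ratio": round(cost_325 / cost_355, 1),
        })
    return results


if __name__ == "__main__":
    for row in summarize():
        print(row)
```

The throughput percentages match the figures quoted in the text, while the cost gap is reported both as a percentage below the MI325X price and as a multiple.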

Values in the table below are interpolated from real benchmark data at each target interactivity. The target interactivity values are editable, so the GPUs can be compared at other operating points.
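The page does not state which interpolation method it uses. As a rough illustration only, the sketch below assumes piecewise-linear interpolation between known operating points. The `interpolate` helper is hypothetical, and the sample points are the three MI355X table values reused as input, since the underlying measured points are not listed here.

```python
from bisect import bisect_right


def interpolate(points, target):
    """Piecewise-linear interpolation over (x, y) pairs sorted by ascending x.

    Assumption: linear between neighbours; the site may use a different scheme.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    xs = [x for x, _ in points]
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target is outside the sampled range")
    # Index of the right-hand neighbour, clamped so the last point is usable.
    i = min(bisect_right(xs, target), len(points) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (target - x0) / (x1 - x0)


# Sample input: (tok/s/user, tok/s/GPU) for MI355X, taken from the table.
MI355X_POINTS = [(52, 2670.6), (96, 1485.8), (139, 867.4)]

if __name__ == "__main__":
    # Estimate throughput at an interactivity between two sampled points.
    print(round(interpolate(MI355X_POINTS, 74), 1))
```

Targets outside the sampled range raise an error rather than extrapolating, since nothing in the source says how the curve behaves beyond the sampled points.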

Metric	GPU	52 tok/s/user	96 tok/s/user	139 tok/s/user
Throughput (tok/s/GPU)	MI325X	556.9	205.8	114.8
Throughput (tok/s/GPU)	MI355X	2670.6	1485.8	867.4
Cost ($/M tok)	MI325X	$0.640	$1.724	$3.092
Cost ($/M tok)	MI355X	$0.154	$0.277	$0.474
tok/s/MW	MI325X	329519	121762	67900
tok/s/MW	MI355X	1277809	710932	415015
Concurrency	MI325X	~47	~9	~4
Concurrency	MI355X	~108	~34	~13
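The table's cost and power-efficiency rows can be related back to throughput with simple unit conversions. The sketch below assumes that cost per million tokens comes from a flat per-GPU hourly price, and that tok/s/MW is throughput scaled by the number of GPUs per megawatt. Neither assumption is stated on the page, and the helper names are hypothetical.

```python
def implied_gpu_hour_cost(tok_s_gpu: float, usd_per_mtok: float) -> float:
    """Hourly GPU price implied by throughput and $/M tok (assumes flat hourly pricing)."""
    mtok_per_hour = tok_s_gpu * 3600 / 1e6  # million tokens generated per GPU-hour
    return usd_per_mtok * mtok_per_hour


def implied_kw_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Power budget per GPU in kW implied by the tok/s/MW figure (assumed all-in power)."""
    gpus_per_mw = tok_s_mw / tok_s_gpu
    return 1000.0 / gpus_per_mw


# (tok/s/GPU, $/M tok, tok/s/MW) at 52 tok/s/user, copied from the table.
MI325X = (556.9, 0.640, 329519)
MI355X = (2670.6, 0.154, 1277809)

if __name__ == "__main__":
    for name, (tput, cost, per_mw) in (("MI325X", MI325X), ("MI355X", MI355X)):
        print(name,
              round(implied_gpu_hour_cost(tput, cost), 2),
              round(implied_kw_per_gpu(tput, per_mw), 2))
```

If the implied values stay roughly constant across all three interactivity columns, that supports the flat-rate assumption. They remain inferred figures, not numbers published on the page.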

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.

Chart controls: Model, Scenario, Precision, Y-Axis Metric, GPU Config. Quick filters: Vendor, Deployment, Spec Decoding.