
MiniMax M3 428B — MI300X vs MI355X

Head-to-head AI inference benchmark comparison of MI300X (AMD CDNA 3) and MI355X (AMD CDNA 4) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads.

At 21 tok/s/user interactivity on MiniMax M3 428B, MI300X delivers 900 tok/s/GPU at $0.34 per million tokens; MI355X delivers 2159 tok/s/GPU at $0.19. MI355X is about 44% cheaper per token (equivalently, MI300X costs about 80% more) and delivers 140% more tok/s/GPU at this point.

At 29 tok/s/user on MiniMax M3 428B, MI300X posts 763 tok/s/GPU at $0.41 per million tokens; MI355X posts 1859 tok/s/GPU at $0.22. MI355X is about 47% cheaper per token (MI300X costs about 88% more) and delivers 144% more tok/s/GPU.

At 38 tok/s/user on MiniMax M3 428B, MI300X hits 556 tok/s/GPU and MI355X hits 1456. Per-million-token costs land at $0.56 and $0.29 respectively. MI355X is about 48% cheaper per token (MI300X costs about 91% more) and delivers 162% more tok/s/GPU. (Numbers reflect the default 1k/1k · fp8 selection for this URL; the table and chart update if the sequence, precision, or model is changed in the controls.)
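The percentage comparisons above depend on which GPU is taken as the baseline. The following is a minimal sketch of that arithmetic, using the three-decimal values from the comparison table; the function names `pct_more`, `pct_cheaper`, and `summarize` are illustrative and not part of any benchmark tooling. It shows that a cost of $0.190 is about 44% below $0.342, while $0.342 is about 80% above $0.190.

```python
def pct_more(baseline, new):
    """Percent by which `new` exceeds `baseline`."""
    return (new - baseline) / baseline * 100.0


def pct_cheaper(baseline_cost, new_cost):
    """Percent by which `new_cost` is below `baseline_cost`."""
    return (baseline_cost - new_cost) / baseline_cost * 100.0


# (interactivity tok/s/user,
#  MI300X tok/s/GPU, MI355X tok/s/GPU,
#  MI300X $/M tok,   MI355X $/M tok)
POINTS = [
    (21, 900.1, 2158.8, 0.342, 0.190),
    (29, 762.9, 1859.2, 0.411, 0.219),
    (38, 555.5, 1456.2, 0.560, 0.293),
]


def summarize(points=POINTS):
    """Return (interactivity, % more throughput on MI355X,
    % cheaper on MI355X, % more expensive on MI300X) per operating point."""
    rows = []
    for inter, t300, t355, c300, c355 in points:
        rows.append((
            inter,
            round(pct_more(t300, t355)),      # throughput gain, MI300X as baseline
            round(pct_cheaper(c300, c355)),   # cost saving, MI300X as baseline
            round(pct_more(c355, c300)),      # cost premium, MI355X as baseline
        ))
    return rows


if __name__ == "__main__":
    for inter, gain, saving, premium in summarize():
        print(f"{inter} tok/s/user: +{gain}% tok/s/GPU, "
              f"{saving}% cheaper, MI300X costs {premium}% more")
```

Keeping the baseline explicit in each function makes it harder to confuse "X% cheaper" with "costs X% more", which differ noticeably once the gap between the two prices is large.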


Interpolated from real benchmark data at the target interactivity values shown.

Metric                   GPU      21 tok/s/user   29 tok/s/user   38 tok/s/user
Throughput (tok/s/GPU)   MI300X   900.1           762.9           555.5
Throughput (tok/s/GPU)   MI355X   2158.8          1859.2          1456.2
Cost ($/M tok)           MI300X   $0.342          $0.411          $0.560
Cost ($/M tok)           MI355X   $0.190          $0.219          $0.293
tok/s/MW                 MI300X   502871          426206          310337
tok/s/MW                 MI355X   814640          701596          549495
Concurrency              MI300X   ~196            ~113            ~65
Concurrency              MI355X   ~226            ~143            ~92
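The values above are described as interpolated from real benchmark data, but the page does not say which interpolation method is used. As a rough sketch only, assuming simple piecewise-linear interpolation, the snippet below estimates throughput at an interactivity target that lies between known points. The helper `interpolate` and the list `MI300X_THROUGHPUT` are hypothetical names, and the sample points are the table's own MI300X throughput values rather than raw measurements.

```python
from bisect import bisect_left


def interpolate(points, x):
    """Piecewise-linear interpolation over (x, y) pairs sorted by x.

    Raises ValueError when x falls outside the covered range, since
    extrapolating benchmark data would be misleading.
    """
    xs = [p[0] for p in points]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"{x} is outside the range [{xs[0]}, {xs[-1]}]")
    i = bisect_left(xs, x)          # index of the first known x >= target
    if xs[i] == x:
        return points[i][1]         # exact hit, no interpolation needed
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


# (interactivity tok/s/user, throughput tok/s/GPU) for MI300X, taken from the table.
MI300X_THROUGHPUT = [(21, 900.1), (29, 762.9), (38, 555.5)]

if __name__ == "__main__":
    # Estimate throughput at a target between the 21 and 29 tok/s/user points.
    print(interpolate(MI300X_THROUGHPUT, 25))
```

An estimate produced this way is only as good as the spacing of the underlying points; the real tool may use a different curve or denser measurements.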

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.