MiniMax M3 428B — MI300X vs MI355X
Head-to-head AI inference benchmark comparison of MI300X (AMD CDNA 3) and MI355X (AMD CDNA 4) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads.
At 21 tok/s/user interactivity on MiniMax M3 428B, MI300X delivers 900 tok/s/GPU at $0.34 per million tokens; MI355X delivers 2159 tok/s/GPU at $0.19. MI355X is about 44% cheaper per token (equivalently, MI300X costs 80% more) and delivers 140% more tok/s/GPU at this point.
MI300X posts 763 tok/s/GPU for $0.41 per million tokens at 29 tok/s/user on MiniMax M3 428B; MI355X posts 1859 tok/s/GPU for $0.22. MI355X is about 47% cheaper per token (MI300X costs 88% more) and delivers 144% more tok/s/GPU.
Throughput at 38 tok/s/user on MiniMax M3 428B: MI300X hits 556 tok/s/GPU, MI355X hits 1456. Per-million-token costs land at $0.56 and $0.29 respectively. MI355X is about 48% cheaper per token (MI300X costs 91% more) and delivers 162% more tok/s/GPU. (Numbers reflect the default 1k/1k sequence, fp8 precision selection for this URL.)
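The percentages above depend on which GPU is taken as the baseline, so it can help to recompute them directly. The following is a minimal sketch, assuming the unrounded throughput and cost values from the table on this page; the function names `pct_more` and `pct_cheaper` are illustrative and not part of any benchmark tooling.

```python
def pct_more(value: float, baseline: float) -> float:
    """Percent by which `value` exceeds `baseline`."""
    return (value / baseline - 1.0) * 100.0


def pct_cheaper(cost: float, baseline_cost: float) -> float:
    """Percent by which `cost` is below `baseline_cost`."""
    return (1.0 - cost / baseline_cost) * 100.0


# interactivity (tok/s/user) -> (MI300X, MI355X) pairs taken from the table
throughput = {21: (900.1, 2158.8), 29: (762.9, 1859.2), 38: (555.5, 1456.2)}
cost = {21: (0.342, 0.190), 29: (0.411, 0.219), 38: (0.560, 0.293)}

for level in throughput:
    t300, t355 = throughput[level]
    c300, c355 = cost[level]
    print(
        f"{level} tok/s/user: "
        f"MI355X throughput +{pct_more(t355, t300):.0f}%, "
        f"MI355X cost -{pct_cheaper(c355, c300):.0f}%, "
        f"MI300X cost +{pct_more(c300, c355):.0f}%"
    )
```

Note that "X% cheaper" and "Y% more expensive" describe the same cost gap from opposite baselines, which is why the two figures differ.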
| Metric | GPU | 21 tok/s/user | 29 tok/s/user | 38 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | MI300X | 900.1 | 762.9 | 555.5 |
| Throughput (tok/s/GPU) | MI355X | 2158.8 | 1859.2 | 1456.2 |
| Cost ($/M tok) | MI300X | $0.342 | $0.411 | $0.560 |
| Cost ($/M tok) | MI355X | $0.190 | $0.219 | $0.293 |
| tok/s/MW | MI300X | 502871 | 426206 | 310337 |
| tok/s/MW | MI355X | 814640 | 701596 | 549495 |
| Concurrency | MI300X | ~196 | ~113 | ~65 |
| Concurrency | MI355X | ~226 | ~143 | ~92 |
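The tok/s/MW row can be related to the tok/s/GPU row: dividing one by the other gives the number of GPUs assumed per megawatt, and from that an implied power budget per GPU. The page does not state this power figure, so the sketch below is an inference from the table values only; `implied_kw_per_gpu` is a hypothetical helper name.

```python
def implied_kw_per_gpu(tok_s_per_gpu: float, tok_s_per_mw: float) -> float:
    """Power per GPU (kW) implied by the two throughput metrics."""
    gpus_per_mw = tok_s_per_mw / tok_s_per_gpu
    return 1000.0 / gpus_per_mw  # 1 MW = 1000 kW


# (tok/s/GPU, tok/s/MW) at 21, 29, and 38 tok/s/user, taken from the table
rows = {
    "MI300X": [(900.1, 502871), (762.9, 426206), (555.5, 310337)],
    "MI355X": [(2158.8, 814640), (1859.2, 701596), (1456.2, 549495)],
}

for gpu, pairs in rows.items():
    estimates = [implied_kw_per_gpu(t, mw) for t, mw in pairs]
    print(gpu, [f"{kw:.2f} kW" for kw in estimates])
```

Under this reading, the implied per-GPU power is the same at all three interactivity levels for each GPU, which suggests a fixed power figure per GPU was used to derive tok/s/MW. MI355X's tok/s/MW advantage is therefore smaller than its tok/s/GPU advantage because it is budgeted more power per GPU.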