MiniMax M3 428B · GPU comparison

MiniMax M3 428B — H100 vs MI355X

Head-to-head AI inference benchmark comparison of H100 (NVIDIA Hopper) and MI355X (AMD CDNA 4) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads.

At 23 tok/s/user on MiniMax M3 428B, H100 reaches 1027 tok/s/GPU and MI355X reaches 2075. Costs land at $0.35 and $0.20 per million tokens respectively. MI355X is 44% cheaper per token (equivalently, H100 costs 79% more) and delivers 102% more tok/s/GPU.

At 31 tok/s/user on MiniMax M3 428B, H100 and MI355X reach 1027 and 1769 tok/s/GPU, at $0.35 and $0.23 per million tokens. MI355X is 34% cheaper per token (equivalently, H100 costs 51% more) and delivers 72% more tok/s/GPU.

Toward the upper edge of the 15–46 tok/s/user interactivity band, at 38 tok/s/user on MiniMax M3 428B, H100 runs 994 tok/s/GPU at $0.36/M tokens and MI355X runs 1456 at $0.29/M. MI355X is 19% cheaper per token (equivalently, H100 costs 24% more) and delivers 47% more tok/s/GPU. These numbers reflect the default 1k/1k sequence and fp8 precision selection; other sequences, precisions, or models give different results.
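A cost gap can be stated two ways depending on which GPU is the baseline, and the two percentages differ. The sketch below recomputes both forms, plus the throughput gain, from the three-decimal figures in the table on this page. The function names are illustrative and are not part of any benchmark tooling.

```python
def pct_cheaper(baseline_cost: float, other_cost: float) -> float:
    """Percent by which other_cost is lower than baseline_cost."""
    return (baseline_cost - other_cost) / baseline_cost * 100.0


def pct_more(baseline: float, other: float) -> float:
    """Percent by which other exceeds baseline."""
    return (other - baseline) / baseline * 100.0


# (interactivity tok/s/user, H100 tok/s/GPU, MI355X tok/s/GPU,
#  H100 $/M tok, MI355X $/M tok), copied from the table on this page.
ROWS = [
    (23, 1027.4, 2074.5, 0.351, 0.196),
    (31, 1027.4, 1769.3, 0.351, 0.232),
    (38, 993.9, 1456.2, 0.363, 0.293),
]

for interactivity, h100_tput, mi_tput, h100_cost, mi_cost in ROWS:
    print(
        f"{interactivity} tok/s/user: "
        f"MI355X {pct_cheaper(h100_cost, mi_cost):.0f}% cheaper, "
        f"H100 {pct_more(mi_cost, h100_cost):.0f}% more expensive, "
        f"MI355X {pct_more(h100_tput, mi_tput):.0f}% more tok/s/GPU"
    )
```

At 23 tok/s/user this gives 44% cheaper, 79% more expensive, and 102% more throughput. A figure such as 79% therefore describes how much more H100 costs, not how much less MI355X costs.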


Values are interpolated from real benchmark data at each target interactivity level.
| Metric | GPU | 23 tok/s/user | 31 tok/s/user | 38 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | H100 | 1027.4 | 1027.4 | 993.9 |
| Throughput (tok/s/GPU) | MI355X | 2074.5 | 1769.3 | 1456.2 |
| Cost ($/M tok) | H100 | $0.351 | $0.351 | $0.363 |
| Cost ($/M tok) | MI355X | $0.196 | $0.232 | $0.293 |
| tok/s/MW | H100 | 593881 | 593881 | 574513 |
| tok/s/MW | MI355X | 782835 | 667673 | 549495 |
| Concurrency | H100 | ~128 | ~128 | ~117 |
| Concurrency | MI355X | ~196 | ~127 | ~92 |
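The page says these values are interpolated from benchmark data but does not say how. As a rough illustration only, the sketch below assumes piecewise-linear interpolation with clamping outside the sampled range, and it reuses the three MI355X throughput figures above as stand-ins for measured samples. The real benchmark points and the real interpolation method are not shown on this page, so both are assumptions.

```python
from bisect import bisect_left


def interpolate(points: list[tuple[float, float]], x: float) -> float:
    """Piecewise-linear interpolation over (x, y) points sorted by x.

    Values outside the sampled range are clamped to the nearest endpoint
    (an assumption; the page does not describe its extrapolation policy).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # first index with xs[i] >= x
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Stand-in samples: (interactivity tok/s/user, MI355X tok/s/GPU).
MI355X_THROUGHPUT = [(23, 2074.5), (31, 1769.3), (38, 1456.2)]

print(f"{interpolate(MI355X_THROUGHPUT, 27):.1f} tok/s/GPU at 27 tok/s/user")
```

Under these assumptions, a target of 27 tok/s/user falls halfway between the 23 and 31 samples and yields about 1921.9 tok/s/GPU.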

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.