MiniMax M3 428B — H200 vs MI325X
Head-to-head AI inference benchmark comparison of H200 (NVIDIA Hopper) and MI325X (AMD CDNA 3) on MiniMax M3 428B. Latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics — same interactions as the main inference chart.
Near the low end of the 8–182 tok/s/user interactivity band, at 51 tok/s/user on MiniMax M3 428B: H200 runs 1034 tok/s/GPU at $0.38/M tokens, MI325X runs 572 tok/s/GPU at $0.62/M tokens. H200 is 38% cheaper per token (equivalently, MI325X costs 62% more) and delivers 81% more tok/s/GPU.
At a target of 95 tok/s/user on MiniMax M3 428B, H200 produces 575 tok/s/GPU ($0.69 per million tokens) and MI325X produces 209 tok/s/GPU ($1.70 per million tokens). H200 is about 60% cheaper per token (equivalently, MI325X costs 147% more) and delivers 176% more tok/s/GPU.
At 138 tok/s/user interactivity on MiniMax M3 428B, H200 delivers 360 tok/s/GPU at $1.06 per million tokens; MI325X delivers 116 tok/s/GPU at $3.06 per million tokens. H200 is 65% cheaper per token (equivalently, MI325X costs 188% more) and delivers 209% more tok/s/GPU at this point. (Numbers reflect the default 1k/1k · fp8 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
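The cost gap can be stated two ways: how much less H200 costs relative to MI325X, or how much more MI325X costs relative to H200. The two percentages differ, and only the first can properly be called "cheaper". The following is a minimal sketch of that arithmetic using the per-GPU figures from the table; the function names and the `POINTS` layout are illustrative choices, not part of the benchmark tooling.

```python
# Per-GPU figures at each interactivity point (tok/s/user), copied from the table:
# (throughput in tok/s/GPU, cost in $ per million tokens).
POINTS = {
    51: {"H200": (1033.8, 0.384), "MI325X": (572.3, 0.623)},
    95: {"H200": (574.9, 0.687), "MI325X": (208.6, 1.697)},
    138: {"H200": (359.6, 1.062), "MI325X": (116.2, 3.056)},
}


def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, relative to b."""
    return (a / b - 1.0) * 100.0


def pct_cheaper(a: float, b: float) -> float:
    """Percent by which a is lower than b, relative to b (always below 100)."""
    return (1.0 - a / b) * 100.0


def compare(point: int) -> dict:
    """Summarize H200 vs MI325X at one interactivity point."""
    h_tput, h_cost = POINTS[point]["H200"]
    m_tput, m_cost = POINTS[point]["MI325X"]
    return {
        "h200_more_throughput_pct": pct_more(h_tput, m_tput),
        "h200_cheaper_pct": pct_cheaper(h_cost, m_cost),
        "mi325x_cost_premium_pct": pct_more(m_cost, h_cost),
    }


if __name__ == "__main__":
    for point in POINTS:
        summary = compare(point)
        print(point, {k: round(v) for k, v in summary.items()})
```

A "cheaper" figure above 100% would imply a negative price, so any such number is best read as the other GPU's cost premium.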
| Metric | Interactivity: 51 tok/s/user | Interactivity: 95 tok/s/user | Interactivity: 138 tok/s/user |
|---|---|---|---|
| Throughput (tok/s/GPU) | H200: 1033.8; MI325X: 572.3 | H200: 574.9; MI325X: 208.6 | H200: 359.6; MI325X: 116.2 |
| Cost ($/M tok) | H200: $0.384; MI325X: $0.623 | H200: $0.687; MI325X: $1.697 | H200: $1.062; MI325X: $3.056 |
| tok/s/MW | H200: 597555; MI325X: 262510 | H200: 332328; MI325X: 95683 | H200: 207872; MI325X: 53312 |
| Concurrency | H200: ~44; MI325X: ~49 | H200: ~13; MI325X: ~9 | H200: ~5; MI325X: ~4 |
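The tok/s/MW row can be cross-checked against the throughput row. Assuming tok/s/MW is simply tok/s/GPU divided by a fixed per-GPU power figure in megawatts (the page does not state how the metric is built or what that power figure covers), the implied power per GPU should be the same at every interactivity point. The sketch below tests that assumption with the table values; `implied_kw_per_gpu` and `ROWS` are hypothetical names used only for illustration.

```python
# (tok/s/GPU, tok/s/MW) pairs from the table, in order of the three
# interactivity points (51, 95, 138 tok/s/user).
ROWS = {
    "H200": [(1033.8, 597555), (574.9, 332328), (359.6, 207872)],
    "MI325X": [(572.3, 262510), (208.6, 95683), (116.2, 53312)],
}


def implied_kw_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Implied power per GPU in kW, ASSUMING tok/s/MW = tok/s/GPU / (MW per GPU)."""
    mw_per_gpu = tok_s_gpu / tok_s_mw
    return mw_per_gpu * 1000.0


if __name__ == "__main__":
    for gpu, pairs in ROWS.items():
        values = [implied_kw_per_gpu(t, m) for t, m in pairs]
        print(gpu, [f"{v:.3f} kW" for v in values])
```

Under that assumption the table implies roughly 1.73 kW per H200 and 2.18 kW per MI325X at all three points, which suggests the tok/s/MW advantage for H200 comes from both higher throughput and a lower per-GPU power figure.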
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.