MiniMax M3 428B — H200 vs MI325X

Head-to-head AI inference benchmark comparison of the H200 (NVIDIA Hopper) and the MI325X (AMD CDNA 3) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads.

Near the low end of the 8–182 tok/s/user interactivity band, at 51 tok/s/user on MiniMax M3 428B, the H200 delivers 1034 tok/s/GPU at $0.38 per million tokens, while the MI325X delivers 572 tok/s/GPU at $0.62. The H200 is 38% cheaper per token (equivalently, the MI325X costs 62% more) and delivers 81% more tok/s/GPU.

With 95 tok/s/user as the target on MiniMax M3 428B, the H200 delivers 575 tok/s/GPU at $0.69 per million tokens, while the MI325X delivers 209 tok/s/GPU at $1.70. The H200 is 60% cheaper per token (the MI325X costs 147% more) and delivers 176% more tok/s/GPU.

At 138 tok/s/user interactivity on MiniMax M3 428B, the H200 delivers 360 tok/s/GPU at $1.06 per million tokens, while the MI325X delivers 116 tok/s/GPU at $3.06. The H200 is 65% cheaper per token (the MI325X costs 188% more) and delivers 209% more tok/s/GPU at this point. (Numbers reflect the default 1k/1k sequence and fp8 precision selection.)
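The percentage comparisons above depend on which GPU is taken as the baseline: "X% cheaper" divides by the more expensive figure, while "X% more" divides by the smaller one. A minimal Python sketch of that arithmetic, using the unrounded values from the comparison table, follows. The helper names `pct_more` and `pct_less` are illustrative and not part of the benchmark site.

```python
def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b (baseline is b)."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Percent by which a falls below b (baseline is b)."""
    return (1.0 - a / b) * 100.0


# Keyed by target interactivity (tok/s/user); values copied from the table.
# tps = throughput in tok/s/GPU, cost = $ per million tokens.
points = {
    51: {"h200_tps": 1033.8, "mi325x_tps": 572.3, "h200_cost": 0.384, "mi325x_cost": 0.623},
    95: {"h200_tps": 574.9, "mi325x_tps": 208.6, "h200_cost": 0.687, "mi325x_cost": 1.697},
    138: {"h200_tps": 359.6, "mi325x_tps": 116.2, "h200_cost": 1.062, "mi325x_cost": 3.056},
}

for target, p in points.items():
    cheaper = pct_less(p["h200_cost"], p["mi325x_cost"])
    pricier = pct_more(p["mi325x_cost"], p["h200_cost"])
    faster = pct_more(p["h200_tps"], p["mi325x_tps"])
    print(
        f"{target} tok/s/user: H200 is {cheaper:.0f}% cheaper per token "
        f"(MI325X costs {pricier:.0f}% more); H200 delivers {faster:.0f}% more tok/s/GPU"
    )
```

The two cost percentages describe the same gap from opposite baselines, which is why a figure above 100% can only be a "costs more" statement and never a "cheaper" one.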

Values in the table below are interpolated from real benchmark data at the target interactivity points of 51, 95, and 138 tok/s/user.
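The page does not say which interpolation method it uses. As a rough illustration only, the sketch below assumes piecewise-linear interpolation between sample points and, for lack of the raw measurements, treats the three H200 throughput figures quoted above as the samples. The function name `interpolate` is hypothetical.

```python
from bisect import bisect_right


def interpolate(xs: list[float], ys: list[float], x: float) -> float:
    """Piecewise-linear interpolation of y at x; xs must be sorted ascending."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("target lies outside the sampled range")
    # Index of the right-hand sample of the segment containing x,
    # clamped so that x == xs[-1] uses the last segment.
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Assumption: the quoted H200 figures stand in for measured samples.
interactivity = [51.0, 95.0, 138.0]      # tok/s/user
h200_throughput = [1033.8, 574.9, 359.6]  # tok/s/GPU

# Estimate throughput at a target halfway between the first two samples.
estimate = interpolate(interactivity, h200_throughput, 73.0)
print(f"H200 at 73 tok/s/user: about {estimate:.1f} tok/s/GPU")
```

Linear interpolation is the simplest choice. The throughput-versus-interactivity curve is visibly convex across these three points, so a straight line between distant samples will tend to overestimate throughput.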

Metric	GPU	51 tok/s/user	95 tok/s/user	138 tok/s/user
Throughput (tok/s/GPU)	H200	1033.8	574.9	359.6
Throughput (tok/s/GPU)	MI325X	572.3	208.6	116.2
Cost ($/M tok)	H200	$0.384	$0.687	$1.062
Cost ($/M tok)	MI325X	$0.623	$1.697	$3.056
tok/s/MW	H200	754576	419656	262495
tok/s/MW	MI325X	338622	123426	68769
Concurrency	H200	~44	~13	~5
Concurrency	MI325X	~49	~9	~4
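The table's cost and tok/s/MW rows can be related back to throughput with simple unit conversions. The sketch below back-calculates an implied GPU-hour price and an implied power draw per GPU. It assumes that cost per million tokens equals an hourly price divided by millions of tokens produced per hour, and that tok/s/MW equals tok/s/GPU divided by megawatts per GPU. The page does not state its cost or power model, so both formulas and both function names are assumptions.

```python
def implied_gpu_hour_price(cost_per_m_tok: float, tok_per_s_per_gpu: float) -> float:
    """Implied $/GPU-hour, assuming cost = hourly price / (M tokens per hour)."""
    m_tokens_per_hour = tok_per_s_per_gpu * 3600 / 1e6
    return cost_per_m_tok * m_tokens_per_hour


def implied_kw_per_gpu(tok_per_s_per_gpu: float, tok_per_s_per_mw: float) -> float:
    """Implied kW per GPU, assuming tok/s/MW = tok/s/GPU / (MW per GPU)."""
    return tok_per_s_per_gpu / tok_per_s_per_mw * 1000.0


# Values from the 51 tok/s/user column of the table.
h200_price = implied_gpu_hour_price(0.384, 1033.8)
mi325x_price = implied_gpu_hour_price(0.623, 572.3)
h200_kw = implied_kw_per_gpu(1033.8, 754576)
mi325x_kw = implied_kw_per_gpu(572.3, 338622)

print(f"H200:   about ${h200_price:.2f}/GPU-hour, {h200_kw:.2f} kW/GPU")
print(f"MI325X: about ${mi325x_price:.2f}/GPU-hour, {mi325x_kw:.2f} kW/GPU")
```

These are derived estimates, not published figures. Applying the same functions to the other two columns gives similar but not identical results, since every input is rounded and interpolated.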

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.
