
MiniMax M2.5/M2.7 — B200 vs MI300X

Head-to-head AI inference benchmark comparison of the B200 (NVIDIA Blackwell) and the MI300X (AMD CDNA 3) on MiniMax M2.5/M2.7, covering latency, throughput, and cost across LLM workloads.

At 44 tok/s/user on MiniMax M2.5/M2.7, B200 reaches 5409 tok/s/GPU and MI300X reaches 1249. Cost per million tokens is $0.10 and $0.25 respectively. B200 is 60% cheaper per token (equivalently, MI300X costs 150% more) and delivers 333% more tok/s/GPU.

At 61 tok/s/user, B200 and MI300X deliver 3136 and 779 tok/s/GPU at $0.173 and $0.402 per million tokens. B200 is 57% cheaper per token (MI300X costs 132% more) and delivers 303% more tok/s/GPU.

Toward the upper edge of the 27–95 tok/s/user interactivity band, at 78 tok/s/user, B200 runs 1729 tok/s/GPU at $0.317 per million tokens and MI300X runs 412 at $0.747. B200 is 58% cheaper per token (MI300X costs 136% more) and delivers 320% more tok/s/GPU. (All numbers reflect the default 1k/1k sequence, fp8 precision selection.)
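The relative figures quoted above can be reproduced from the raw numbers. The sketch below uses the 44 tok/s/user operating point and distinguishes "X% more" (measured against the smaller value) from "X% less" (measured against the larger value). A 2.5x cost ratio means MI300X costs 150% more per token, which is the same as B200 costing 60% less. The helper names `pct_more` and `pct_less` are illustrative only.

```python
# Figures for the 44 tok/s/user operating point, copied from the text above.
b200_cost, mi300x_cost = 0.10, 0.25   # $ per million tokens
b200_tput, mi300x_tput = 5409, 1249   # tok/s/GPU


def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1) * 100


def pct_less(a: float, b: float) -> float:
    """How much smaller a is than b, as a percentage of b."""
    return (1 - a / b) * 100


throughput_gain = pct_more(b200_tput, mi300x_tput)  # B200 throughput advantage
cost_premium = pct_more(mi300x_cost, b200_cost)     # extra cost of MI300X
cost_saving = pct_less(b200_cost, mi300x_cost)      # saving from B200

print(f"B200 delivers {throughput_gain:.0f}% more tok/s/GPU")
print(f"MI300X costs {cost_premium:.0f}% more per token")
print(f"B200 is {cost_saving:.0f}% cheaper per token")
```

A saving can never exceed 100%, so any comparison figure above 100% has to be read as a premium on the more expensive option.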


The values below are interpolated from real benchmark data at each target interactivity.
| Metric | GPU | 44 tok/s/user | 61 tok/s/user | 78 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B200 | 5409.4 | 3135.7 | 1729.2 |
| Throughput (tok/s/GPU) | MI300X | 1248.6 | 778.8 | 411.9 |
| Cost ($/M tok) | B200 | $0.100 | $0.173 | $0.317 |
| Cost ($/M tok) | MI300X | $0.250 | $0.402 | $0.747 |
| tok/s/MW | B200 | 2492822 | 1445026 | 796848 |
| tok/s/MW | MI300X | 697553 | 435091 | 230105 |
| Concurrency | B200 | ~254 | ~105 | ~45 |
| Concurrency | MI300X | ~63 | ~26 | ~11 |
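As a rough consistency check, the cost and tok/s/MW rows can be related back to the throughput row. The page does not state its formulas, so the sketch below assumes that cost per million tokens is an hourly GPU price divided by the tokens produced per GPU-hour, and that tok/s/MW is per-GPU throughput divided by the power attributed to each GPU. Under those assumptions the rows imply roughly $1.95 per GPU-hour and 2.17 kW per GPU for B200, and roughly $1.12 per GPU-hour and 1.79 kW per GPU for MI300X. These are back-derived values, not published inputs.

```python
# Rows copied from the comparison table:
# (interactivity tok/s/user, throughput tok/s/GPU, cost $/M tok, tok/s/MW)
ROWS = {
    "B200": [
        (44, 5409.4, 0.100, 2492822),
        (61, 3135.7, 0.173, 1445026),
        (78, 1729.2, 0.317, 796848),
    ],
    "MI300X": [
        (44, 1248.6, 0.250, 697553),
        (61, 778.8, 0.402, 435091),
        (78, 411.9, 0.747, 230105),
    ],
}


def implied_hourly_cost(tput: float, cost_per_mtok: float) -> float:
    """Assumption: $/M tok = hourly GPU price / millions of tokens per GPU-hour."""
    return cost_per_mtok * tput * 3600 / 1e6


def implied_kw_per_gpu(tput: float, tok_s_per_mw: float) -> float:
    """Assumption: tok/s/MW = tok/s/GPU / megawatts attributed to one GPU."""
    return tput / tok_s_per_mw * 1000


implied = {
    gpu: [
        (interactivity,
         implied_hourly_cost(tput, cost),
         implied_kw_per_gpu(tput, per_mw))
        for interactivity, tput, cost, per_mw in rows
    ]
    for gpu, rows in ROWS.items()
}

for gpu, results in implied.items():
    for interactivity, dollars_per_hour, kw in results:
        print(f"{gpu} @ {interactivity} tok/s/user: "
              f"${dollars_per_hour:.2f}/GPU-hr, {kw:.2f} kW/GPU")
```

The implied values stay nearly constant across the three operating points for each GPU, which suggests the cost and power rows are fixed per-GPU rates scaled by throughput. The small drift in the hourly figure is consistent with the three-decimal rounding of the cost column.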

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.