MiniMax M2.5/M2.7 — B200 vs MI300X
Head-to-head AI inference benchmark comparison of B200 (NVIDIA Blackwell) and MI300X (AMD CDNA 3) on MiniMax M2.5/M2.7, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; they behave the same way as on the main inference chart.
At 44 tok/s/user on MiniMax M2.5/M2.7, B200 reaches 5409 tok/s/GPU and MI300X reaches 1249. Costs land at $0.10 and $0.25 per million tokens respectively. MI300X costs 150% more per token (equivalently, B200 is 60% cheaper), and B200 delivers 333% more tok/s/GPU.
At 61 tok/s/user on MiniMax M2.5/M2.7, B200 and MI300X reach 3136 and 779 tok/s/GPU, at $0.17 and $0.40 per million tokens respectively. MI300X costs 132% more per token (equivalently, B200 is 57% cheaper), and B200 delivers 303% more tok/s/GPU.
Toward the upper edge of the 27–95 tok/s/user interactivity band, at 78 tok/s/user on MiniMax M2.5/M2.7, B200 runs 1729 tok/s/GPU at $0.32 per million tokens and MI300X runs 412 tok/s/GPU at $0.75 per million tokens. MI300X costs 136% more per token (equivalently, B200 is 58% cheaper), and B200 delivers 320% more tok/s/GPU. (Numbers reflect the default 1k/1k · fp8 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
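The percentage comparisons above can be rechecked from the tabulated figures. The following is a minimal sketch, not the page's own calculation: the helper names `pct_more` and `pct_less` are hypothetical, and the inputs are the throughput and cost values shown on this page for the default 1k/1k · fp8 selection. It separates "costs X% more" from "is Y% cheaper", which are different quantities for the same pair of prices.

```python
def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, relative to b."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Percent by which a falls below b, relative to b."""
    return (1.0 - a / b) * 100.0


# (tok/s/user, B200 tok/s/GPU, MI300X tok/s/GPU, B200 $/M tok, MI300X $/M tok)
# Values copied from the table on this page.
ROWS = [
    (44, 5409.4, 1248.6, 0.100, 0.250),
    (61, 3135.7, 778.8, 0.173, 0.402),
    (78, 1729.2, 411.9, 0.317, 0.747),
]


def summarize(rows=ROWS):
    """Return rounded percentage comparisons for each interactivity level."""
    out = []
    for inter, tput_b200, tput_mi, cost_b200, cost_mi in rows:
        out.append({
            "tok_s_user": inter,
            "b200_more_throughput_pct": round(pct_more(tput_b200, tput_mi)),
            "mi300x_costs_more_pct": round(pct_more(cost_mi, cost_b200)),
            "b200_cheaper_pct": round(pct_less(cost_b200, cost_mi)),
        })
    return out


if __name__ == "__main__":
    for row in summarize():
        print(row)
```

A price cannot be more than 100% cheaper than another, so figures such as 150% describe how much more MI300X costs, not how much less B200 costs.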
| Metric | GPU | 44 tok/s/user | 61 tok/s/user | 78 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B200 | 5409.4 | 3135.7 | 1729.2 |
| Throughput (tok/s/GPU) | MI300X | 1248.6 | 778.8 | 411.9 |
| Cost ($/M tok) | B200 | $0.100 | $0.173 | $0.317 |
| Cost ($/M tok) | MI300X | $0.250 | $0.402 | $0.747 |
| tok/s/MW | B200 | 2492822 | 1445026 | 796848 |
| tok/s/MW | MI300X | 697553 | 435091 | 230105 |
| Concurrency | B200 | ~254 | ~105 | ~45 |
| Concurrency | MI300X | ~63 | ~26 | ~11 |
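The page does not state how the cost and tok/s/MW rows are derived. As a rough sketch, assuming cost is a fixed hourly GPU price divided by tokens generated per hour, and assuming tok/s/MW is per-GPU throughput divided by a fixed per-GPU power figure, the implied inputs can be back-computed from the table. The function names below are hypothetical, and the results are inferences under those assumptions, not published prices or power ratings.

```python
def implied_hourly_cost(tok_s_gpu: float, usd_per_m_tok: float) -> float:
    """Implied $/GPU-hour, ASSUMING cost = hourly price / tokens per hour."""
    tokens_per_hour = tok_s_gpu * 3600.0
    return usd_per_m_tok * tokens_per_hour / 1e6


def implied_watts_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Implied watts per GPU, ASSUMING tok/s/MW = tok/s/GPU / MW per GPU."""
    return tok_s_gpu / tok_s_mw * 1e6


# (tok/s/GPU, $/M tok, tok/s/MW) per interactivity level, from the table.
TABLE = {
    "B200": [(5409.4, 0.100, 2492822), (3135.7, 0.173, 1445026), (1729.2, 0.317, 796848)],
    "MI300X": [(1248.6, 0.250, 697553), (778.8, 0.402, 435091), (411.9, 0.747, 230105)],
}

if __name__ == "__main__":
    for gpu, rows in TABLE.items():
        for tput, cost, per_mw in rows:
            hourly = implied_hourly_cost(tput, cost)
            watts = implied_watts_per_gpu(tput, per_mw)
            print(f"{gpu}: ~${hourly:.2f}/GPU-hour, ~{watts:.0f} W/GPU")
```

If the assumptions hold, the implied values should be roughly constant across the three interactivity levels for each GPU, which gives a quick consistency check on the table.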
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.