MiniMax M2.5/M2.7 — GB200 NVL72 vs GB300 NVL72
Head-to-head AI inference benchmark comparison of GB200 NVL72 (NVIDIA Blackwell) and GB300 NVL72 (NVIDIA Blackwell Ultra) on MiniMax M2.5/M2.7, covering latency, throughput, and cost across LLM workloads.
At 68 tok/s/user interactivity on MiniMax M2.5/M2.7, GB200 NVL72 delivers 8433 tok/s/GPU at $0.073 per million tokens; GB300 NVL72 delivers 8539 tok/s/GPU at $0.082. Measured against the GB300 NVL72 price, GB200 NVL72 is about 11% cheaper per token, while GB300 NVL72 delivers 1% more tok/s/GPU.
At 105 tok/s/user, GB200 NVL72 posts 3247 tok/s/GPU at $0.190 per million tokens; GB300 NVL72 posts 3391 tok/s/GPU at $0.217. GB200 NVL72 is about 12% cheaper per token, while GB300 NVL72 delivers 4% more tok/s/GPU.
At 143 tok/s/user, GB200 NVL72 reaches 1066 tok/s/GPU and GB300 NVL72 reaches 1047, at $0.581 and $0.702 per million tokens respectively. Here GB200 NVL72 is about 17% cheaper per token and also delivers 2% more tok/s/GPU. All figures reflect the default selection for this page: 1k/1k sequence, fp4 precision.
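The percentage comparisons above are simple ratios, and the result depends on which system is taken as the baseline. The following is a minimal sketch, not the site's own calculation: it recomputes the differences from the three-decimal table values on this page. The names `POINTS`, `pct_saving`, `pct_premium`, and `compare` are hypothetical helpers introduced for illustration.

```python
# Values copied from this page's table (1k/1k, fp4).
# Each entry: interactivity (tok/s/user) -> system -> (tok/s/GPU, $ per million tokens)
POINTS = {
    68: {"GB200 NVL72": (8432.8, 0.073), "GB300 NVL72": (8538.7, 0.082)},
    105: {"GB200 NVL72": (3246.9, 0.190), "GB300 NVL72": (3391.2, 0.217)},
    143: {"GB200 NVL72": (1066.4, 0.581), "GB300 NVL72": (1047.4, 0.702)},
}


def pct_saving(cheaper: float, pricier: float) -> float:
    """Percent saved by the cheaper option, relative to the pricier one."""
    return (1 - cheaper / pricier) * 100


def pct_premium(pricier: float, cheaper: float) -> float:
    """Percent extra paid for the pricier option, relative to the cheaper one."""
    return (pricier / cheaper - 1) * 100


def compare(interactivity: int) -> dict:
    """Compare the two systems at one interactivity level."""
    tput_200, cost_200 = POINTS[interactivity]["GB200 NVL72"]
    tput_300, cost_300 = POINTS[interactivity]["GB300 NVL72"]
    return {
        "gb200_saving_pct": pct_saving(cost_200, cost_300),
        "gb300_premium_pct": pct_premium(cost_300, cost_200),
        # Positive means GB300 NVL72 is faster per GPU; negative means GB200 NVL72 is.
        "gb300_throughput_gain_pct": (tput_300 / tput_200 - 1) * 100,
    }


if __name__ == "__main__":
    for level in POINTS:
        result = compare(level)
        print(level, {name: round(value, 1) for name, value in result.items()})
```

The two cost figures differ because "X% cheaper" divides by the higher price while "X% more expensive" divides by the lower one, so a quoted percentage is only meaningful once the baseline is stated. Because the table values are themselves rounded, results computed this way can differ by about a point from figures derived from unrounded data.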
| Metric | System | 68 tok/s/user | 105 tok/s/user | 143 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | GB200 NVL72 | 8432.8 | 3246.9 | 1066.4 |
| Throughput (tok/s/GPU) | GB300 NVL72 | 8538.7 | 3391.2 | 1047.4 |
| Cost ($/M tok) | GB200 NVL72 | $0.073 | $0.190 | $0.581 |
| Cost ($/M tok) | GB300 NVL72 | $0.082 | $0.217 | $0.702 |
| tok/s/MW | GB200 NVL72 | 4015616 | 1546138 | 507786 |
| tok/s/MW | GB300 NVL72 | 4066071 | 1614855 | 498777 |
| Concurrency | GB200 NVL72 | ~928 | ~236 | ~45 |
| Concurrency | GB300 NVL72 | ~1024 | ~211 | ~34 |
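The page does not say how the tok/s/MW row is derived from per-GPU throughput. As a rough check, and assuming tok/s/MW is simply tok/s/GPU multiplied by the number of GPUs that fit in one megawatt, the sketch below backs out the power budget per GPU that the table implies. The function name `implied_kw_per_gpu` and the `ROWS` layout are hypothetical, and the result is an inference from the published numbers rather than a documented parameter.

```python
# (system, interactivity in tok/s/user, tok/s/GPU, tok/s/MW), copied from the table.
ROWS = [
    ("GB200 NVL72", 68, 8432.8, 4015616),
    ("GB300 NVL72", 68, 8538.7, 4066071),
    ("GB200 NVL72", 105, 3246.9, 1546138),
    ("GB300 NVL72", 105, 3391.2, 1614855),
    ("GB200 NVL72", 143, 1066.4, 507786),
    ("GB300 NVL72", 143, 1047.4, 498777),
]


def implied_kw_per_gpu(tok_s_per_gpu: float, tok_s_per_mw: float) -> float:
    """Back out kW per GPU, ASSUMING tok/s/MW = tok/s/GPU * (GPUs per MW)."""
    gpus_per_mw = tok_s_per_mw / tok_s_per_gpu
    return 1000.0 / gpus_per_mw  # 1 MW = 1000 kW


if __name__ == "__main__":
    for system, level, per_gpu, per_mw in ROWS:
        kw = implied_kw_per_gpu(per_gpu, per_mw)
        print(f"{system} @ {level} tok/s/user: ~{kw:.2f} kW per GPU")
```

Under that assumption, every row works out to roughly the same all-in figure per GPU for both systems, which suggests the tok/s/MW row tracks per-GPU throughput directly rather than adding independent information about energy efficiency.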
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.