
MiniMax M2.5/M2.7 — GB200 NVL72 vs GB300 NVL72

Head-to-head AI inference benchmark comparison of GB200 NVL72 and GB300 NVL72 (both NVIDIA Blackwell) on MiniMax M2.5/M2.7, covering latency, throughput, and cost across LLM workloads.

At 68 tok/s/user of interactivity on MiniMax M2.5/M2.7, GB200 NVL72 delivers 8433 tok/s/GPU at $0.073 per million tokens, and GB300 NVL72 delivers 8539 tok/s/GPU at $0.082. At this point GB300 NVL72 delivers 1% more tok/s/GPU but costs 13% more per token.

At 105 tok/s/user on MiniMax M2.5/M2.7, GB200 NVL72 posts 3247 tok/s/GPU for $0.190 per million tokens, and GB300 NVL72 posts 3391 tok/s/GPU for $0.217. GB300 NVL72 delivers 4% more tok/s/GPU but costs 14% more per token.

At 143 tok/s/user on MiniMax M2.5/M2.7, GB200 NVL72 hits 1066 tok/s/GPU and GB300 NVL72 hits 1047, at $0.581 and $0.702 per million tokens respectively. Here GB200 NVL72 leads on both counts: it delivers 2% more tok/s/GPU, and GB300 NVL72 costs 21% more per token. (All figures are for the 1k/1k sequence configuration at FP4 precision.)
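The percentages in these comparisons can be checked against the rounded table values. A minimal sketch, assuming each gap is computed as (GB300 - GB200) / GB200; the page does not state its formula, and `pct_more` and `compare` are hypothetical helper names.

```python
# Reproduce the relative gaps quoted above from the rounded table values.
# Assumption: each gap is (GB300 - GB200) / GB200 * 100; the page does not
# state its formula. Helper names are hypothetical.

TABLE = {
    # interactivity (tok/s/user): ((GB200 tok/s/GPU, $/M tok), (GB300 tok/s/GPU, $/M tok))
    68: ((8432.8, 0.073), (8538.7, 0.082)),
    105: ((3246.9, 0.190), (3391.2, 0.217)),
    143: ((1066.4, 0.581), (1047.4, 0.702)),
}


def pct_more(value: float, base: float) -> float:
    """Percent by which `value` exceeds `base` (negative if it falls short)."""
    return (value - base) / base * 100.0


def compare(interactivity: int) -> tuple[float, float]:
    """Return (GB300 cost premium %, GB300 throughput gain %) at one operating point."""
    (tput_gb200, cost_gb200), (tput_gb300, cost_gb300) = TABLE[interactivity]
    return pct_more(cost_gb300, cost_gb200), pct_more(tput_gb300, tput_gb200)


for point in sorted(TABLE):
    cost_gap, tput_gap = compare(point)
    print(f"{point:>3} tok/s/user: GB300 cost {cost_gap:+.1f}%, throughput {tput_gap:+.1f}%")
```

From the rounded cost cells the 68 tok/s/user premium comes out at 12.3% rather than 13%; three-decimal rounding of $0.073 and $0.082 allows anything from roughly 11% to 14%, so the quoted figure is consistent. Expressed the other way round, as a discount relative to GB300 NVL72, the 143 tok/s/user gap is about 17%, because (0.702 - 0.581) / 0.702 ≈ 0.172.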


Table values are interpolated from measured benchmark data to the target interactivity levels of 68, 105, and 143 tok/s/user.
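The page says only that values are interpolated from measured data, without naming the method. A minimal sketch, assuming piecewise-linear interpolation of throughput against interactivity. The knots reuse this page's GB200 NVL72 throughput values at 68, 105, and 143 tok/s/user purely as stand-ins, since the underlying measured points are not published here; `interpolate` and `GB200_CURVE` are hypothetical names.

```python
from bisect import bisect_left

# Illustrative knots: GB200 NVL72 (interactivity tok/s/user, tok/s/GPU) values
# from this page, used as if they were measured points. The page's actual
# measured points and interpolation scheme are not given.
GB200_CURVE = [(68.0, 8432.8), (105.0, 3246.9), (143.0, 1066.4)]


def interpolate(curve: list[tuple[float, float]], x: float) -> float:
    """Piecewise-linear estimate of y at x; `curve` must be sorted by x."""
    xs = [px for px, _ in curve]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"{x} is outside the measured range {xs[0]}..{xs[-1]}")
    i = bisect_left(xs, x)  # index of the first knot with xs[i] >= x
    if xs[i] == x:  # exact hit on a knot
        return curve[i][1]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    t = (x - x0) / (x1 - x0)  # fractional position between neighbouring knots
    return y0 + t * (y1 - y0)


# Midway between the 68 and 105 tok/s/user knots.
print(interpolate(GB200_CURVE, 86.5))
```

Throughput falls steeply and non-linearly between these points, so a straight-line blend across a wide gap is only a rough estimate; denser measured points shrink that error.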
Metric                   GPU          68 tok/s/user   105 tok/s/user   143 tok/s/user
Throughput (tok/s/GPU)   GB200 NVL72  8432.8          3246.9           1066.4
                         GB300 NVL72  8538.7          3391.2           1047.4
Cost ($/M tok)           GB200 NVL72  $0.073          $0.190           $0.581
                         GB300 NVL72  $0.082          $0.217           $0.702
tok/s/MW                 GB200 NVL72  4015616         1546138          507786
                         GB300 NVL72  4066071         1614855          498777
Concurrency              GB200 NVL72  ~928            ~236             ~45
                         GB300 NVL72  ~1024           ~211             ~34
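The page does not say how the tok/s/MW column is derived. A minimal sketch, assuming tok/s/MW = tok/s/GPU × (1,000,000 W / watts per GPU), which backs out the per-GPU power each table cell implies; `implied_watts_per_gpu` and `ROWS` are hypothetical names.

```python
# Back out the per-GPU power implied by the table's tok/s/MW column.
# Assumption: tok/s/MW = tok/s/GPU * (1e6 W / watts per GPU); the page does
# not state how it normalizes by power. Names here are hypothetical.

ROWS = {
    # (GPU, interactivity tok/s/user): (tok/s/GPU, tok/s/MW) from the table
    ("GB200 NVL72", 68): (8432.8, 4015616),
    ("GB200 NVL72", 105): (3246.9, 1546138),
    ("GB200 NVL72", 143): (1066.4, 507786),
    ("GB300 NVL72", 68): (8538.7, 4066071),
    ("GB300 NVL72", 105): (3391.2, 1614855),
    ("GB300 NVL72", 143): (1047.4, 498777),
}


def implied_watts_per_gpu(tok_s_per_gpu: float, tok_s_per_mw: float) -> float:
    """GPUs per MW is tok_s_per_mw / tok_s_per_gpu, so watts per GPU is 1e6 over that."""
    return 1e6 * tok_s_per_gpu / tok_s_per_mw


for (gpu, point), (per_gpu, per_mw) in ROWS.items():
    watts = implied_watts_per_gpu(per_gpu, per_mw)
    print(f"{gpu} @ {point:>3} tok/s/user: {watts:.0f} W/GPU")
```

Every cell works out to about 2,100 W per GPU for both systems, so under this reading tok/s/MW is a fixed rescaling of tok/s/GPU on this page, and the two columns rank the GPUs identically at every interactivity level.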

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.