MiniMax M2.5/M2.7 · GPU comparison

MiniMax M2.5/M2.7 — GB300 NVL72 vs H100

Head-to-head AI inference benchmark comparison of GB300 NVL72 (NVIDIA Blackwell) and H100 (NVIDIA Hopper) on MiniMax M2.5/M2.7. Latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics — same interactions as the main inference chart.

GB300 NVL72 posts 3985 tok/s/GPU for $0.19 per million tokens at 59 tok/s/user on MiniMax M2.5/M2.7; H100 posts 614 tok/s/GPU for $0.58. GB300 NVL72 is 210% cheaper per token; GB300 NVL72 delivers 549% more tok/s/GPU.

Throughput at 78 tok/s/user on MiniMax M2.5/M2.7: GB300 NVL72 hits 1610 tok/s/GPU, H100 hits 378. Per-million costs land at $0.48 and $0.94 respectively. GB300 NVL72 is 96% cheaper per token; GB300 NVL72 delivers 326% more tok/s/GPU.

GB300 NVL72 / H100 on MiniMax M2.5/M2.7 at 98 tok/s/user: 901 / 199 tok/s/GPU, $0.84 / $1.82 per million tokens. GB300 NVL72 is 116% cheaper per token; GB300 NVL72 delivers 352% more tok/s/GPU. (Numbers reflect the default 1k/1k · fp8 selection for this URL — table and chart below update if you change sequence, precision, or model in the controls.)

View performance-per-dollar view →

Interpolated from real benchmark data. Edit target interactivity values below to compare at different operating points.

Metric	Interactivity (tok/s/user)	Interactivity (tok/s/user)	Interactivity (tok/s/user)
Throughput (tok/s/gpu)	GB300 NVL72:3984.8H100:614.1	GB300 NVL72:1610.0H100:378.2	GB300 NVL72:901.3H100:199.2
Cost ($/M tok)	GB300 NVL72:$0.187H100:$0.580	GB300 NVL72:$0.479H100:$0.937	GB300 NVL72:$0.845H100:$1.824
tok/s/MW	GB300 NVL72:1879619H100:448245	GB300 NVL72:759435H100:276080	GB300 NVL72:425139H100:145425
Concurrency	GB300 NVL72:~405H100:~42	GB300 NVL72:~64H100:~12	GB300 NVL72:~32H100:~8

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.

Model

Scenario

Precision

Y-Axis Metric

GPU Config

Quick Filters

Vendor:

Aggregation:

Spec Decoding: