MiniMax M2.5/M2.7 — H100 vs H200

Head-to-head AI inference benchmark comparison of the H100 and H200 (both NVIDIA Hopper) on MiniMax M2.5/M2.7, covering latency, throughput, and cost across LLM workloads.

At 59 tok/s/user interactivity on MiniMax M2.5/M2.7, H100 delivers 614 tok/s/GPU at $0.58 per million tokens; H200 delivers 947 tok/s/GPU at $0.41. At this point H200 is 29% cheaper per token (equivalently, H100 costs 41% more) and delivers 54% more tok/s/GPU.

At 78 tok/s/user, H100 posts 378 tok/s/GPU at $0.94 per million tokens; H200 posts 482 tok/s/GPU at $0.82. H200 is 13% cheaper per token (H100 costs 15% more) and delivers 27% more tok/s/GPU.

At 98 tok/s/user, H100 hits 199 tok/s/GPU and H200 hits 375, at $1.82 and $1.05 per million tokens respectively. H200 is 43% cheaper per token (H100 costs 75% more) and delivers 88% more tok/s/GPU. (Numbers reflect the default 1k/1k · fp8 selection.)
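The percentage comparisons above can be reproduced from the unrounded table values. The sketch below is only an illustration: the function names are hypothetical, and it assumes "cheaper" means the cost reduction relative to the H100 price, which is a smaller number than the H100/H200 price ratio minus one.

```python
# Operating points copied from the comparison table (default 1k/1k, fp8 selection).
# Keys are target interactivity in tok/s/user; values are
# (throughput in tok/s/GPU, cost in $ per million tokens).
POINTS = {
    59: {"H100": (614.1, 0.580), "H200": (947.3, 0.412)},
    78: {"H100": (378.2, 0.937), "H200": (481.8, 0.816)},
    98: {"H100": (199.2, 1.824), "H200": (375.3, 1.045)},
}


def pct_more(new: float, base: float) -> float:
    """Percent by which `new` exceeds `base`."""
    return (new / base - 1.0) * 100.0


def pct_cheaper(new_cost: float, base_cost: float) -> float:
    """Percent cost reduction of `new_cost` relative to `base_cost` (assumed definition)."""
    return (1.0 - new_cost / base_cost) * 100.0


def compare(points=POINTS):
    """Return {interactivity: (H200 % more throughput, H200 % cheaper)} rounded to integers."""
    rows = {}
    for interactivity, gpus in points.items():
        h100_tput, h100_cost = gpus["H100"]
        h200_tput, h200_cost = gpus["H200"]
        rows[interactivity] = (
            round(pct_more(h200_tput, h100_tput)),
            round(pct_cheaper(h200_cost, h100_cost)),
        )
    return rows


if __name__ == "__main__":
    for interactivity, (more, cheaper) in compare().items():
        print(f"{interactivity} tok/s/user: H200 +{more}% tok/s/GPU, {cheaper}% cheaper per token")
```

Dividing the H100 cost by the H200 cost instead gives how much more expensive H100 is, which is a different and larger percentage.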

Values are interpolated from real benchmark data at each target interactivity level.
| Metric | GPU | 59 tok/s/user | 78 tok/s/user | 98 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | H100 | 614.1 | 378.2 | 199.2 |
| Throughput (tok/s/GPU) | H200 | 947.3 | 481.8 | 375.3 |
| Cost ($/M tok) | H100 | $0.580 | $0.937 | $1.824 |
| Cost ($/M tok) | H200 | $0.412 | $0.816 | $1.045 |
| tok/s/MW | H100 | 354968 | 218629 | 115163 |
| tok/s/MW | H200 | 547585 | 278486 | 216947 |
| Concurrency | H100 | ~42 | ~12 | ~8 |
| Concurrency | H200 | ~64 | ~21 | ~8 |
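The page does not say how its interpolation is performed. As a rough sketch only, the snippet below assumes piecewise-linear interpolation and uses the three table points as knots; the real tool presumably interpolates between raw benchmark runs that are not shown here, and the knot positions 59, 78, and 98 are the rounded interactivity values from the text. The function name `interpolate` is hypothetical.

```python
from bisect import bisect_right

# (interactivity in tok/s/user, throughput in tok/s/GPU) pairs taken from the table.
# Assumption: these stand in for the underlying benchmark points, which are not published here.
H100_CURVE = [(59, 614.1), (78, 378.2), (98, 199.2)]
H200_CURVE = [(59, 947.3), (78, 481.8), (98, 375.3)]


def interpolate(curve, target):
    """Linearly interpolate throughput at `target` interactivity.

    `curve` must be sorted by interactivity. Targets outside the covered
    range raise ValueError rather than extrapolating.
    """
    xs = [x for x, _ in curve]
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target outside covered range; refusing to extrapolate")
    i = bisect_right(xs, target)
    if i == len(xs):  # target sits exactly on the last knot
        return curve[-1][1]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (target - x0) / (x1 - x0)


if __name__ == "__main__":
    for name, curve in (("H100", H100_CURVE), ("H200", H200_CURVE)):
        print(name, round(interpolate(curve, 70), 1), "tok/s/GPU at 70 tok/s/user")
```

Refusing to extrapolate is a deliberate choice in this sketch: throughput falls steeply as interactivity rises, so a straight-line extension beyond the covered range would be unreliable.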

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.