MiniMax M2.5/M2.7 — H100 vs H200

Head-to-head AI inference benchmark comparison of the NVIDIA H100 and H200 (both Hopper-generation GPUs) on MiniMax M2.5/M2.7, covering latency, throughput, and cost across LLM workloads.

At 59 tok/s/user interactivity on MiniMax M2.5/M2.7, H100 delivers 628 tok/s/GPU at $0.57 per million tokens; H200 delivers 948 tok/s/GPU at $0.41. At this point H200 delivers 51% more tok/s/GPU and is about 28% cheaper per token (equivalently, H100 costs about 39% more per token than H200).

At 78 tok/s/user on MiniMax M2.5/M2.7, H100 posts 376 tok/s/GPU at $0.94 per million tokens; H200 posts 438 tok/s/GPU at $0.86. H200 delivers 17% more tok/s/GPU and is about 8% cheaper per token (equivalently, H100 costs about 9% more per token than H200).

At 97 tok/s/user on MiniMax M2.5/M2.7, H100 hits 208 tok/s/GPU and H200 hits 277; per-million-token costs land at $1.75 and $1.41 respectively. H200 delivers 33% more tok/s/GPU and is about 19% cheaper per token (equivalently, H100 costs about 24% more per token than H200). (All figures reflect the default 1k/1k · fp8 selection.)
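The percentage comparisons above depend on which GPU is taken as the baseline. The following is a minimal sketch, not the site's own method, that recomputes them from the quoted throughput and cost figures. The names `pct_change`, `POINTS`, and `compare` are illustrative assumptions. It shows that the larger cost percentages (39%, 9%, 24%) are H100's premium relative to H200, while H200's saving relative to H100 is smaller (about 28%, 8%, 19%).

```python
def pct_change(new: float, base: float) -> float:
    """Percent change of `new` relative to `base`."""
    return (new - base) / base * 100.0


# Target interactivity (tok/s/user) -> GPU -> (tok/s/GPU, $ per million tokens).
# Values are the ones quoted on this page for the 1k/1k, fp8 selection.
POINTS = {
    59: {"H100": (628.0, 0.572), "H200": (947.9, 0.412)},
    78: {"H100": (375.6, 0.939), "H200": (438.3, 0.863)},
    97: {"H100": (207.9, 1.754), "H200": (277.3, 1.413)},
}


def compare(point: dict) -> dict:
    """Compare H200 against H100 at one interactivity target."""
    h100_tput, h100_cost = point["H100"]
    h200_tput, h200_cost = point["H200"]
    return {
        # Extra throughput of H200, with H100 as the baseline.
        "h200_more_throughput_pct": pct_change(h200_tput, h100_tput),
        # Saving of H200, with H100 as the baseline ("H200 is X% cheaper").
        "h200_cheaper_pct": -pct_change(h200_cost, h100_cost),
        # Premium of H100, with H200 as the baseline ("H100 costs X% more").
        "h100_costlier_pct": pct_change(h100_cost, h200_cost),
    }


for target, point in POINTS.items():
    r = compare(point)
    print(
        f"{target} tok/s/user: "
        f"H200 +{r['h200_more_throughput_pct']:.0f}% tok/s/GPU, "
        f"{r['h200_cheaper_pct']:.0f}% cheaper "
        f"(H100 costs {r['h100_costlier_pct']:.0f}% more)"
    )
```

Stating the baseline explicitly avoids mixing the two conventions: "X% cheaper" is normally read as a reduction from the more expensive option's price.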

Interpolated from real benchmark data.
Metric                   GPU    59 tok/s/user   78 tok/s/user   97 tok/s/user
Throughput (tok/s/gpu)   H100   628.0           375.6           207.9
Throughput (tok/s/gpu)   H200   947.9           438.3           277.3
Cost ($/M tok)           H100   $0.572          $0.939          $1.754
Cost ($/M tok)           H200   $0.412          $0.863          $1.413
tok/s/MW                 H100   363025          217088          120199
tok/s/MW                 H200   547943          253368          160316
Concurrency              H100   ~43             ~12             ~8
Concurrency              H200   ~66             ~23             ~12
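The page does not state how the cost and tok/s/MW rows are derived from throughput. As a rough sketch under two assumptions (cost per million tokens equals an hourly GPU price divided by tokens generated per hour, and tok/s/MW equals tok/s/GPU times a fixed number of GPUs per megawatt), the table values can be back-solved for the implied inputs. The names `ROWS`, `implied_gpus_per_mw`, and `implied_usd_per_gpu_hour` are hypothetical.

```python
# Target interactivity (tok/s/user) -> GPU -> (tok/s/GPU, $/M tok, tok/s/MW).
# Values copied from the table above.
ROWS = {
    59: {"H100": (628.0, 0.572, 363025), "H200": (947.9, 0.412, 547943)},
    78: {"H100": (375.6, 0.939, 217088), "H200": (438.3, 0.863, 253368)},
    97: {"H100": (207.9, 1.754, 120199), "H200": (277.3, 1.413, 160316)},
}


def implied_gpus_per_mw(tok_s_gpu: float, tok_s_mw: float) -> float:
    """GPUs per megawatt, ASSUMING tok/s/MW = tok/s/GPU * GPUs per MW."""
    return tok_s_mw / tok_s_gpu


def implied_usd_per_gpu_hour(tok_s_gpu: float, usd_per_mtok: float) -> float:
    """Hourly GPU price, ASSUMING $/M tok = hourly price / (M tokens per hour)."""
    mtok_per_hour = tok_s_gpu * 3600 / 1e6
    return usd_per_mtok * mtok_per_hour


for target, gpus in ROWS.items():
    for gpu, (tput, cost, tput_mw) in gpus.items():
        per_mw = implied_gpus_per_mw(tput, tput_mw)
        watts = 1e6 / per_mw  # all-in power budget per GPU, in watts
        rate = implied_usd_per_gpu_hour(tput, cost)
        print(
            f"{target:>3} tok/s/user {gpu}: "
            f"{per_mw:.0f} GPUs/MW (~{watts:.0f} W/GPU), ~${rate:.2f}/GPU-hour"
        )
```

Under these assumptions every row implies roughly 578 GPUs per megawatt (about 1.73 kW per GPU), and hourly prices of roughly $1.3 for H100 and $1.4 for H200. The small spread between rows is plausibly due to rounding and to each metric being interpolated separately, so these should be read as estimates rather than the site's actual inputs.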

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.