MiniMax M2.5/M2.7 — B200 vs H100

Head-to-head AI inference benchmark comparison of the B200 (NVIDIA Blackwell) and the H100 (NVIDIA Hopper) on MiniMax M2.5/M2.7, covering latency, throughput, and cost across LLM workloads.

At 59 tok/s/user interactivity on MiniMax M2.5/M2.7, B200 delivers 3357 tok/s/GPU at $0.16 per million tokens; H100 delivers 628 tok/s/GPU at $0.57. B200 costs about 72% less per token (H100 costs roughly 3.5x as much) and delivers 435% more tok/s/GPU at this point.

At 78 tok/s/user, B200 posts 1729 tok/s/GPU for $0.32 per million tokens; H100 posts 376 tok/s/GPU for $0.94. B200 costs about 66% less per token (H100 costs roughly 3.0x as much) and delivers 360% more tok/s/GPU.

At 97 tok/s/user, B200 hits 1023 tok/s/GPU and H100 hits 208; per-million-token costs land at $0.50 and $1.75 respectively. B200 costs about 71% less per token (H100 costs roughly 3.5x as much) and delivers 392% more tok/s/GPU. (Numbers reflect the default 1k/1k · fp8 selection.)
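A cost gap can be quoted either as how much less the cheaper GPU costs or as how much more the expensive one costs, and the two percentages differ. The sketch below recomputes both, plus the throughput gain, from the throughput and cost values in the comparison table. The function and variable names are illustrative and not part of the benchmark site, and results from the rounded table values may differ by a point from figures computed on unrounded data.

```python
# Values copied from the comparison table (default 1k/1k, fp8 selection).
# Keys are interactivity targets in tok/s/user; each GPU maps to
# (throughput in tok/s/GPU, cost in $ per million tokens).
POINTS = {
    59: {"B200": (3357.1, 0.162), "H100": (628.0, 0.572)},
    78: {"B200": (1729.2, 0.317), "H100": (375.6, 0.939)},
    97: {"B200": (1023.3, 0.501), "H100": (207.9, 1.754)},
}


def pct_more(a, b):
    """Percent by which a exceeds b."""
    return (a / b - 1.0) * 100.0


def pct_less(a, b):
    """Percent by which a falls below b (stays under 100 for positive a)."""
    return (1.0 - a / b) * 100.0


def compare(point):
    """Return the three percentage comparisons for one operating point."""
    (b_tput, b_cost), (h_tput, h_cost) = point["B200"], point["H100"]
    return {
        "b200_throughput_pct_more": pct_more(b_tput, h_tput),
        "h100_cost_pct_more": pct_more(h_cost, b_cost),
        "b200_cost_pct_less": pct_less(b_cost, h_cost),
    }


if __name__ == "__main__":
    for target, point in POINTS.items():
        r = compare(point)
        print(
            f"{target} tok/s/user: "
            f"B200 throughput +{r['b200_throughput_pct_more']:.0f}%, "
            f"H100 cost +{r['h100_cost_pct_more']:.0f}%, "
            f"B200 cost -{r['b200_cost_pct_less']:.0f}%"
        )
```

A "percent less" figure is bounded by 100, so any cost gap above 100% is a statement about how much more the H100 costs, not how much less the B200 costs.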

Interpolated from real benchmark data at three target interactivity values.

| Metric | GPU | 59 tok/s/user | 78 tok/s/user | 97 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B200 | 3357.1 | 1729.2 | 1023.3 |
| Throughput (tok/s/GPU) | H100 | 628.0 | 375.6 | 207.9 |
| Cost ($/M tok) | B200 | $0.162 | $0.317 | $0.501 |
| Cost ($/M tok) | H100 | $0.572 | $0.939 | $1.754 |
| tok/s/MW | B200 | 1547063 | 796848 | 471560 |
| tok/s/MW | H100 | 363025 | 217088 | 120199 |
| Concurrency | B200 | ~116 | ~45 | ~21 |
| Concurrency | H100 | ~43 | ~12 | ~8 |
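The page does not say how tok/s/MW is derived. One plausible reading, offered here only as an assumption, is that it is per-GPU throughput divided by a fixed all-in power figure per GPU. The sketch below tests that reading by backing out the implied watts per GPU from the table values; `implied_watts_per_gpu` and `TABLE` are illustrative names, not anything the benchmark site defines.

```python
# (tok/s/GPU, tok/s/MW) pairs copied from the table, in the order
# 59, 78, 97 tok/s/user.
TABLE = {
    "B200": [(3357.1, 1547063), (1729.2, 796848), (1023.3, 471560)],
    "H100": [(628.0, 363025), (375.6, 217088), (207.9, 120199)],
}


def implied_watts_per_gpu(tok_s_gpu, tok_s_mw):
    """Watts per GPU implied by the ratio of the two throughput metrics.

    Assumption (not stated by the source): tok/s/MW = tok/s/GPU divided
    by megawatts drawn per GPU, so W per GPU = tok/s/GPU / tok/s/MW * 1e6.
    """
    return tok_s_gpu / tok_s_mw * 1_000_000


def implied_power(table):
    """Implied watts per GPU at each operating point, keyed by GPU name."""
    return {
        gpu: [implied_watts_per_gpu(t, m) for t, m in rows]
        for gpu, rows in table.items()
    }


if __name__ == "__main__":
    for gpu, watts in implied_power(TABLE).items():
        print(gpu, [round(w) for w in watts])
```

Under this assumption the implied figure is nearly constant across the three operating points, at roughly 2170 W per B200 and 1730 W per H100. That is consistent with a fixed per-GPU power budget, but it remains an inference from the published numbers and not a documented methodology.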

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.