MiniMax M3 428B · GPU comparison

MiniMax M3 428B — B200 vs H100

Head-to-head AI inference benchmark comparison of B200 (NVIDIA Blackwell) and H100 (NVIDIA Hopper) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads. The chart controls below switch sequence lengths, precisions, and metrics, with the same interactions as the main inference chart.

At 76 tok/s/user on MiniMax M3 428B, B200 delivers 2050 tok/s/GPU at $0.26 per million tokens and H100 delivers 537 tok/s/GPU at $0.67. B200 delivers 282% more tok/s/GPU and is about 61% cheaper per token (equivalently, H100 costs 155% more per token).

Around the middle of the 15–260 tok/s/user interactivity band, at 138 tok/s/user on MiniMax M3 428B, B200 runs 1252 tok/s/GPU at $0.45 per million tokens and H100 runs 237 tok/s/GPU at $1.52. B200 delivers 429% more tok/s/GPU and is about 70% cheaper per token (equivalently, H100 costs 236% more per token).

With 199 tok/s/user as the target on MiniMax M3 428B, B200 produces 554 tok/s/GPU at $0.98 per million tokens and H100 produces 157 tok/s/GPU at $2.31. B200 delivers 254% more tok/s/GPU and is about 58% cheaper per token (equivalently, H100 costs 136% more per token). (Numbers reflect the default 1k/1k, fp8 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
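A cost or throughput gap can be quoted two ways, and the two give very different numbers: how much more the larger value is relative to the smaller, or how much less the smaller value is relative to the larger. The following is a minimal Python sketch that recomputes both from the rounded table values on this page. The helper names `pct_more` and `pct_less` are illustrative and do not come from the benchmark site.

```python
def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """How much smaller a is than b, as a percentage of b."""
    return (1.0 - a / b) * 100.0


# (target tok/s/user, B200 tok/s/GPU, H100 tok/s/GPU, B200 $/M tok, H100 $/M tok)
# Values copied from the comparison table (1k/1k, fp8).
points = [
    (76, 2049.6, 536.7, 0.264, 0.673),
    (138, 1252.2, 236.6, 0.451, 1.518),
    (199, 554.1, 156.6, 0.976, 2.307),
]

rows = []
for target, b_tps, h_tps, b_cost, h_cost in points:
    rows.append((
        target,
        round(pct_more(b_tps, h_tps)),    # B200 throughput advantage, %
        round(pct_more(h_cost, b_cost)),  # H100 cost premium over B200, %
        round(pct_less(b_cost, h_cost)),  # B200 saving relative to H100, %
    ))

for target, tps_adv, h_premium, b_saving in rows:
    print(f"{target} tok/s/user: B200 +{tps_adv}% tok/s/GPU, "
          f"H100 +{h_premium}% cost, B200 -{b_saving}% cost")
```

Because the inputs here are the rounded table values, a result can differ by a point from the figures quoted in the text, which are presumably computed from unrounded data. A saving can never exceed 100%, so any cost gap above 100% is a premium on the more expensive GPU rather than a discount on the cheaper one.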

The values in the table below are interpolated from real benchmark data at three target interactivity levels. On the live page the targets are editable, so other operating points can be compared.
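The page does not say which interpolation method it uses. As a rough illustration only, the sketch below assumes piecewise-linear interpolation between neighbouring points on a throughput-versus-interactivity curve. The function name `interp_linear` is hypothetical, and the three sample points are simply the B200 throughput values from the table on this page, standing in for a denser set of measured points.

```python
from bisect import bisect_right


def interp_linear(x: float, xs: list[float], ys: list[float]) -> float:
    """Piecewise-linear interpolation; xs must be sorted ascending."""
    if not xs[0] <= x <= xs[-1]:
        # Refuse to extrapolate beyond the sampled range.
        raise ValueError("target outside the sampled range")
    # Index of the right-hand neighbour, clamped so x == xs[-1] works.
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Stand-in sample points (assumption): B200 values from this page's table.
interactivity = [76.0, 138.0, 199.0]    # tok/s/user
throughput = [2049.6, 1252.2, 554.1]    # tok/s/GPU

# Estimate throughput halfway between the first two targets.
estimate = interp_linear(107.0, interactivity, throughput)
print(f"~{estimate:.1f} tok/s/GPU at 107 tok/s/user (linear estimate)")
```

Refusing to extrapolate is a deliberate choice in this sketch: outside the sampled band a straight-line estimate has no measured point on one side to anchor it.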

Metric	GPU	76 tok/s/user	138 tok/s/user	199 tok/s/user
Throughput (tok/s/GPU)	B200	2049.6	1252.2	554.1
Throughput (tok/s/GPU)	H100	536.7	236.6	156.6
Cost ($/M tok)	B200	$0.264	$0.451	$0.976
Cost ($/M tok)	H100	$0.673	$1.518	$2.307
tok/s/MW	B200	1198606	732286	324020
tok/s/MW	H100	391761	172683	114281
Concurrency	B200	~119	~18	~5
Concurrency	H100	~31	~8	~4
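The table does not state the GPU price or power figures behind the cost and tok/s/MW rows, but rough implied values can be backed out of it. The sketch below assumes that cost per million tokens is a flat hourly GPU price divided by tokens generated per hour, and that tok/s/MW is per-GPU throughput divided by the power attributed to each GPU. Both relationships are assumptions rather than documented formulas, and the function names are illustrative.

```python
def implied_gpu_hour_cost(cost_per_mtok: float, tok_per_s_per_gpu: float) -> float:
    """Implied $/GPU-hour, assuming cost = hourly price / tokens per hour."""
    tokens_per_hour = tok_per_s_per_gpu * 3600.0
    return cost_per_mtok * tokens_per_hour / 1e6


def implied_kw_per_gpu(tok_per_s_per_gpu: float, tok_per_s_per_mw: float) -> float:
    """Implied kW attributed to one GPU, assuming tok/s/MW = tok/s/GPU / MW per GPU."""
    return tok_per_s_per_gpu / tok_per_s_per_mw * 1000.0


# (tok/s/GPU, $/M tok, tok/s/MW) at 76, 138, and 199 tok/s/user, from the table.
table = {
    "B200": [(2049.6, 0.264, 1198606), (1252.2, 0.451, 732286), (554.1, 0.976, 324020)],
    "H100": [(536.7, 0.673, 391761), (236.6, 1.518, 172683), (156.6, 2.307, 114281)],
}

implied = {
    gpu: [(implied_gpu_hour_cost(cost, tps), implied_kw_per_gpu(tps, tps_mw))
          for tps, cost, tps_mw in rows]
    for gpu, rows in table.items()
}

for gpu, values in implied.items():
    for dollars_per_hour, kw in values:
        print(f"{gpu}: ~${dollars_per_hour:.2f}/GPU-hour, ~{kw:.2f} kW per GPU")
```

If the assumptions hold, each GPU should yield roughly the same implied price and power at all three operating points. Small spreads are expected because the table values are rounded and each metric may be interpolated separately.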

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.

[Interactive chart. Controls: Model, Scenario, Precision, Y-Axis Metric, GPU Config. Quick filters: Vendor, Aggregation, Spec Decoding.]