MiniMax M3 428B · GPU comparison

MiniMax M3 428B — GB200 NVL72 vs H100

Head-to-head AI inference benchmark comparison of GB200 NVL72 (NVIDIA Blackwell) and H100 (NVIDIA Hopper) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequence lengths, precisions, and metrics; they work the same way as on the main inference chart.

GB200 NVL72 / H100 on MiniMax M3 428B at 52 tok/s/user: 1598 / 836 tok/s/GPU, $0.38 / $0.43 per million tokens. GB200 NVL72 is about 11% cheaper per token and delivers 91% more tok/s/GPU.

Around the middle of the 17–157 tok/s/user interactivity band, at 87 tok/s/user on MiniMax M3 428B: GB200 NVL72 runs 638 tok/s/GPU at $1.00/M tokens, H100 runs 436 at $0.84/M. H100 is 16% cheaper per token (GB200 NVL72 costs 19% more); GB200 NVL72 delivers 46% more tok/s/GPU.

Setting 122 tok/s/user as the target on MiniMax M3 428B, GB200 NVL72 produces 211 tok/s/GPU ($2.86 per million tokens) and H100 produces 271 ($1.32). H100 is 54% cheaper per token (GB200 NVL72 costs roughly 117% more); H100 delivers 29% more tok/s/GPU. (Numbers reflect the default 1k/1k · fp8 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
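A saving measured against the more expensive option can never exceed 100%, so it helps to keep "cheaper by" and "costs more by" apart. The following is a minimal sketch of both calculations using the per-million-token costs at 122 tok/s/user ($2.862 for GB200 NVL72, $1.322 for H100); the function names are illustrative and not part of the benchmark site.

```python
def percent_cheaper(cost_a: float, cost_b: float) -> float:
    """Saving of option A relative to the cost of option B, in percent."""
    return (cost_b - cost_a) / cost_b * 100.0


def percent_more(value_a: float, value_b: float) -> float:
    """How much larger value A is than value B, relative to B, in percent."""
    return (value_a - value_b) / value_b * 100.0


# Costs in $/M tokens at the 122 tok/s/user operating point.
gb200_cost = 2.862
h100_cost = 1.322

# Throughput in tok/s/GPU at the same operating point.
gb200_tput = 210.9
h100_tput = 271.1

h100_saving = percent_cheaper(h100_cost, gb200_cost)   # H100 saving vs GB200 NVL72
gb200_premium = percent_more(gb200_cost, h100_cost)    # GB200 NVL72 premium vs H100
h100_tput_gain = percent_more(h100_tput, gb200_tput)   # H100 throughput advantage

print(f"H100 is {h100_saving:.1f}% cheaper per token")
print(f"GB200 NVL72 costs {gb200_premium:.1f}% more per token")
print(f"H100 delivers {h100_tput_gain:.1f}% more tok/s/GPU")
```

The two cost figures describe the same gap from opposite baselines, which is why one is near 54% and the other is above 100%.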


Interpolated from real benchmark data. Edit target interactivity values below to compare at different operating points.
| Metric | GPU | 52 tok/s/user | 87 tok/s/user | 122 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | GB200 NVL72 | 1598.2 | 638.3 | 210.9 |
| Throughput (tok/s/GPU) | H100 | 836.1 | 436.2 | 271.1 |
| Cost ($/M tok) | GB200 NVL72 | $0.382 | $0.996 | $2.862 |
| Cost ($/M tok) | H100 | $0.430 | $0.837 | $1.322 |
| tok/s/MW | GB200 NVL72 | 761024 | 303973 | 100447 |
| tok/s/MW | H100 | 483284 | 252146 | 156695 |
| Concurrency | GB200 NVL72 | ~180 | ~74 | ~18 |
| Concurrency | H100 | ~72 | ~22 | ~10 |
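The page says its figures are interpolated from benchmark data but does not state the method. As a rough sketch, assuming piecewise-linear interpolation and treating the three operating points above (52, 87, and 122 tok/s/user) as the only known points, throughput at an intermediate target could be estimated as follows. The `interpolate` helper and the 70 tok/s/user target are illustrative assumptions; real throughput curves are not linear, so the result is only a coarse estimate.

```python
from bisect import bisect_right


def interpolate(points: list[tuple[float, float]], x: float) -> float:
    """Piecewise-linear interpolation over (x, y) points sorted by x.

    Raises ValueError instead of extrapolating outside the known range.
    """
    xs = [p[0] for p in points]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"{x} is outside the range {xs[0]}..{xs[-1]}")
    # Index of the right-hand point of the segment containing x.
    i = min(bisect_right(xs, x), len(xs) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


# (interactivity in tok/s/user, throughput in tok/s/GPU) from the table.
GB200_NVL72 = [(52, 1598.2), (87, 638.3), (122, 210.9)]
H100 = [(52, 836.1), (87, 436.2), (122, 271.1)]

target = 70  # hypothetical operating point, tok/s/user
print(f"GB200 NVL72 at {target} tok/s/user: {interpolate(GB200_NVL72, target):.1f} tok/s/GPU")
print(f"H100 at {target} tok/s/user: {interpolate(H100, target):.1f} tok/s/GPU")
```

Refusing to extrapolate is a deliberate choice here: outside the sampled points there is no data to support an estimate.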

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.
