MiniMax M3 428B · GPU comparison

MiniMax M3 428B — B200 vs H200

A head-to-head AI inference benchmark comparison of the B200 (NVIDIA Blackwell) and H200 (NVIDIA Hopper) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads. The chart controls switch sequences, precisions, and metrics, with the same interactions as the main inference chart.

At 56 tok/s/user on MiniMax M3 428B, B200 reaches 950 tok/s/GPU and H200 reaches 910. Costs are $0.57 and $0.43 per million tokens respectively. H200 is 24% cheaper per token (equivalently, B200 costs 31% more); B200 delivers 4% more tok/s/GPU.

B200 / H200 on MiniMax M3 428B at 103 tok/s/user: 343 / 519 tok/s/GPU, $1.57 / $0.77 per million tokens. H200 is 51% cheaper per token (equivalently, B200 costs 105% more) and delivers 51% more tok/s/GPU.

Toward the upper edge of the 9–198 tok/s/user interactivity band, at 151 tok/s/user on MiniMax M3 428B: B200 runs 126 tok/s/GPU at $4.32/M tokens, H200 runs 334 at $1.15/M. H200 is 73% cheaper per token (equivalently, B200 costs 274% more) and delivers 166% more tok/s/GPU. (Numbers reflect the default 1k/1k · fp8 selection; the table and chart change with the selected sequence, precision, or model.)
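A cost gap can be stated relative to either GPU, and the two percentages differ: "costs X% more" divides by the cheaper price, while "is Y% cheaper" divides by the more expensive one, so Y can never exceed 100. The following is a minimal sketch of that arithmetic using the $/M token figures quoted for the three operating points; the function names `pct_more` and `pct_less` are illustrative and not part of the benchmark tooling.

```python
def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100.0


def pct_less(a: float, b: float) -> float:
    """How much smaller a is than b, as a percentage of b."""
    return (b - a) / b * 100.0


# Cost in $/M tokens at each interactivity level (tok/s/user),
# taken from the figures quoted on this page.
cost = {
    56: {"B200": 0.567, "H200": 0.433},
    103: {"B200": 1.572, "H200": 0.767},
    151: {"B200": 4.315, "H200": 1.153},
}

for point, c in cost.items():
    premium = pct_more(c["B200"], c["H200"])  # B200 relative to H200
    saving = pct_less(c["H200"], c["B200"])   # H200 relative to B200
    print(
        f"{point} tok/s/user: B200 costs {premium:.0f}% more, "
        f"H200 is {saving:.0f}% cheaper"
    )
```

The same `pct_more` helper applies to the throughput figures, for example `pct_more(518.5, 342.9)` for H200 over B200 at 103 tok/s/user.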


Values are interpolated from real benchmark data at three target interactivity levels, which can be edited to compare other operating points.
| Metric | GPU | 56 tok/s/user | 103 tok/s/user | 151 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B200 | 949.7 | 342.9 | 125.5 |
| Throughput (tok/s/GPU) | H200 | 909.7 | 518.5 | 334.5 |
| Cost ($/M tok) | B200 | $0.567 | $1.572 | $4.315 |
| Cost ($/M tok) | H200 | $0.433 | $0.767 | $1.153 |
| tok/s/MW | B200 | 437632 | 158037 | 57843 |
| tok/s/MW | H200 | 525831 | 299740 | 193328 |
| Concurrency | B200 | ~40 | ~8 | ~4 |
| Concurrency | H200 | ~35 | ~11 | ~4 |
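The page does not state how the cost and tok/s/MW rows are derived from throughput. One plausible reading, offered here only as an assumption, is that cost per million tokens is a fixed hourly GPU rate divided by tokens produced per hour, and that tok/s/MW is per-GPU throughput divided by a fixed per-GPU power draw. Under that assumption, the sketch below back-solves the hourly rate and power that each table column would imply; the function names are hypothetical, and the resulting values are inferences, not figures published by the source.

```python
def implied_hourly_rate(cost_per_m_tok: float, tok_per_s_per_gpu: float) -> float:
    """Hourly $/GPU implied by a $/M-token cost at a given throughput.

    Assumes cost_per_m_tok = hourly_rate / tokens_per_hour * 1e6.
    """
    tokens_per_hour = tok_per_s_per_gpu * 3600.0
    return cost_per_m_tok * tokens_per_hour / 1e6


def implied_kw_per_gpu(tok_per_s_per_mw: float, tok_per_s_per_gpu: float) -> float:
    """Per-GPU power in kW implied by tok/s/MW at a given throughput.

    Assumes tok_per_s_per_mw = tok_per_s_per_gpu / megawatts_per_gpu.
    """
    megawatts_per_gpu = tok_per_s_per_gpu / tok_per_s_per_mw
    return megawatts_per_gpu * 1000.0


# (throughput tok/s/GPU, cost $/M tok, tok/s/MW) at 56, 103, 151 tok/s/user.
rows = {
    "B200": [(949.7, 0.567, 437632), (342.9, 1.572, 158037), (125.5, 4.315, 57843)],
    "H200": [(909.7, 0.433, 525831), (518.5, 0.767, 299740), (334.5, 1.153, 193328)],
}

for gpu, points in rows.items():
    for throughput, cost, per_mw in points:
        rate = implied_hourly_rate(cost, throughput)
        kw = implied_kw_per_gpu(per_mw, throughput)
        print(f"{gpu}: ~${rate:.2f}/GPU-hour, ~{kw:.2f} kW/GPU")
```

If the assumption holds, the implied rate and power should come out roughly constant across the three columns for each GPU, which makes this a quick consistency check on the table rather than a statement of actual pricing.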

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.