MiniMax M3 428B — B300 vs H200
Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and H200 (NVIDIA Hopper) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads.
At a target of 59 tok/s/user on MiniMax M3 428B, B300 produces 1664 tok/s/GPU ($0.39 per million tokens) and H200 produces 855 tok/s/GPU ($0.46). B300 is about 14% cheaper per token (equivalently, H200 costs about 17% more) and delivers 95% more tok/s/GPU.
At 106 tok/s/user interactivity on MiniMax M3 428B, B300 delivers 463 tok/s/GPU at $1.42 per million tokens; H200 delivers 499 tok/s/GPU at $0.80. H200 is about 44% cheaper per token (equivalently, B300 costs about 78% more) and delivers 8% more tok/s/GPU at this point.
At 152 tok/s/user on MiniMax M3 428B, B300 posts 225 tok/s/GPU at $2.83 per million tokens; H200 posts 333 tok/s/GPU at $1.16. H200 is about 59% cheaper per token (equivalently, B300 costs about 144% more) and delivers 48% more tok/s/GPU. These numbers reflect the default 1k/1k · fp8 selection; other sequence lengths, precisions, or models are reported separately.
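The cost percentages in these comparisons depend on which GPU is taken as the baseline: "X% cheaper" is measured against the more expensive option and can never exceed 100%, while "costs X% more" is measured against the cheaper one. A minimal Python sketch of both calculations, using the $/M tok values from the table; the function names and the `COSTS` structure are illustrative, not part of the benchmark site:

```python
def pct_cheaper(low: float, high: float) -> float:
    """Percent by which `low` undercuts `high` (baseline: the pricier option)."""
    return (1.0 - low / high) * 100.0


def pct_more(high: float, low: float) -> float:
    """Percent by which `high` exceeds `low` (baseline: the cheaper option)."""
    return (high / low - 1.0) * 100.0


# Cost in $/M tokens at each interactivity target: (B300, H200).
COSTS = {
    59: (0.392, 0.457),
    106: (1.421, 0.798),
    152: (2.833, 1.161),
}


def summarize(b300: float, h200: float) -> tuple[str, float, float]:
    """Return (cheaper GPU, % cheaper, % premium paid on the other GPU)."""
    if b300 <= h200:
        return "B300", pct_cheaper(b300, h200), pct_more(h200, b300)
    return "H200", pct_cheaper(h200, b300), pct_more(b300, h200)


if __name__ == "__main__":
    for target, (b300, h200) in COSTS.items():
        winner, cheaper, premium = summarize(b300, h200)
        print(f"{target} tok/s/user: {winner} is {cheaper:.0f}% cheaper "
              f"(the other GPU costs {premium:.0f}% more)")
```

With the table values, the "cheaper" figures come out at roughly 14%, 44%, and 59%, and the corresponding "costs more" figures at roughly 17%, 78%, and 144%.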
| Metric | GPU | 59 tok/s/user | 106 tok/s/user | 152 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B300 | 1663.8 | 463.4 | 225.5 |
| Throughput (tok/s/GPU) | H200 | 855.2 | 498.6 | 332.6 |
| Cost ($/M tok) | B300 | $0.392 | $1.421 | $2.833 |
| Cost ($/M tok) | H200 | $0.457 | $0.798 | $1.161 |
| tok/s/MW | B300 | 766735 | 213530 | 103902 |
| tok/s/MW | H200 | 494338 | 288187 | 192262 |
| Concurrency | B300 | ~63 | ~20 | ~7 |
| Concurrency | H200 | ~31 | ~10 | ~4 |
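The throughput and tok/s/MW rows of the table can be combined to back out the power budget the benchmark appears to attribute to each GPU. The sketch below assumes tok/s/MW is simply per-GPU throughput divided by a fixed per-GPU power figure; the page does not state this, nor whether that figure includes host, networking, or cooling overhead, so the result is an inference rather than a published specification. The names `TABLE` and `implied_kw_per_gpu` are illustrative.

```python
# Values copied from the table, ordered by interactivity target
# (59, 106, 152 tok/s/user).
TABLE = {
    "B300": {
        "tok_s_gpu": [1663.8, 463.4, 225.5],
        "tok_s_mw": [766735, 213530, 103902],
    },
    "H200": {
        "tok_s_gpu": [855.2, 498.6, 332.6],
        "tok_s_mw": [494338, 288187, 192262],
    },
}


def implied_kw_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Implied power per GPU in kW, assuming tok/s/MW = tok/s/GPU / (MW per GPU)."""
    mw_per_gpu = tok_s_gpu / tok_s_mw
    return mw_per_gpu * 1000.0  # MW -> kW


if __name__ == "__main__":
    for gpu, rows in TABLE.items():
        estimates = [
            implied_kw_per_gpu(t, m)
            for t, m in zip(rows["tok_s_gpu"], rows["tok_s_mw"])
        ]
        print(gpu, [f"{kw:.2f} kW" for kw in estimates])
```

Under that assumption, all three operating points give about 2.17 kW per B300 and about 1.73 kW per H200, which suggests the tok/s/MW row is a fixed rescaling of the throughput row rather than an independent measurement.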
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.