MiniMax M3 428B — B300 vs H100
Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and H100 (NVIDIA Hopper) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads.
Near the low end of the 15–198 tok/s/user interactivity band, at 61 tok/s/user, B300 runs 1537 tok/s/GPU at $0.43 per million tokens and H100 runs 724 tok/s/GPU at $0.50. B300 delivers 112% more tok/s/GPU, and H100 costs 17% more per token (equivalently, B300 is about 15% cheaper).
At a target of 107 tok/s/user, B300 produces 452 tok/s/GPU at $1.45 per million tokens and H100 produces 313 tok/s/GPU at $1.13. B300 delivers 44% more tok/s/GPU but costs 29% more per token (equivalently, H100 is about 22% cheaper).
At 153 tok/s/user, B300 delivers 224 tok/s/GPU at $2.85 per million tokens and H100 delivers 211 tok/s/GPU at $1.67. B300 delivers only 6% more tok/s/GPU at this point and costs 71% more per token (equivalently, H100 is about 42% cheaper). (Numbers reflect the default 1k/1k sequence, fp8 precision selection for this URL.)
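
The percentage comparisons above can be reproduced from the table values. The following is a minimal sketch, not the site's own calculation: the `DATA` layout and the helper names `pct_more`, `pct_less`, and `compare` are assumptions made for illustration. It separates two conventions that are easy to mix up: how much more the pricier GPU costs (relative to the cheaper one) and how much less the cheaper GPU costs (relative to the pricier one).

```python
# Values copied from the comparison table (default 1k/1k, fp8 selection).
# Outer keys are interactivity targets in tok/s/user.
# "tput" is tok/s/GPU, "cost" is dollars per million tokens.
DATA = {
    61:  {"B300": {"tput": 1537.0, "cost": 0.430},
          "H100": {"tput": 724.2,  "cost": 0.503}},
    107: {"B300": {"tput": 451.8,  "cost": 1.454},
          "H100": {"tput": 313.3,  "cost": 1.131}},
    153: {"B300": {"tput": 224.0,  "cost": 2.853},
          "H100": {"tput": 210.5,  "cost": 1.668}},
}


def pct_more(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


def pct_less(a, b):
    """How much smaller a is than b, as a percentage of b."""
    return (1.0 - a / b) * 100.0


def compare(target):
    """Summarize B300 vs H100 at one interactivity target (hypothetical helper)."""
    row = DATA[target]
    # Sort the two GPU names by cost so the cheaper one comes first.
    cheaper, pricier = sorted(row, key=lambda gpu: row[gpu]["cost"])
    low, high = row[cheaper]["cost"], row[pricier]["cost"]
    return {
        "b300_throughput_gain_pct": pct_more(row["B300"]["tput"], row["H100"]["tput"]),
        "cheaper_gpu": cheaper,
        "pricier_costs_more_pct": pct_more(high, low),
        "cheaper_costs_less_pct": pct_less(low, high),
    }


if __name__ == "__main__":
    for target in DATA:
        r = compare(target)
        print(f"{target} tok/s/user: B300 +{r['b300_throughput_gain_pct']:.0f}% tok/s/GPU, "
              f"{r['cheaper_gpu']} is cheaper "
              f"(other GPU costs {r['pricier_costs_more_pct']:.0f}% more, "
              f"i.e. {r['cheaper_costs_less_pct']:.0f}% less for the cheaper one)")
```

Under these definitions, the 17%, 29%, and 71% cost figures correspond to the "costs more" ratio. Expressed as a discount on the pricier GPU, the same gaps are roughly 15%, 22%, and 42%.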
| Metric | 61 tok/s/user | 107 tok/s/user | 153 tok/s/user |
|---|---|---|---|
| Throughput (tok/s/GPU) | B300: 1537.0, H100: 724.2 | B300: 451.8, H100: 313.3 | B300: 224.0, H100: 210.5 |
| Cost ($/M tok) | B300: $0.430, H100: $0.503 | B300: $1.454, H100: $1.131 | B300: $2.853, H100: $1.668 |
| tok/s/MW | B300: 708275, H100: 418608 | B300: 208190, H100: 181073 | B300: 103204, H100: 121695 |
| Concurrency | B300: ~55, H100: ~52 | B300: ~19, H100: ~13 | B300: ~6, H100: ~6 |
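
The tok/s/MW row can be related to the throughput row: dividing tok/s/MW by tok/s/GPU gives the number of GPUs the benchmark attributes to one megawatt, and 1000 kW divided by that count gives an implied power budget per GPU. The sketch below is an inference from the table values only. The source does not state a per-GPU power figure or say what the power number includes, and the names `ROWS` and `implied_kw_per_gpu` are hypothetical.

```python
# (tok/s/GPU, tok/s/MW) pairs copied from the table, one per interactivity
# target (61, 107, 153 tok/s/user).
ROWS = {
    "B300": [(1537.0, 708275), (451.8, 208190), (224.0, 103204)],
    "H100": [(724.2, 418608), (313.3, 181073), (210.5, 121695)],
}


def implied_kw_per_gpu(tok_s_gpu, tok_s_mw):
    """Implied power per GPU in kW, assuming tok/s/MW = tok/s/GPU * GPUs per MW."""
    gpus_per_mw = tok_s_mw / tok_s_gpu
    return 1000.0 / gpus_per_mw  # 1 MW = 1000 kW


if __name__ == "__main__":
    for gpu, pairs in ROWS.items():
        estimates = [implied_kw_per_gpu(t, m) for t, m in pairs]
        print(gpu, [f"{kw:.2f} kW" for kw in estimates])
```

Under this assumption all three columns give about 2.17 kW per B300 GPU and about 1.73 kW per H100 GPU, so the implied B300 budget is roughly 25% higher. That would explain why H100 leads on tok/s/MW at 153 tok/s/user, where B300's throughput advantage is only 6%.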
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.