MiniMax M3 428B — B300 vs GB200 NVL72

Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and GB200 NVL72 (NVIDIA Blackwell) on MiniMax M3 428B, covering latency, throughput, and cost across LLM workloads.

At 62 tok/s/user on MiniMax M3 428B, B300 posts 5232 tok/s/GPU at $0.12 per million tokens; GB200 NVL72 posts 4174 tok/s/GPU at $0.14. B300 delivers 25% more tok/s/GPU, and GB200 NVL72 costs 16% more per token (B300 is about 14% cheaper).

At 94 tok/s/user on MiniMax M3 428B, B300 hits 4761 tok/s/GPU and GB200 NVL72 hits 1871, at $0.14 and $0.33 per million tokens respectively. B300 delivers 154% more tok/s/GPU, and GB200 NVL72 costs 142% more per token (B300 is about 59% cheaper).

At 125 tok/s/user on MiniMax M3 428B, B300 and GB200 NVL72 reach 3450 and 793 tok/s/GPU at $0.19 and $0.78 per million tokens respectively. B300 delivers 335% more tok/s/GPU, and GB200 NVL72 costs 312% more per token (B300 is about 76% cheaper). (Numbers reflect the default 8k/1k · fp8 selection.)
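The relative figures in the three summaries can be recomputed from the table values. The sketch below is illustrative only: the helper names `pct_more` and `pct_less` are made up for this example, and the inputs are the rounded table values, so results can differ by a point from figures computed on unrounded data. It separates two conventions that are easy to conflate: how much more the costlier GPU costs (unbounded) and how much cheaper the other GPU is (never above 100%).

```python
# Table values for the default 8k/1k, fp8 selection.
# Outer keys are the target interactivity in tok/s/user.
THROUGHPUT = {  # tok/s/GPU
    62: {"B300": 5231.7, "GB200 NVL72": 4173.7},
    94: {"B300": 4761.2, "GB200 NVL72": 1871.0},
    125: {"B300": 3449.9, "GB200 NVL72": 792.8},
}
COST = {  # $ per million tokens
    62: {"B300": 0.124, "GB200 NVL72": 0.145},
    94: {"B300": 0.137, "GB200 NVL72": 0.331},
    125: {"B300": 0.189, "GB200 NVL72": 0.780},
}


def pct_more(a: float, b: float) -> float:
    """Percentage by which a exceeds b, relative to b (can exceed 100)."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Percentage by which a falls below b, relative to b (at most 100)."""
    return (1.0 - a / b) * 100.0


def summarize() -> None:
    """Print the relative throughput and cost gaps at each operating point."""
    for point in sorted(THROUGHPUT):
        tput, cost = THROUGHPUT[point], COST[point]
        more_tput = pct_more(tput["B300"], tput["GB200 NVL72"])
        gb_costs_more = pct_more(cost["GB200 NVL72"], cost["B300"])
        b300_cheaper = pct_less(cost["B300"], cost["GB200 NVL72"])
        print(
            f"{point} tok/s/user: B300 throughput +{more_tput:.0f}%, "
            f"GB200 NVL72 cost +{gb_costs_more:.0f}%, "
            f"B300 cost -{b300_cheaper:.0f}%"
        )


if __name__ == "__main__":
    summarize()
```

A "cheaper" percentage above 100 is best read as the first convention: the other system costs that much more per token.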

Interpolated from real benchmark data at three target interactivity values.

| Metric | GPU | 62 tok/s/user | 94 tok/s/user | 125 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B300 | 5231.7 | 4761.2 | 3449.9 |
| Throughput (tok/s/GPU) | GB200 NVL72 | 4173.7 | 1871.0 | 792.8 |
| Cost ($/M tok) | B300 | $0.124 | $0.137 | $0.189 |
| Cost ($/M tok) | GB200 NVL72 | $0.145 | $0.331 | $0.780 |
| tok/s/MW | B300 | 2410927 | 2194101 | 1589795 |
| tok/s/MW | GB200 NVL72 | 1987492 | 890929 | 377518 |
| Concurrency | B300 | ~166 | ~24 | ~14 |
| Concurrency | GB200 NVL72 | ~302 | ~49 | ~15 |
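The tok/s/MW and cost rows appear to follow from the throughput row through a fixed per-GPU constant. The sketch below back-solves those constants under two assumptions that the page does not state: that tok/s/MW equals tok/s/GPU divided by the power attributed to one GPU, and that $/M tok equals an hourly GPU cost divided by the tokens produced per hour. The function names are hypothetical and the results are inferences, not published figures.

```python
# (gpu, tok/s/GPU, $ per million tokens, tok/s/MW), copied from the table.
ROWS = [
    ("B300", 5231.7, 0.124, 2410927),
    ("B300", 4761.2, 0.137, 2194101),
    ("B300", 3449.9, 0.189, 1589795),
    ("GB200 NVL72", 4173.7, 0.145, 1987492),
    ("GB200 NVL72", 1871.0, 0.331, 890929),
    ("GB200 NVL72", 792.8, 0.780, 377518),
]


def implied_kw_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Assumes tok/s/MW = tok/s/GPU / (MW per GPU); returns kW per GPU."""
    return tok_s_gpu / tok_s_mw * 1000.0


def implied_usd_per_gpu_hour(tok_s_gpu: float, usd_per_mtok: float) -> float:
    """Assumes $/M tok = hourly GPU cost / (tokens per hour) * 1e6."""
    tokens_per_hour = tok_s_gpu * 3600.0
    return usd_per_mtok * tokens_per_hour / 1e6


if __name__ == "__main__":
    for gpu, tput, cost, per_mw in ROWS:
        print(
            f"{gpu:12s} {implied_kw_per_gpu(tput, per_mw):.2f} kW/GPU, "
            f"${implied_usd_per_gpu_hour(tput, cost):.2f}/GPU-hour"
        )
```

On the table's values this yields roughly 2.17 kW per GPU for B300 and 2.10 kW for GB200 NVL72 at every operating point, and roughly $2.34 and $2.20 per GPU-hour. The hourly figures vary slightly between rows because the costs are rounded to three decimals.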

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.