MiniMax M3 428B — B300 vs MI355X
Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and MI355X (AMD CDNA 4) on MiniMax M3 428B. Latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics — same interactions as the main inference chart.
Near the low end of the 13–46 tok/s/user interactivity band, at 21 tok/s/user on MiniMax M3 428B, B300 runs 3776 tok/s/GPU at $0.17 per million tokens and MI355X runs 2159 tok/s/GPU at $0.19. B300 delivers 75% more tok/s/GPU. MI355X costs about 10% more per token, which makes B300 roughly 9% cheaper.
At a target of 29 tok/s/user on MiniMax M3 428B, B300 produces 3669 tok/s/GPU ($0.18 per million tokens) and MI355X produces 1859 tok/s/GPU ($0.22). B300 delivers 97% more tok/s/GPU. MI355X costs about 23% more per token, which makes B300 roughly 19% cheaper.
At 38 tok/s/user interactivity on MiniMax M3 428B, B300 delivers 3123 tok/s/GPU at $0.21 per million tokens and MI355X delivers 1456 tok/s/GPU at $0.29. B300 delivers 115% more tok/s/GPU at this point. MI355X costs about 40% more per token, which makes B300 roughly 28% cheaper. (Numbers reflect the default 1k/1k · fp8 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
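The relative figures quoted for each interactivity point can be recomputed from the per-GPU throughput and cost values. The sketch below is one way to do that, using the three-decimal values from the table. The names `TABLE`, `pct_more`, `pct_less`, and `compare` are illustrative and not part of the benchmark tooling. It reports the cost gap against both baselines, since "B300 is X% cheaper" (relative to MI355X) and "MI355X costs X% more" (relative to B300) are different numbers.

```python
# Values copied from the comparison table (1k/1k, fp8 selection).
# Keys are interactivity targets in tok/s/user; each entry is
# (throughput in tok/s/GPU, cost in $ per million tokens).
TABLE = {
    21: {"B300": (3776.4, 0.172), "MI355X": (2158.8, 0.190)},
    29: {"B300": (3669.2, 0.177), "MI355X": (1859.2, 0.219)},
    38: {"B300": (3123.5, 0.210), "MI355X": (1456.2, 0.293)},
}


def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, measured relative to b."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Percent by which a falls below b, measured relative to b."""
    return (1.0 - a / b) * 100.0


def compare(point: int) -> dict:
    """Relative throughput and cost differences at one interactivity point."""
    b300_tput, b300_cost = TABLE[point]["B300"]
    mi_tput, mi_cost = TABLE[point]["MI355X"]
    return {
        "b300_throughput_gain_pct": pct_more(b300_tput, mi_tput),
        "b300_cheaper_pct": pct_less(b300_cost, mi_cost),
        "mi355x_costlier_pct": pct_more(mi_cost, b300_cost),
    }


if __name__ == "__main__":
    for point in sorted(TABLE):
        result = compare(point)
        print(point, {name: round(value, 1) for name, value in result.items()})
```

Because the table values are rounded, results can differ by about a percentage point from figures computed on unrounded data.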
| Metric | 21 tok/s/user | 29 tok/s/user | 38 tok/s/user |
|---|---|---|---|
| Throughput (tok/s/GPU) | B300: 3776.4; MI355X: 2158.8 | B300: 3669.2; MI355X: 1859.2 | B300: 3123.5; MI355X: 1456.2 |
| Cost ($/M tok) | B300: $0.172; MI355X: $0.190 | B300: $0.177; MI355X: $0.219 | B300: $0.210; MI355X: $0.293 |
| tok/s/MW | B300: 1740277; MI355X: 814640 | B300: 1690858; MI355X: 701596 | B300: 1439398; MI355X: 549495 |
| Concurrency | B300: ~414; MI355X: ~226 | B300: ~288; MI355X: ~143 | B300: ~181; MI355X: ~92 |
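The tok/s/MW row can be related to the throughput row. The sketch below assumes, as the page does not state, that tok/s/MW is per-GPU throughput divided by a fixed per-GPU power figure in megawatts. Under that assumption, dividing throughput by tok/s/MW recovers the implied power per GPU. The names `ROWS` and `implied_watts_per_gpu` are illustrative.

```python
# (GPU, interactivity in tok/s/user, throughput in tok/s/GPU, tok/s/MW),
# copied from the table above.
ROWS = [
    ("B300", 21, 3776.4, 1740277),
    ("B300", 29, 3669.2, 1690858),
    ("B300", 38, 3123.5, 1439398),
    ("MI355X", 21, 2158.8, 814640),
    ("MI355X", 29, 1859.2, 701596),
    ("MI355X", 38, 1456.2, 549495),
]


def implied_watts_per_gpu(tok_s_per_gpu: float, tok_s_per_mw: float) -> float:
    """Power per GPU in watts, ASSUMING tok/s/MW = (tok/s/GPU) / (MW per GPU)."""
    megawatts_per_gpu = tok_s_per_gpu / tok_s_per_mw
    return megawatts_per_gpu * 1_000_000


if __name__ == "__main__":
    for gpu, point, tput, per_mw in ROWS:
        watts = implied_watts_per_gpu(tput, per_mw)
        print(f"{gpu} @ {point} tok/s/user: {watts:.0f} W per GPU (implied)")
```

With the table's values this works out to roughly 2170 W per B300 GPU and 2650 W per MI355X GPU at all three points, which suggests a constant per-GPU power figure is used. Whether that figure covers host, networking, or cooling overhead is not stated here.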
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.