Kimi K2.5/K2.6 1T — B300 vs MI300X
Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and MI300X (AMD CDNA 3) on Kimi K2.5/K2.6 1T, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequence lengths, precisions, and metrics; they work the same way as on the main inference chart.
At 33 tok/s/user on Kimi K2.5/K2.6 1T, B300 reaches 795 tok/s/GPU and MI300X reaches 94, at $0.84 and $3.34 per million tokens respectively. B300 delivers 749% more tok/s/GPU, and MI300X costs 300% more per token (about 4.0x the B300 price).
At 38 tok/s/user on Kimi K2.5/K2.6 1T, B300 and MI300X reach 678 and 76 tok/s/GPU, at $0.98 and $4.07 per million tokens respectively. B300 delivers 788% more tok/s/GPU, and MI300X costs 316% more per token (about 4.2x the B300 price).
Toward the upper edge of the 28–48 tok/s/user interactivity band, at 44 tok/s/user on Kimi K2.5/K2.6 1T, B300 runs 534 tok/s/GPU at $1.25 per million tokens and MI300X runs 60 tok/s/GPU at $5.32 per million tokens. B300 delivers 794% more tok/s/GPU, and MI300X costs 326% more per token (about 4.3x the B300 price). (These numbers reflect the default 1k/1k, int4 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
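The relative figures quoted above mix two different comparisons, so a small sketch may help keep them apart. "X% more" is the ratio minus one, while a "cheaper" figure is measured against the higher price and therefore can never exceed 100%. The function names and the `POINTS` dictionary below are illustrative choices, not part of the benchmark tooling. The inputs are the rounded values from the table.

```python
def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, relative to b."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Percent by which a falls short of b, relative to b (below 100 when a > 0)."""
    return (1.0 - a / b) * 100.0


# Rounded table values per interactivity point, ordered as (B300, MI300X).
POINTS = {
    33: {"tput": (795.0, 93.7), "cost": (0.836, 3.342)},
    38: {"tput": (678.1, 76.3), "cost": (0.978, 4.070)},
    44: {"tput": (533.6, 59.7), "cost": (1.250, 5.324)},
}

for tok_s_user, p in POINTS.items():
    b300_tput, mi300x_tput = p["tput"]
    b300_cost, mi300x_cost = p["cost"]
    print(
        f"{tok_s_user} tok/s/user: "
        f"B300 throughput +{pct_more(b300_tput, mi300x_tput):.1f}%, "
        f"MI300X cost +{pct_more(mi300x_cost, b300_cost):.1f}%, "
        f"B300 cost -{pct_less(b300_cost, mi300x_cost):.1f}%"
    )
```

Because the inputs are rounded, the results can differ by about a percentage point from the quoted figures, which are presumably computed from unrounded measurements.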
| Metric | 33 tok/s/user | 38 tok/s/user | 44 tok/s/user |
|---|---|---|---|
| Throughput (tok/s/GPU) | B300: 795.0; MI300X: 93.7 | B300: 678.1; MI300X: 76.3 | B300: 533.6; MI300X: 59.7 |
| Cost ($/M tok) | B300: $0.836; MI300X: $3.342 | B300: $0.978; MI300X: $4.070 | B300: $1.250; MI300X: $5.324 |
| tok/s/MW | B300: 366374; MI300X: 52335 | B300: 312475; MI300X: 42648 | B300: 245921; MI300X: 33342 |
| Concurrency | B300: ~64; MI300X: ~12 | B300: ~64; MI300X: ~8 | B300: ~32; MI300X: ~6 |
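The page does not say how the cost and tok/s/MW rows are derived from throughput. As a rough sketch, assume (this is an assumption, not something stated in the source) that cost per million tokens is an hourly GPU price divided by tokens generated per hour, and that tok/s/MW is per-GPU throughput divided by per-GPU power draw. Under those assumptions, the hourly price and power that each column implies can be back-calculated. The function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def implied_usd_per_gpu_hour(tok_s_gpu: float, usd_per_m_tok: float) -> float:
    """Hourly GPU price implied by a throughput and a $/M-token cost.

    ASSUMPTION: cost = hourly price / tokens per hour (not confirmed by the source).
    """
    tokens_per_hour = tok_s_gpu * SECONDS_PER_HOUR
    return usd_per_m_tok * tokens_per_hour / 1e6


def implied_kw_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Per-GPU power in kW implied by tok/s/GPU and tok/s/MW.

    ASSUMPTION: tok/s/MW = tok/s/GPU / (MW drawn per GPU).
    """
    return tok_s_gpu / tok_s_mw * 1000.0


# 33 tok/s/user column from the table: (tok/s/GPU, $/M tok, tok/s/MW).
COLUMN_33 = {
    "B300": (795.0, 0.836, 366374.0),
    "MI300X": (93.7, 3.342, 52335.0),
}

for gpu, (tput, cost, per_mw) in COLUMN_33.items():
    print(
        f"{gpu}: implied ${implied_usd_per_gpu_hour(tput, cost):.2f}/GPU-hour, "
        f"implied {implied_kw_per_gpu(tput, per_mw):.2f} kW/GPU"
    )
```

If the assumed relationships hold, repeating the calculation for the other two columns should give nearly the same implied price and power for each GPU, which is a quick consistency check on the table.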
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.