DeepSeek V4 Pro 1.6T — B300 vs MI355X
A head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and MI355X (AMD CDNA 4) on DeepSeek V4 Pro 1.6T, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; the interactions are the same as on the main inference chart.
At 17 tok/s/user on DeepSeek V4 Pro 1.6T, B300 posts 4849 tok/s/GPU at $0.13 per million tokens; MI355X posts 1609 tok/s/GPU at $0.25. B300 delivers 201% more tok/s/GPU, and its cost per token is about 47% lower (MI355X costs about 1.9x as much).
At 30 tok/s/user on DeepSeek V4 Pro 1.6T, B300 reaches 2335 tok/s/GPU and MI355X reaches 321. Costs per million tokens are $0.27 and $1.32 respectively. B300 delivers 627% more tok/s/GPU, and its cost per token is about 79% lower (MI355X costs about 4.9x as much).
At 42 tok/s/user on DeepSeek V4 Pro 1.6T, B300 and MI355X reach 967 and 149 tok/s/GPU at $0.67 and $2.78 per million tokens respectively. B300 delivers 550% more tok/s/GPU, and its cost per token is about 76% lower (MI355X costs about 4.1x as much). (Numbers reflect the default 1k/1k · fp4 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
| Metric | GPU | 17 tok/s/user | 30 tok/s/user | 42 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B300 | 4848.8 | 2334.6 | 966.7 |
| Throughput (tok/s/GPU) | MI355X | 1608.8 | 321.3 | 148.7 |
| Cost ($/M tok) | B300 | $0.134 | $0.271 | $0.669 |
| Cost ($/M tok) | MI355X | $0.254 | $1.316 | $2.776 |
| tok/s/MW | B300 | 2234450 | 1075873 | 445493 |
| tok/s/MW | MI355X | 607113 | 121244 | 56112 |
| Concurrency | B300 | ~1024 | ~238 | ~50 |
| Concurrency | MI355X | ~402 | ~47 | ~15 |
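The relative figures quoted for each interactivity level can be recomputed from the table values. The Python sketch below is illustrative only: the dictionary layout and the helper names `pct_more`, `pct_less`, and `summarize` are assumptions made for this example, not part of the benchmark tooling. It separates two quantities that are easy to conflate: how much more MI355X costs per token relative to B300 (a premium, which can exceed 100%) and how much less B300 costs relative to MI355X (a saving, which cannot).

```python
# Values copied from the comparison table (default 1k/1k, fp4 selection).
# Outer keys are interactivity levels in tok/s/user.
throughput = {  # tok/s/GPU
    17: {"B300": 4848.8, "MI355X": 1608.8},
    30: {"B300": 2334.6, "MI355X": 321.3},
    42: {"B300": 966.7, "MI355X": 148.7},
}
cost = {  # dollars per million tokens
    17: {"B300": 0.134, "MI355X": 0.254},
    30: {"B300": 0.271, "MI355X": 1.316},
    42: {"B300": 0.669, "MI355X": 2.776},
}


def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Percent by which a falls below b (under 100 when 0 < a < b)."""
    return (1.0 - a / b) * 100.0


def summarize(level: int) -> dict:
    """Relative throughput and cost figures at one interactivity level."""
    t, c = throughput[level], cost[level]
    return {
        "throughput_gain_pct": pct_more(t["B300"], t["MI355X"]),
        "b300_cost_saving_pct": pct_less(c["B300"], c["MI355X"]),
        "mi355x_cost_premium_pct": pct_more(c["MI355X"], c["B300"]),
    }


if __name__ == "__main__":
    for level in sorted(throughput):
        s = summarize(level)
        print(
            f"{level} tok/s/user: "
            f"B300 throughput +{s['throughput_gain_pct']:.0f}%, "
            f"B300 cost -{s['b300_cost_saving_pct']:.0f}%, "
            f"MI355X cost +{s['mi355x_cost_premium_pct']:.0f}%"
        )
```

With the table values, the B300 cost saving works out to roughly 47%, 79%, and 76% at 17, 30, and 42 tok/s/user. The MI355X premiums are much larger numbers because they are measured against the smaller B300 cost.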
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.