DeepSeek R1 — B300 vs MI325X
Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and MI325X (AMD CDNA 3) on DeepSeek R1, covering latency, throughput, and cost across LLM workloads.
At 39 tok/s/user on DeepSeek R1, B300 delivers 4073 tok/s/GPU at $0.16 per million tokens, while MI325X delivers 341 tok/s/GPU at $1.03. B300 delivers 1093% more tok/s/GPU (about 11.9×). MI325X costs 544% more per token, which makes B300 roughly 84% cheaper.
Near the middle of the 29–70 tok/s/user interactivity band, at 50 tok/s/user, B300 runs 3302 tok/s/GPU at $0.20/M tokens and MI325X runs 225 tok/s/GPU at $1.57/M. B300 delivers 1364% more tok/s/GPU (about 14.6×). MI325X costs 673% more per token, so B300 is roughly 87% cheaper.
At a target of 60 tok/s/user, B300 produces 1976 tok/s/GPU ($0.33 per million tokens) and MI325X produces 132 tok/s/GPU ($2.67). B300 delivers 1395% more tok/s/GPU (about 15×). MI325X costs 706% more per token, so B300 is roughly 88% cheaper. All figures on this page use the 1k/1k (input/output sequence length) configuration at fp8 precision.
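"Percent cheaper" cannot exceed 100%, so the cost gap is easiest to read in two directions: how much more MI325X costs, and how much less B300 costs. The sketch below recomputes both from the rounded table values. It assumes "X% more" means (ratio − 1) × 100. With rounded inputs, the MI325X cost premium comes out as 542/671/707% rather than the page's 544/673/706%, presumably because the page works from unrounded numbers.

```python
# Sketch: recompute the head-to-head ratios from the rounded table values.
# Assumption: "X% more" means (a / b - 1) * 100; "X% less" means (1 - a / b) * 100.

POINTS = {
    # tok/s/user: (B300 tok/s/GPU, MI325X tok/s/GPU, B300 $/M tok, MI325X $/M tok)
    39: (4072.8, 341.4, 0.160, 1.027),
    50: (3301.8, 225.5, 0.203, 1.566),
    60: (1976.4, 132.2, 0.331, 2.672),
}


def percent_more(a: float, b: float) -> float:
    """How much larger a is than b, in percent (unbounded above)."""
    return (a / b - 1.0) * 100.0


def percent_less(a: float, b: float) -> float:
    """How much smaller a is than b, in percent (always below 100 for positive a)."""
    return (1.0 - a / b) * 100.0


def compare(points):
    rows = []
    for tps_user, (b300_tps, mi_tps, b300_cost, mi_cost) in sorted(points.items()):
        rows.append({
            "tok_s_user": tps_user,
            "throughput_more_pct": round(percent_more(b300_tps, mi_tps)),
            "mi325x_costs_more_pct": round(percent_more(mi_cost, b300_cost)),
            "b300_cheaper_pct": round(percent_less(b300_cost, mi_cost)),
        })
    return rows


if __name__ == "__main__":
    for row in compare(POINTS):
        print(row)
```

Expressing the gap as "MI325X costs N% more" keeps the large numbers meaningful. Expressing it as "B300 is N% cheaper" stays on a 0–100% scale.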
| Metric | GPU | 39 tok/s/user | 50 tok/s/user | 60 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B300 | 4072.8 | 3301.8 | 1976.4 |
| Throughput (tok/s/GPU) | MI325X | 341.4 | 225.5 | 132.2 |
| Cost ($/M tokens) | B300 | $0.160 | $0.203 | $0.331 |
| Cost ($/M tokens) | MI325X | $1.027 | $1.566 | $2.672 |
| Throughput per power (tok/s/MW) | B300 | 1876874 | 1521550 | 910780 |
| Throughput per power (tok/s/MW) | MI325X | 156612 | 103433 | 60648 |
| Concurrency | B300 | ~2628 | ~1808 | ~937 |
| Concurrency | MI325X | ~39 | ~19 | ~9 |
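The table gives throughput, cost, and tok/s/MW, but not the per-GPU price or power behind them. The sketch below backs those out. It rests on two assumptions that the page does not state: cost per million tokens equals an all-in hourly cost per GPU divided by tokens produced per hour, and tok/s/MW equals tok/s/GPU times the number of GPUs one megawatt can power. The helper names are hypothetical.

```python
# Sketch: back out per-GPU quantities the table implies but does not show.
# Assumptions (hypothetical, not stated on the page):
#   $/M tokens = hourly all-in cost per GPU / (tok/s/GPU * 3600) * 1e6
#   tok/s/MW   = tok/s/GPU * (1e6 W / all-in watts per GPU)

SECONDS_PER_HOUR = 3600
TOKENS_PER_MILLION = 1_000_000
WATTS_PER_MW = 1_000_000

# (tok/s/GPU, $/M tokens, tok/s/MW) at 39, 50, 60 tok/s/user, copied from the table
TABLE = {
    "B300": [(4072.8, 0.160, 1876874), (3301.8, 0.203, 1521550), (1976.4, 0.331, 910780)],
    "MI325X": [(341.4, 1.027, 156612), (225.5, 1.566, 103433), (132.2, 2.672, 60648)],
}


def implied_cost_per_gpu_hour(tok_s_gpu: float, usd_per_m: float) -> float:
    """Dollars per GPU-hour implied by a $/M-token price at a given throughput."""
    tokens_per_hour = tok_s_gpu * SECONDS_PER_HOUR
    return usd_per_m * tokens_per_hour / TOKENS_PER_MILLION


def implied_watts_per_gpu(tok_s_gpu: float, tok_s_per_mw: float) -> float:
    """All-in watts per GPU implied by tok/s/GPU and tok/s/MW."""
    return WATTS_PER_MW * tok_s_gpu / tok_s_per_mw


if __name__ == "__main__":
    for gpu, rows in TABLE.items():
        for tok_s_gpu, usd_per_m, tok_s_mw in rows:
            cost = implied_cost_per_gpu_hour(tok_s_gpu, usd_per_m)
            watts = implied_watts_per_gpu(tok_s_gpu, tok_s_mw)
            print(f"{gpu}: ${cost:.2f}/GPU-hr, {watts:.0f} W/GPU")
```

Under these assumptions, the implied figures are about $2.35–2.41 per B300 GPU-hour versus about $1.26–1.27 per MI325X GPU-hour, and roughly 2170 W versus 2180 W per GPU. The small spread in hourly cost may come from rounding or interpolation. If the model holds, the per-token cost gap comes almost entirely from throughput: B300's implied hourly cost is roughly 1.9× higher, but it produces 12–15× more tokens per GPU.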