
DeepSeek R1 — B300 vs MI325X

Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and MI325X (AMD CDNA 3) on DeepSeek R1, covering latency, throughput, and cost across LLM workloads.

At 39 tok/s/user on DeepSeek R1, B300 delivers 4073 tok/s/GPU at $0.16 per million tokens; MI325X delivers 341 tok/s/GPU at $1.03. MI325X costs 544% more per token (B300 is about 84% cheaper), and B300 delivers 1093% more tok/s/GPU.

Around the middle of the 29–70 tok/s/user interactivity band, at 50 tok/s/user on DeepSeek R1, B300 runs 3302 tok/s/GPU at $0.20/M tokens and MI325X runs 225 tok/s/GPU at $1.57/M. MI325X costs 673% more per token (B300 is about 87% cheaper), and B300 delivers 1364% more tok/s/GPU.

With 60 tok/s/user as the target on DeepSeek R1, B300 produces 1976 tok/s/GPU ($0.33 per million tokens) and MI325X produces 132 tok/s/GPU ($2.67). MI325X costs 706% more per token (B300 is about 88% cheaper), and B300 delivers 1395% more tok/s/GPU. (Numbers reflect the default selection: 1k/1k sequence, fp8 precision.)
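The relative figures quoted above are simple ratios of the per-GPU throughput and per-token cost values. The sketch below recomputes them from the rounded table values, so the cost premiums can land a point or two away from the quoted percentages, which were presumably derived from unrounded data. The helper names `percent_more` and `percent_less` are illustrative and not part of the benchmark site.

```python
def percent_more(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


def percent_less(a: float, b: float) -> float:
    """Return how much smaller a is than b, as a percentage of b."""
    return (1.0 - a / b) * 100.0


# (target tok/s/user, B300 tok/s/GPU, MI325X tok/s/GPU, B300 $/M tok, MI325X $/M tok)
# Values copied from the comparison table (default 1k/1k, fp8 selection).
rows = [
    (39, 4072.8, 341.4, 0.160, 1.027),
    (50, 3301.8, 225.5, 0.203, 1.566),
    (60, 1976.4, 132.2, 0.331, 2.672),
]

for target, tput_b300, tput_mi, cost_b300, cost_mi in rows:
    print(
        f"{target} tok/s/user: "
        f"B300 throughput +{percent_more(tput_b300, tput_mi):.0f}%, "
        f"MI325X cost +{percent_more(cost_mi, cost_b300):.0f}%, "
        f"B300 cost -{percent_less(cost_b300, cost_mi):.0f}%"
    )
```

Note that a cost saving expressed relative to the more expensive part can never exceed 100%, which is why the premium (MI325X over B300) and the saving (B300 under MI325X) are reported as two different numbers.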


Values are interpolated from real benchmark data at each target interactivity.
Metric                   GPU      39 tok/s/user   50 tok/s/user   60 tok/s/user
Throughput (tok/s/gpu)   B300     4072.8          3301.8          1976.4
                         MI325X   341.4           225.5           132.2
Cost ($/M tok)           B300     $0.160          $0.203          $0.331
                         MI325X   $1.027          $1.566          $2.672
tok/s/MW                 B300     1876874         1521550         910780
                         MI325X   156612          103433          60648
Concurrency              B300     ~2628           ~1808           ~937
                         MI325X   ~39             ~19             ~9
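The table values are described as interpolated from benchmark data, but the page does not say which interpolation method is used. As a rough illustration only, the sketch below assumes piecewise-linear interpolation and uses the three B300 throughput values from the table as stand-in sample points; the underlying measurements are not shown on this page, and the function name `interpolate` is hypothetical.

```python
from bisect import bisect_right


def interpolate(points, x):
    """Piecewise-linear interpolation over (x, y) points sorted by x.

    Assumption: linear interpolation between neighbouring points.
    The method actually used to build the table is not documented here.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    if not xs[0] <= x <= xs[-1]:
        # Refuse to extrapolate beyond the measured range.
        raise ValueError("target outside measured range")
    i = bisect_right(xs, x) - 1  # index of the segment's left endpoint
    if i == len(xs) - 1:
        return ys[-1]  # x sits exactly on the last point
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Stand-in sample points: (interactivity tok/s/user, B300 tok/s/GPU) from the table.
b300_throughput = [(39, 4072.8), (50, 3301.8), (60, 1976.4)]

# Estimate throughput at an operating point between two table columns.
print(f"{interpolate(b300_throughput, 45):.1f} tok/s/GPU at 45 tok/s/user")
```

Raising an error outside the sampled range is a deliberate choice in this sketch: throughput falls off steeply as interactivity rises, so extrapolating past the last point would be unreliable.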

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.