DeepSeek R1 — B300 vs MI300X

Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and MI300X (AMD CDNA 3) on DeepSeek R1, covering latency, throughput, and cost across LLM workloads.

Near the low end of the 25–68 tok/s/user interactivity band, at 36 tok/s/user on DeepSeek R1, B300 runs 4083 tok/s/GPU at $0.16/M tokens and MI300X runs 268 tok/s/GPU at $1.16/M tokens. B300 delivers 1425% more tok/s/GPU (about 15.3x). MI300X costs 630% more per token (about 7.3x), which makes B300 roughly 86% cheaper per token.

At a target of 47 tok/s/user on DeepSeek R1, B300 produces 3653 tok/s/GPU at $0.18 per million tokens and MI300X produces 186 tok/s/GPU at $1.66. B300 delivers 1860% more tok/s/GPU (about 19.6x). MI300X costs 819% more per token (about 9.2x), which makes B300 roughly 89% cheaper per token.

At 58 tok/s/user interactivity on DeepSeek R1, B300 delivers 2220 tok/s/GPU at $0.30 per million tokens and MI300X delivers 113 tok/s/GPU at $2.74. B300 delivers 1858% more tok/s/GPU (about 19.6x) at this point. MI300X costs 817% more per token (about 9.2x), which makes B300 roughly 89% cheaper per token. (Numbers reflect the default 1k/1k sequence length and fp8 precision selection.)
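The relative figures quoted above can be reproduced from the per-GPU throughput and cost values. The following is a minimal sketch, not the site's own calculation; the helper names `pct_more` and `pct_less` are hypothetical, and the inputs are the three-decimal values reported on this page, so results may differ from the quoted percentages by a point or so due to rounding.

```python
def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, e.g. 200 vs 100 -> 100.0."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Percent by which a falls below b; bounded above by 100."""
    return (1.0 - a / b) * 100.0


# (throughput tok/s/GPU, cost $/M tok) per GPU at each target interactivity,
# copied from the values reported on this page.
points = {
    36: {"B300": (4082.8, 0.159), "MI300X": (267.6, 1.163)},
    47: {"B300": (3653.4, 0.181), "MI300X": (186.4, 1.663)},
    58: {"B300": (2219.6, 0.299), "MI300X": (113.3, 2.737)},
}

for target, row in points.items():
    b_tput, b_cost = row["B300"]
    m_tput, m_cost = row["MI300X"]
    print(
        f"{target} tok/s/user: "
        f"B300 throughput +{pct_more(b_tput, m_tput):.0f}%, "
        f"MI300X cost +{pct_more(m_cost, b_cost):.0f}%, "
        f"B300 cost -{pct_less(b_cost, m_cost):.0f}%"
    )
```

Keeping "percent more" and "percent less" as separate functions avoids the common slip of reporting a reduction above 100%, which is not possible for a positive cost.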

Values are interpolated from real benchmark data at each target interactivity.
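The page does not say which interpolation method it uses. As an illustration only, the sketch below assumes piecewise-linear interpolation and treats the three B300 operating points reported on this page as knots; the real benchmark curve has more measured points, and `interpolate` is a hypothetical helper.

```python
from bisect import bisect_right


def interpolate(xs: list[float], ys: list[float], x: float) -> float:
    """Piecewise-linear interpolation over sorted knots xs -> ys."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("target lies outside the measured range")
    i = bisect_right(xs, x) - 1      # index of the knot at or below x
    if i == len(xs) - 1:             # x is exactly the last knot
        return ys[-1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Knots: interactivity (tok/s/user) -> B300 throughput (tok/s/GPU).
interactivity = [36.0, 47.0, 58.0]
b300_throughput = [4082.8, 3653.4, 2219.6]

# Estimate throughput halfway between the 36 and 47 tok/s/user points.
estimate = interpolate(interactivity, b300_throughput, 41.5)
print(f"B300 at 41.5 tok/s/user: about {estimate:.1f} tok/s/GPU")
```

The function refuses to extrapolate, since a throughput estimate outside the measured interactivity range would not be backed by benchmark data.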

Metric	GPU	36 tok/s/user	47 tok/s/user	58 tok/s/user
Throughput (tok/s/GPU)	B300	4082.8	3653.4	2219.6
Throughput (tok/s/GPU)	MI300X	267.6	186.4	113.3
Cost ($/M tok)	B300	$0.159	$0.181	$0.299
Cost ($/M tok)	MI300X	$1.163	$1.663	$2.737
tok/s/MW	B300	1881464	1683576	1022845
tok/s/MW	MI300X	149525	104145	63313
Concurrency	B300	~2795	~2073	~1094
Concurrency	MI300X	~31	~17	~8
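One way to sanity-check the cost row against the throughput row is to back out the hourly GPU price they jointly imply. The sketch assumes that cost per million tokens is simply the hourly price divided by the tokens produced per hour; the page does not state its cost model, and `implied_hourly_price` is a hypothetical helper.

```python
def implied_hourly_price(cost_per_mtok: float, tok_per_s: float) -> float:
    """Hourly GPU price ($/h) implied by a $/M-token cost and a tok/s/GPU rate.

    Assumes cost_per_mtok = hourly_price / (tok_per_s * 3600) * 1e6,
    which is an assumption, not the site's documented formula.
    """
    tokens_per_hour = tok_per_s * 3600.0
    return cost_per_mtok * tokens_per_hour / 1e6


# (cost $/M tok, throughput tok/s/GPU) at 36, 47, and 58 tok/s/user.
rows = {
    "B300": [(0.159, 4082.8), (0.181, 3653.4), (0.299, 2219.6)],
    "MI300X": [(1.163, 267.6), (1.663, 186.4), (2.737, 113.3)],
}

for gpu, pairs in rows.items():
    prices = [implied_hourly_price(cost, tput) for cost, tput in pairs]
    print(gpu, [f"${p:.2f}/h" for p in prices])
```

If the assumption holds, the implied price should be about the same at every operating point for a given GPU. Small differences between columns are expected because the cost values are rounded to three decimals.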

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.
