
Qwen 3.5 397B-A17B — B200 vs MI325X

Head-to-head AI inference benchmark comparison of B200 (NVIDIA Blackwell) and MI325X (AMD CDNA 3) on Qwen 3.5 397B-A17B, covering latency, throughput, and cost across LLM workloads.

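At 46 tok/s/user on Qwen 3.5 397B-A17B, B200 delivers 4600 tok/s/GPU at $0.12 per million tokens, while MI325X delivers 403 tok/s/GPU at $0.86 per million tokens. MI325X costs 635% more per token (roughly 7.3x the B200 price), and B200 delivers 1042% more tok/s/GPU (about 11.4x).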

Around the middle of the 37–72 tok/s/user interactivity band, at 55 tok/s/user on Qwen 3.5 397B-A17B, B200 runs 4061 tok/s/GPU at $0.13 per million tokens and MI325X runs 219 tok/s/GPU at $1.63 per million tokens. MI325X costs 1131% more per token (about 12.3x the B200 price), and B200 delivers 1751% more tok/s/GPU (about 18.5x).

With 64 tok/s/user as the target on Qwen 3.5 397B-A17B, B200 produces 3529 tok/s/GPU ($0.15 per million tokens) and MI325X produces 134 tok/s/GPU ($2.64 per million tokens). MI325X costs 1614% more per token (about 17.1x the B200 price), and B200 delivers 2528% more tok/s/GPU (about 26.3x). (Numbers reflect the default 1k/1k · fp8 selection.)
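The relative figures quoted above are simple ratios of the two GPUs' values. The following is a minimal sketch of that arithmetic using the rounded 46 tok/s/user values from this page; the function and variable names are illustrative, not taken from the site. Because the inputs are rounded, the results can differ by a few points from the quoted percentages, which are presumably computed from unrounded data.

```python
def percent_more(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


def percent_less(a: float, b: float) -> float:
    """Return how much smaller a is than b, as a percentage of b."""
    return (1.0 - a / b) * 100.0


# Rounded values at 46 tok/s/user, copied from this page.
b200_tput, mi325x_tput = 4600.2, 402.9   # tok/s/GPU
b200_cost, mi325x_cost = 0.118, 0.864    # $ per million tokens

throughput_gain = percent_more(b200_tput, mi325x_tput)
cost_premium = percent_more(mi325x_cost, b200_cost)
cost_saving = percent_less(b200_cost, mi325x_cost)

print(f"B200 throughput advantage: {throughput_gain:.0f}% more tok/s/GPU")
print(f"MI325X cost premium:       {cost_premium:.0f}% more per token")
print(f"B200 cost saving:          {cost_saving:.0f}% less per token")
```

Note that a "percent more" figure can exceed 100%, while a "percent less" figure cannot, which is why the cost gap is easier to read as a premium on the more expensive GPU or as a plain multiple.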


Values are interpolated from real benchmark data at each target interactivity.
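The page does not say which interpolation method it uses. As a sketch only, assuming plain linear interpolation between neighbouring points, an estimate at an arbitrary target interactivity could be computed as below. The `interpolate` helper is hypothetical, and the B200 values from this page are reused merely as stand-ins for measured benchmark points.

```python
from bisect import bisect_right


def interpolate(points, x):
    """Linearly interpolate y at x from (x, y) points sorted by x."""
    xs = [p[0] for p in points]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("target lies outside the measured range")
    # Index of the segment [xs[i], xs[i + 1]] that contains x.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# (interactivity in tok/s/user, throughput in tok/s/GPU) for B200.
# Stand-in points for illustration, not raw benchmark measurements.
b200_points = [(46, 4600.2), (55, 4061.3), (64, 3529.1)]

estimate = interpolate(b200_points, 50)
print(f"Estimated B200 throughput at 50 tok/s/user: {estimate:.1f} tok/s/GPU")
```

The helper refuses to extrapolate outside the sampled range, since throughput curves tend to bend sharply near the edges of the interactivity band.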

Metric	GPU	46 tok/s/user	55 tok/s/user	64 tok/s/user
Throughput (tok/s/GPU)	B200	4600.2	4061.3	3529.1
Throughput (tok/s/GPU)	MI325X	402.9	219.4	134.3
Cost ($/M tok)	B200	$0.118	$0.133	$0.154
Cost ($/M tok)	MI325X	$0.864	$1.631	$2.638
tok/s/MW	B200	2119899	1871590	1626333
tok/s/MW	MI325X	184830	100633	61591
Concurrency	B200	~225	~166	~120
Concurrency	MI325X	~37	~16	~9
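The throughput, cost, and tok/s/MW rows in the table above are related if one assumes the usual definitions: cost per million tokens is an hourly GPU price divided by tokens produced per hour, and tok/s/MW is per-GPU throughput divided by per-GPU power. The page does not confirm these formulas, so the sketch below is an assumption that backs out the implied hourly price and power draw from the 46 tok/s/user values; both helper names are hypothetical.

```python
def implied_gpu_hour_cost(cost_per_m_tokens: float, tok_per_s_per_gpu: float) -> float:
    """Back out $/GPU-hour, assuming cost = hourly rate / tokens per hour."""
    tokens_per_hour = tok_per_s_per_gpu * 3600
    return cost_per_m_tokens * tokens_per_hour / 1_000_000


def implied_watts_per_gpu(tok_per_s_per_mw: float, tok_per_s_per_gpu: float) -> float:
    """Back out watts per GPU, assuming tok/s/MW = tok/s/GPU / (MW per GPU)."""
    return tok_per_s_per_gpu / tok_per_s_per_mw * 1_000_000


# Values at 46 tok/s/user, copied from the table on this page.
b200_rate = implied_gpu_hour_cost(0.118, 4600.2)
mi325x_rate = implied_gpu_hour_cost(0.864, 402.9)
b200_watts = implied_watts_per_gpu(2119899, 4600.2)
mi325x_watts = implied_watts_per_gpu(184830, 402.9)

print(f"B200:   ~${b200_rate:.2f}/GPU-hour, ~{b200_watts:.0f} W per GPU")
print(f"MI325X: ~${mi325x_rate:.2f}/GPU-hour, ~{mi325x_watts:.0f} W per GPU")
```

Because each table value is interpolated separately, the implied figures shift slightly from column to column and should be read as rough estimates rather than the site's actual pricing or power inputs.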

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.
