Qwen 3.5 397B-A17B · GPU comparison

Qwen 3.5 397B-A17B — B200 vs MI300X

A head-to-head AI inference benchmark comparison of the B200 (NVIDIA Blackwell) and MI300X (AMD CDNA 3) on Qwen 3.5 397B-A17B, covering latency, throughput, and cost across LLM workloads.

Near the low end of the 34–71 tok/s/user interactivity band, at 43 tok/s/user on Qwen 3.5 397B-A17B: B200 runs 4781 tok/s/GPU at $0.11/M tokens, MI300X runs 346 tok/s/GPU at $0.89/M. MI300X costs 688% more per token (about 7.9x the B200 price); B200 delivers 1283% more tok/s/GPU.

Setting 53 tok/s/user as the target on Qwen 3.5 397B-A17B, B200 produces 4181 tok/s/GPU ($0.13 per million tokens) and MI300X produces 190 tok/s/GPU ($1.63). MI300X costs 1168% more per token (about 12.7x the B200 price); B200 delivers 2106% more tok/s/GPU.

At 62 tok/s/user interactivity on Qwen 3.5 397B-A17B, B200 delivers 3647 tok/s/GPU at $0.15 per million tokens; MI300X delivers 123 tok/s/GPU at $2.52. MI300X costs 1595% more per token (about 17x the B200 price); B200 delivers 2872% more tok/s/GPU at this point. (Numbers reflect the default 1k/1k sequence and fp8 precision selection.)
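The percentage comparisons above are ratios of the per-GPU figures. A price can fall by at most 100%, so the multi-hundred-percent cost figures are best read as how much more the MI300X costs per token. The following is a minimal sketch of that arithmetic at the 43 tok/s/user point; the helper names `pct_more` and `pct_less` are illustrative, and the inputs are the rounded table values from this page, so results can differ slightly from the quoted percentages.

```python
def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Percent by which a falls below b (never more than 100 for positive values)."""
    return (1.0 - a / b) * 100.0


# 43 tok/s/user operating point, rounded values from the table on this page.
b200_tput, mi300x_tput = 4780.6, 345.8   # tok/s/GPU
b200_cost, mi300x_cost = 0.113, 0.893    # $ per million tokens

tput_gain = pct_more(b200_tput, mi300x_tput)     # B200 throughput advantage
cost_premium = pct_more(mi300x_cost, b200_cost)  # extra cost of MI300X per token
cost_saving = pct_less(b200_cost, mi300x_cost)   # B200 saving relative to MI300X

print(f"B200 throughput advantage: {tput_gain:.0f}% more tok/s/GPU")
print(f"MI300X cost premium:       {cost_premium:.0f}% more per token")
print(f"B200 cost saving:          {cost_saving:.0f}% cheaper per token")
```

Expressed as a saving, the B200 comes out a little under 90% cheaper per token at this operating point, which is the same gap as the roughly 7.9x price ratio.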


Values in the table below are interpolated from real benchmark data at each target interactivity.
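The page does not say how the interpolation is done. As a rough illustration only, the sketch below assumes piecewise-linear interpolation between neighbouring sample points. The function `interpolate` is hypothetical, and the sample points are simply the three B200 table values reused as stand-ins for measured benchmark points.

```python
from bisect import bisect_right


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation of y at x; xs must be sorted ascending."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("target lies outside the sampled range")
    # Index of the right-hand neighbour, clamped so x == xs[-1] stays in range.
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Stand-in sample points (assumption): B200 rows of the table on this page.
interactivity = [43.0, 53.0, 62.0]     # tok/s/user
throughput = [4780.6, 4180.6, 3647.5]  # tok/s/GPU

target = 48.0  # hypothetical operating point between two samples
estimate = interpolate(interactivity, throughput, target)
print(f"Estimated B200 throughput at {target:.0f} tok/s/user: {estimate:.1f} tok/s/GPU")
```

Refusing to extrapolate is a deliberate choice in this sketch: outside the sampled range, a straight-line extension of a throughput curve is unlikely to be trustworthy.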

Metric	GPU	43 tok/s/user	53 tok/s/user	62 tok/s/user
Throughput (tok/s/GPU)	B200	4780.6	4180.6	3647.5
Throughput (tok/s/GPU)	MI300X	345.8	189.5	122.7
Cost ($/M tok)	B200	$0.113	$0.129	$0.148
Cost ($/M tok)	MI300X	$0.893	$1.633	$2.517
tok/s/MW	B200	2203062	1926539	1680869
tok/s/MW	MI300X	193162	105886	68573
Concurrency	B200	~245	~179	~129
Concurrency	MI300X	~34	~15	~8
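The table rows are related by simple unit conversions. The sketch below back-calculates two quantities the page does not state: the per-GPU hourly cost implied by the cost and throughput rows, and the per-GPU power implied by the throughput and tok/s/MW rows. Both rest on assumptions: that cost per million tokens is the hourly GPU price divided by tokens generated per hour, and that tok/s/MW is per-GPU throughput divided by per-GPU power. The function names are illustrative, and the results are inferences rather than published figures.

```python
def implied_hourly_cost(cost_per_mtok: float, tok_per_s: float) -> float:
    """Implied $/GPU-hour, assuming cost = hourly price / tokens per hour."""
    tokens_per_hour = tok_per_s * 3600.0
    return cost_per_mtok * tokens_per_hour / 1e6


def implied_kw_per_gpu(tok_per_s_gpu: float, tok_per_s_mw: float) -> float:
    """Implied kW per GPU, assuming tok/s/MW = tok/s/GPU / (MW per GPU)."""
    return tok_per_s_gpu / tok_per_s_mw * 1000.0


# 43 tok/s/user column of the table on this page.
b200_hourly = implied_hourly_cost(0.113, 4780.6)
mi300x_hourly = implied_hourly_cost(0.893, 345.8)
b200_kw = implied_kw_per_gpu(4780.6, 2203062)
mi300x_kw = implied_kw_per_gpu(345.8, 193162)

print(f"B200:   ~${b200_hourly:.2f}/GPU-hour, ~{b200_kw:.2f} kW/GPU (implied)")
print(f"MI300X: ~${mi300x_hourly:.2f}/GPU-hour, ~{mi300x_kw:.2f} kW/GPU (implied)")
```

If these assumptions hold, the implied hourly cost and power should come out roughly constant across the three columns, which makes this a quick consistency check on the table.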

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.
