Qwen 3.5 397B-A17B · GPU comparison

Qwen 3.5 397B-A17B — B200 vs MI355X

Head-to-head AI inference benchmark comparison of B200 (NVIDIA Blackwell) and MI355X (AMD CDNA 4) on Qwen 3.5 397B-A17B, covering latency, throughput, and cost across LLM workloads.

At 81 tok/s/user interactivity on Qwen 3.5 397B-A17B, B200 delivers 2426 tok/s/GPU at $0.223 per million tokens; MI355X delivers 1625 tok/s/GPU at $0.257. MI355X costs 15% more per token, and B200 delivers 49% more tok/s/GPU at this point.

At 133 tok/s/user on Qwen 3.5 397B-A17B, B200 posts 1057 tok/s/GPU at $0.512 per million tokens; MI355X posts 913 tok/s/GPU at $0.453. B200 costs 13% more per token while delivering 16% more tok/s/GPU.

At 185 tok/s/user on Qwen 3.5 397B-A17B, B200 and MI355X both reach about 510 tok/s/GPU, so throughput per GPU is essentially tied. Costs per million tokens are $1.021 for B200 and $0.781 for MI355X, so B200 costs 31% more per token. (All numbers reflect the default 1k/1k, fp8 selection.)
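A percentage gap depends on which value is used as the baseline, so the same pair of costs can be described with two different percentages. The following is a minimal sketch of that arithmetic using the 81 tok/s/user values from this page; the helper names `pct_more` and `pct_less` are illustrative and not part of the benchmark tooling.

```python
def pct_more(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Return how much smaller a is than b, as a percentage of b."""
    return (1.0 - a / b) * 100.0


# Values at 81 tok/s/user, copied from this page (1k/1k, fp8).
b200_tput, mi355x_tput = 2425.9, 1625.2  # tok/s/GPU
b200_cost, mi355x_cost = 0.223, 0.257    # $ per million tokens

print(f"B200 throughput advantage: {pct_more(b200_tput, mi355x_tput):.1f}%")
print(f"MI355X cost premium over B200: {pct_more(mi355x_cost, b200_cost):.1f}%")
print(f"B200 cost saving vs MI355X: {pct_less(b200_cost, mi355x_cost):.1f}%")
```

With these inputs the cost gap comes out near 15% when B200 is the baseline and near 13% when MI355X is the baseline, which is why it helps to state the reference point alongside any "cheaper" claim.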

Interpolated from real benchmark data at three target interactivity values.
| Metric | GPU | 81 tok/s/user | 133 tok/s/user | 185 tok/s/user |
| --- | --- | --- | --- | --- |
| Throughput (tok/s/GPU) | B200 | 2425.9 | 1057.1 | 510.1 |
| Throughput (tok/s/GPU) | MI355X | 1625.2 | 913.0 | 510.4 |
| Cost ($/M tok) | B200 | $0.223 | $0.512 | $1.021 |
| Cost ($/M tok) | MI355X | $0.257 | $0.453 | $0.781 |
| tok/s/MW | B200 | 1117918 | 487154 | 235082 |
| tok/s/MW | MI355X | 613278 | 344529 | 192596 |
| Concurrency | B200 | ~65 | ~18 | ~6 |
| Concurrency | MI355X | ~41 | ~14 | ~8 |
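The page says its figures are interpolated from benchmark data but does not say which scheme it uses. As a rough illustration only, the sketch below applies piecewise-linear interpolation between the three tabulated operating points to estimate throughput at an intermediate interactivity target. The function name `interpolate` and the choice of linear interpolation are assumptions; the actual site works from the underlying benchmark points, so its results will differ.

```python
from bisect import bisect_right

# Operating points copied from the table above (1k/1k, fp8).
INTERACTIVITY = [81.0, 133.0, 185.0]  # tok/s/user, sorted ascending
THROUGHPUT = {
    "B200": [2425.9, 1057.1, 510.1],    # tok/s/GPU
    "MI355X": [1625.2, 913.0, 510.4],   # tok/s/GPU
}


def interpolate(xs, ys, x):
    """Piecewise-linear estimate of y at x; xs must be sorted ascending."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("target lies outside the tabulated range")
    # Index of the right-hand knot of the segment containing x.
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


target = 107.0  # hypothetical interactivity target, tok/s/user
for gpu, ys in THROUGHPUT.items():
    print(f"{gpu}: ~{interpolate(INTERACTIVITY, ys, target):.0f} tok/s/GPU")
```

Because throughput falls steeply as interactivity rises, a straight line between widely spaced points is only a coarse estimate; denser benchmark points would give a more faithful curve.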

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.