Qwen 3.5 397B-A17B — B200 vs H100

Head-to-head AI inference benchmark comparison of B200 (NVIDIA Blackwell) and H100 (NVIDIA Hopper) on Qwen 3.5 397B-A17B, covering latency, throughput, and cost across LLM workloads.

At 61 tok/s/user interactivity on Qwen 3.5 397B-A17B, B200 delivers 3443 tok/s/GPU at $0.16 per million tokens; H100 delivers 254 tok/s/GPU at $1.43. B200 delivers 1256% more tok/s/GPU (about 13.6x), and H100 costs about 808% more per token, so B200's cost per token is roughly 89% lower.

At 78 tok/s/user, B200 posts 2561 tok/s/GPU at $0.21 per million tokens; H100 posts 196 tok/s/GPU at $1.78. B200 delivers 1204% more tok/s/GPU (about 13.0x), and H100 costs about 740% more per token, so B200's cost per token is roughly 88% lower.

At 94 tok/s/user, B200 reaches 1929 tok/s/GPU and H100 reaches 155, at $0.28 and $2.38 per million tokens respectively. B200 delivers 1146% more tok/s/GPU (about 12.5x), and H100 costs about 760% more per token, so B200's cost per token is roughly 88% lower. All figures are for the default configuration: 1k/1k input/output sequence length at FP8 precision.
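
A percentage reduction cannot exceed 100%, so a cost gap of this size is better stated as how much more H100 costs, or as how much lower B200's cost is. The minimal Python sketch below recomputes both framings, plus the throughput ratio, from the rounded table values on this page; the dictionary layout and helper names (`pct_more`, `pct_cheaper`, `compare`) are illustrative only and not part of any benchmark tooling.

```python
# Sketch: turn the paired B200/H100 figures on this page into
# "x times", "% more", and "% cheaper" comparisons.
# Values are copied from the rounded table (1k/1k, FP8).

POINTS = {
    # interactivity (tok/s/user): (B200 tok/s/GPU, H100 tok/s/GPU, B200 $/M tok, H100 $/M tok)
    61: (3442.8, 253.9, 0.157, 1.429),
    78: (2561.1, 196.4, 0.212, 1.779),
    94: (1929.2, 154.8, 0.277, 2.381),
}


def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, in percent (a/b - 1); unbounded above."""
    return (a / b - 1.0) * 100.0


def pct_cheaper(cheap: float, pricey: float) -> float:
    """How much lower `cheap` is than `pricey`, in percent; always below 100."""
    return (1.0 - cheap / pricey) * 100.0


def compare(interactivity: int) -> dict:
    """All three framings for one tabulated operating point."""
    b200_tps, h100_tps, b200_cost, h100_cost = POINTS[interactivity]
    return {
        "throughput_ratio": b200_tps / h100_tps,
        "b200_more_throughput_pct": pct_more(b200_tps, h100_tps),
        "h100_more_cost_pct": pct_more(h100_cost, b200_cost),
        "b200_cheaper_pct": pct_cheaper(b200_cost, h100_cost),
    }


if __name__ == "__main__":
    for level in sorted(POINTS):
        r = compare(level)
        print(
            f"{level} tok/s/user: {r['throughput_ratio']:.1f}x throughput "
            f"(+{r['b200_more_throughput_pct']:.0f}%), "
            f"H100 +{r['h100_more_cost_pct']:.0f}% cost, "
            f"B200 {r['b200_cheaper_pct']:.0f}% cheaper"
        )
```

Because the table rounds costs to three decimals, the cost percentages this sketch prints can differ by a point or two from the figures quoted above (for example, 810% rather than 808% at 61 tok/s/user).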

Values are interpolated from measured benchmark data at each target interactivity level.
Metric                         GPU    61 tok/s/user   78 tok/s/user   94 tok/s/user
Throughput (tok/s/GPU)         B200          3442.8          2561.1          1929.2
                               H100           253.9           196.4           154.8
Cost ($/M tokens)              B200          $0.157          $0.212          $0.277
                               H100          $1.429          $1.779          $2.381
Throughput per MW (tok/s/MW)   B200       1,586,558       1,180,229         889,041
                               H100         146,757         113,529          89,489
Concurrency                    B200            ~123             ~70             ~46
                               H100             ~17             ~10              ~7
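
The table's columns are linked by unit conversions. The page does not state its power or pricing assumptions, so the Python sketch below back-calculates them from the rounded values, assuming tok/s/MW is throughput divided by all-in power per GPU and cost per million tokens is an hourly GPU price spread over the tokens produced in that hour. Under those assumptions the tok/s/MW row implies about 2.17 kW per B200 and 1.73 kW per H100 at every operating point, while the implied hourly price drifts by a few percent between columns, so treat it as approximate. `interp_throughput` is a hypothetical helper that uses plain linear interpolation between the three listed points; it is not necessarily how the page itself interpolates.

```python
# Sketch: back-calculate per-GPU power and hourly price implied by the table,
# and linearly interpolate throughput between the listed operating points.
# Assumptions (not stated on the page):
#   tok/s/MW = (tok/s/GPU) / (MW per GPU)
#   $/M tok  = ($/GPU-hour) / (tok/s/GPU * 3600) * 1e6
#   linear interpolation between tabulated points

from bisect import bisect_left

# interactivity (tok/s/user) -> (tok/s/GPU, $/M tok, tok/s/MW), copied from the table
B200 = {61: (3442.8, 0.157, 1586558), 78: (2561.1, 0.212, 1180229), 94: (1929.2, 0.277, 889041)}
H100 = {61: (253.9, 1.429, 146757), 78: (196.4, 1.779, 113529), 94: (154.8, 2.381, 89489)}


def implied_kw_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """All-in power per GPU (kW) implied by the two throughput rows."""
    return tok_s_gpu / tok_s_mw * 1000.0  # MW -> kW


def implied_usd_per_gpu_hour(tok_s_gpu: float, usd_per_m_tok: float) -> float:
    """Hourly GPU price implied by throughput and cost per million tokens."""
    tokens_per_hour = tok_s_gpu * 3600.0
    return usd_per_m_tok * tokens_per_hour / 1e6


def interp_throughput(table: dict, interactivity: float) -> float:
    """Linearly interpolate tok/s/GPU at an interactivity inside the listed range."""
    xs = sorted(table)
    if not xs[0] <= interactivity <= xs[-1]:
        raise ValueError("interactivity outside the tabulated range")
    i = bisect_left(xs, interactivity)
    if xs[i] == interactivity:
        return table[xs[i]][0]  # exact tabulated point
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = table[x0][0], table[x1][0]
    return y0 + (y1 - y0) * (interactivity - x0) / (x1 - x0)


if __name__ == "__main__":
    for name, table in (("B200", B200), ("H100", H100)):
        for level, (tps, cost, tps_mw) in sorted(table.items()):
            kw = implied_kw_per_gpu(tps, tps_mw)
            usd = implied_usd_per_gpu_hour(tps, cost)
            print(f"{name} @ {level} tok/s/user: ~{kw:.2f} kW/GPU, ~${usd:.2f}/GPU-hour")
    print(f"B200 @ 70 tok/s/user (linear estimate): {interp_throughput(B200, 70):.0f} tok/s/GPU")
```

Interpolating B200 at 70 tok/s/user this way gives about 2976 tok/s/GPU. Since the tabulated points are themselves interpolated, a second interpolation on top of them is only a rough estimate.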

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.