Qwen 3.5 397B-A17B — B200 vs H100
Head-to-head AI inference benchmark comparison of B200 (NVIDIA Blackwell) and H100 (NVIDIA Hopper) on Qwen 3.5 397B-A17B, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; they work the same way as on the main inference chart.
At 61 tok/s/user interactivity on Qwen 3.5 397B-A17B, B200 delivers 3443 tok/s/GPU at $0.16 per million tokens; H100 delivers 254 tok/s/GPU at $1.43. H100 costs about 9.1x as much per token, so B200 is roughly 89% cheaper; B200 delivers about 13.6x the tok/s/GPU (1256% more) at this point.
B200 posts 2561 tok/s/GPU for $0.21 per million tokens at 78 tok/s/user on Qwen 3.5 397B-A17B; H100 posts 196 tok/s/GPU for $1.78. H100 costs about 8.4x as much per token, so B200 is roughly 88% cheaper; B200 delivers about 13.0x the tok/s/GPU (1204% more).
Throughput at 94 tok/s/user on Qwen 3.5 397B-A17B: B200 hits 1929 tok/s/GPU, H100 hits 155. Per-million-token costs land at $0.28 and $2.38 respectively. H100 costs about 8.6x as much per token, so B200 is roughly 88% cheaper; B200 delivers about 12.5x the tok/s/GPU (1146% more). (Numbers reflect the default 1k/1k · fp8 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
| Metric | 61 tok/s/user | 78 tok/s/user | 94 tok/s/user |
|---|---|---|---|
| Throughput (tok/s/GPU) | B200: 3442.8; H100: 253.9 | B200: 2561.1; H100: 196.4 | B200: 1929.2; H100: 154.8 |
| Cost ($/M tok) | B200: $0.157; H100: $1.429 | B200: $0.212; H100: $1.779 | B200: $0.277; H100: $2.381 |
| tok/s/MW | B200: 1586558; H100: 146757 | B200: 1180229; H100: 113529 | B200: 889041; H100: 89489 |
| Concurrency | B200: ~123; H100: ~17 | B200: ~70; H100: ~10 | B200: ~46; H100: ~7 |
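The comparison figures can be checked directly from the table. The sketch below is only an illustration: the `POINTS` dictionary and the `compare` helper are hypothetical names, and the inputs are the rounded table values, so the results may differ by a point or two from figures computed on unrounded data. It reports the throughput ratio, the cost ratio (H100 cost divided by B200 cost), and the percentage saved per token, which by definition cannot exceed 100%.

```python
# Values copied from the table above (default 1k/1k, fp8 selection).
# Keys are interactivity levels in tok/s/user; each GPU maps to
# (throughput in tok/s/GPU, cost in $ per million tokens).
POINTS = {
    61: {"B200": (3442.8, 0.157), "H100": (253.9, 1.429)},
    78: {"B200": (2561.1, 0.212), "H100": (196.4, 1.779)},
    94: {"B200": (1929.2, 0.277), "H100": (154.8, 2.381)},
}


def compare(point):
    """Return B200-vs-H100 ratios for one interactivity level."""
    b200_tput, b200_cost = point["B200"]
    h100_tput, h100_cost = point["H100"]
    return {
        # How many times more tok/s/GPU B200 delivers.
        "throughput_ratio": b200_tput / h100_tput,
        # The same figure expressed as "percent more".
        "throughput_gain_pct": (b200_tput / h100_tput - 1) * 100,
        # How many times more H100 costs per token.
        "cost_ratio": h100_cost / b200_cost,
        # Share of the H100 per-token cost that B200 saves (at most 100%).
        "cost_saving_pct": (1 - b200_cost / h100_cost) * 100,
    }


for interactivity, point in POINTS.items():
    r = compare(point)
    print(
        f"{interactivity} tok/s/user: "
        f"throughput {r['throughput_ratio']:.1f}x "
        f"(+{r['throughput_gain_pct']:.0f}%), "
        f"H100 cost {r['cost_ratio']:.1f}x, "
        f"B200 saves {r['cost_saving_pct']:.0f}% per token"
    )
```

Keeping the cost ratio and the cost saving separate avoids mixing the two readings: a ratio near 9x corresponds to a saving of roughly 89%, not a saving of several hundred percent.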
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.