Qwen 3.5 397B-A17B — H100 vs H200
Head-to-head AI inference benchmark comparison of H100 and H200 (both NVIDIA Hopper) on Qwen 3.5 397B-A17B, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; the interactions are the same as in the main inference chart.
At 61 tok/s/user interactivity on Qwen 3.5 397B-A17B, H100 delivers 254 tok/s/GPU at $1.43 per million tokens; H200 delivers 422 tok/s/GPU at $0.93. H200 is about 35% cheaper per token (equivalently, H100 costs about 54% more) and delivers 66% more tok/s/GPU at this point.
H100 posts 196 tok/s/GPU for $1.78 per million tokens at 78 tok/s/user on Qwen 3.5 397B-A17B; H200 posts 331 tok/s/GPU for $1.19. H200 is about 33% cheaper per token (equivalently, H100 costs about 50% more) and delivers 69% more tok/s/GPU.
Throughput at 94 tok/s/user on Qwen 3.5 397B-A17B: H100 hits 155 tok/s/GPU, H200 hits 286. Per-million-token costs land at $2.38 and $1.36 respectively. H200 is about 43% cheaper per token (equivalently, H100 costs about 76% more) and delivers 85% more tok/s/GPU. (Numbers reflect the default 1k/1k · fp8 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
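A cost gap can be quoted two ways, and the two are easy to confuse: the saving relative to the more expensive GPU ("percent cheaper") or the premium relative to the cheaper one ("percent more expensive"). The sketch below recomputes both, plus the throughput uplift, from the unrounded values in the table for the default 1k/1k · fp8 selection. The function and variable names are illustrative only and are not part of the benchmark site.

```python
# Values copied from the comparison table: (tok/s/GPU, $ per million tokens),
# keyed by interactivity in tok/s/user. Names here are illustrative assumptions.
POINTS = {
    61: {"H100": (253.9, 1.429), "H200": (421.7, 0.928)},
    78: {"H100": (196.4, 1.779), "H200": (331.1, 1.188)},
    94: {"H100": (154.8, 2.381), "H200": (285.6, 1.355)},
}


def pct_cheaper(baseline_cost: float, other_cost: float) -> float:
    """Percent saved by paying other_cost instead of baseline_cost."""
    return (baseline_cost - other_cost) / baseline_cost * 100.0


def pct_more(baseline: float, other: float) -> float:
    """Percent by which other exceeds baseline."""
    return (other - baseline) / baseline * 100.0


def compare(interactivity: int) -> dict:
    """Relative H200-vs-H100 figures at one interactivity point."""
    h100_tput, h100_cost = POINTS[interactivity]["H100"]
    h200_tput, h200_cost = POINTS[interactivity]["H200"]
    return {
        # Saving measured against the H100 price.
        "h200_pct_cheaper": pct_cheaper(h100_cost, h200_cost),
        # Premium measured against the H200 price.
        "h100_pct_more_expensive": pct_more(h200_cost, h100_cost),
        # Throughput uplift measured against H100.
        "h200_pct_more_throughput": pct_more(h100_tput, h200_tput),
    }


if __name__ == "__main__":
    for point in sorted(POINTS):
        figures = {name: round(value, 1) for name, value in compare(point).items()}
        print(point, figures)
```

The saving and the premium describe the same price gap but use different denominators, so the premium is always the larger of the two numbers.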
| Metric | 61 tok/s/user | 78 tok/s/user | 94 tok/s/user |
|---|---|---|---|
| Throughput (tok/s/GPU) | H100: 253.9; H200: 421.7 | H100: 196.4; H200: 331.1 | H100: 154.8; H200: 285.6 |
| Cost ($/M tok) | H100: $1.429; H200: $0.928 | H100: $1.779; H200: $1.188 | H100: $2.381; H200: $1.355 |
| tok/s/MW | H100: 146,757; H200: 243,777 | H100: 113,529; H200: 191,366 | H100: 89,489; H200: 165,112 |
| Concurrency | H100: ~17; H200: ~28 | H100: ~10; H200: ~17 | H100: ~7; H200: ~13 |
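The tok/s/MW row looks like per-GPU throughput scaled by a fixed power budget per GPU. That is an inference from the numbers, not something the page states. Under that assumption, dividing tok/s/MW by tok/s/GPU gives the number of GPUs per megawatt, and from there an implied power draw per GPU. A minimal sketch, with hypothetical names:

```python
# (tok/s/GPU, tok/s/MW) pairs copied from the table, keyed by interactivity
# in tok/s/user. The per-GPU power interpretation is an assumption.
TABLE = {
    61: {"H100": (253.9, 146757), "H200": (421.7, 243777)},
    78: {"H100": (196.4, 113529), "H200": (331.1, 191366)},
    94: {"H100": (154.8, 89489), "H200": (285.6, 165112)},
}


def implied_kw_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Implied power per GPU in kW, assuming tok/s/MW = tok/s/GPU * GPUs per MW."""
    gpus_per_mw = tok_s_mw / tok_s_gpu
    return 1000.0 / gpus_per_mw  # 1 MW = 1000 kW


def implied_power_table() -> dict:
    """Implied kW per GPU for every (interactivity, GPU) cell in TABLE."""
    return {
        (point, gpu): implied_kw_per_gpu(tput, per_mw)
        for point, row in TABLE.items()
        for gpu, (tput, per_mw) in row.items()
    }


if __name__ == "__main__":
    for (point, gpu), kw in sorted(implied_power_table().items()):
        print(f"{point} tok/s/user  {gpu}: {kw:.2f} kW per GPU (implied)")
```

If this reading is right, every cell implies roughly the same power figure for both GPUs, so the tok/s/MW gap between H100 and H200 would simply mirror the tok/s/GPU gap.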
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.