Qwen 3.5 397B-A17B · GPU comparison

Qwen 3.5 397B-A17B — H100 vs H200

Head-to-head AI inference benchmark comparison of the H100 and H200 (both NVIDIA Hopper) on Qwen 3.5 397B-A17B, covering latency, throughput, and cost across LLM workloads.

At 61 tok/s/user interactivity on Qwen 3.5 397B-A17B, H100 delivers 254 tok/s/GPU at $1.43 per million tokens; H200 delivers 422 tok/s/GPU at $0.93. At this point H200 delivers 66% more tok/s/GPU and is 35% cheaper per token (equivalently, H100 costs 54% more per token).

At 78 tok/s/user on Qwen 3.5 397B-A17B, H100 posts 196 tok/s/GPU at $1.78 per million tokens; H200 posts 331 tok/s/GPU at $1.19. H200 delivers 69% more tok/s/GPU and is 33% cheaper per token (equivalently, H100 costs 50% more per token).

At 94 tok/s/user on Qwen 3.5 397B-A17B, H100 hits 155 tok/s/GPU and H200 hits 286. Per-million-token costs land at $2.38 and $1.36 respectively. H200 delivers 85% more tok/s/GPU and is 43% cheaper per token (equivalently, H100 costs 76% more per token). (Numbers reflect the default 1k/1k · fp8 selection.)
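The percentage comparisons above can be reproduced from the per-GPU figures in the table on this page. The following is a minimal sketch, not the site's own calculation: the helper names (`pct_more`, `pct_less`, `compare`) are illustrative assumptions. It separates two conventions that are easy to conflate: how much less H200 costs relative to H100 (a saving), and how much more H100 costs relative to H200 (a premium). The two give different percentages for the same pair of prices.

```python
def pct_more(value: float, base: float) -> float:
    """Percent by which `value` exceeds `base`."""
    return (value / base - 1.0) * 100.0


def pct_less(value: float, base: float) -> float:
    """Percent by which `value` falls below `base`."""
    return (1.0 - value / base) * 100.0


# Keyed by interactivity (tok/s/user); each entry is
# (throughput in tok/s/GPU, cost in $ per million tokens), copied from the table.
POINTS = {
    61: {"H100": (253.9, 1.429), "H200": (421.7, 0.928)},
    78: {"H100": (196.4, 1.779), "H200": (331.1, 1.188)},
    94: {"H100": (154.8, 2.381), "H200": (285.6, 1.355)},
}


def compare(interactivity: int) -> dict:
    """Relative H200-vs-H100 figures at one operating point."""
    h100_tput, h100_cost = POINTS[interactivity]["H100"]
    h200_tput, h200_cost = POINTS[interactivity]["H200"]
    return {
        "h200_throughput_gain_pct": pct_more(h200_tput, h100_tput),
        "h200_cost_saving_pct": pct_less(h200_cost, h100_cost),
        "h100_cost_premium_pct": pct_more(h100_cost, h200_cost),
    }


if __name__ == "__main__":
    for target in POINTS:
        print(target, {k: round(v) for k, v in compare(target).items()})
```

At 61 tok/s/user this gives roughly a 66% throughput gain, a 35% cost saving for H200, and a 54% cost premium for H100.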


Interpolated from real benchmark data at three target interactivity values.
Interactivity   GPU    Throughput    Cost        tok/s/MW   Concurrency
(tok/s/user)           (tok/s/GPU)   ($/M tok)
61              H100   253.9         $1.429      146757     ~17
61              H200   421.7         $0.928      243777     ~28
78              H100   196.4         $1.779      113529     ~10
78              H200   331.1         $1.188      191366     ~17
94              H100   154.8         $2.381      89489      ~7
94              H200   285.6         $1.355      165112     ~13
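The table values are described as interpolated from benchmark data, but the page does not say which interpolation method is used. As a rough illustration only, the sketch below assumes piecewise-linear interpolation and applies it to the three H100 throughput points listed above to estimate a value at an intermediate interactivity target. The function name `interpolate` and the choice to refuse extrapolation are assumptions, and any value it returns is an estimate between table rows, not a benchmark result.

```python
from bisect import bisect_right

# H100 points from the table above:
# (interactivity in tok/s/user, throughput in tok/s/GPU)
H100_THROUGHPUT = [(61.0, 253.9), (78.0, 196.4), (94.0, 154.8)]


def interpolate(points, x):
    """Piecewise-linear estimate of y at x.

    `points` must be sorted by x. Raises ValueError outside the
    measured range rather than extrapolating.
    """
    xs = [p[0] for p in points]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"{x} is outside the range {xs[0]}..{xs[-1]}")
    # Index of the right-hand neighbour, clamped so x == xs[-1] stays in range.
    i = min(bisect_right(xs, x), len(xs) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


if __name__ == "__main__":
    # Estimated H100 throughput at 70 tok/s/user (assumed linear between rows).
    print(round(interpolate(H100_THROUGHPUT, 70.0), 1))
```

Refusing to extrapolate is a deliberate choice here: throughput falls steeply as interactivity rises, so a straight-line guess beyond the listed points could be badly off.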

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.