Qwen 3.5 397B-A17B — H100 vs H200

Head-to-head inference benchmark comparison of the H100 and H200 (both NVIDIA Hopper) on Qwen 3.5 397B-A17B, covering latency, throughput, and cost across LLM workloads.

At 54 tok/s/user interactivity on Qwen 3.5 397B-A17B, H100 delivers 697 tok/s/GPU at $0.52 per million tokens; H200 delivers 469 tok/s/GPU at $0.83. At this point H100 is 37% cheaper per token (equivalently, H200 costs 60% more) and delivers 49% more tok/s/GPU.

At 81 tok/s/user on Qwen 3.5 397B-A17B, H100 posts 233 tok/s/GPU at $1.58 per million tokens; H200 posts 319 tok/s/GPU at $1.23. H200 is 22% cheaper per token (equivalently, H100 costs 28% more) and delivers 37% more tok/s/GPU.

At 107 tok/s/user on Qwen 3.5 397B-A17B, H100 reaches 203 tok/s/GPU and H200 reaches 271, at $1.79 and $1.45 per million tokens respectively. H200 is 19% cheaper per token (equivalently, H100 costs 23% more) and delivers 34% more tok/s/GPU. (Numbers reflect the default selection: 1k/1k sequence, fp8 precision.)
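The relative differences between the two GPUs can be recomputed from the per-GPU figures in this section. The sketch below is illustrative only: the helper names `pct_more` and `pct_cheaper` are hypothetical, and the inputs are the three-decimal values from the comparison table. Note that "X% cheaper" and "Y% more expensive" are different ratios of the same two prices, so they should not be used interchangeably.

```python
def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b."""
    return (a / b - 1.0) * 100.0


def pct_cheaper(low: float, high: float) -> float:
    """Percent saved by paying `low` instead of `high`."""
    return (1.0 - low / high) * 100.0


# interactivity (tok/s/user) -> GPU -> (throughput tok/s/GPU, cost $/M tok)
# Values copied from this page's comparison table.
POINTS = {
    54: {"H100": (697.4, 0.522), "H200": (469.3, 0.833)},
    81: {"H100": (232.6, 1.576), "H200": (318.8, 1.229)},
    107: {"H100": (202.9, 1.785), "H200": (271.4, 1.446)},
}


def compare(interactivity: int) -> dict:
    """Summarize which GPU wins on throughput and cost at one operating point."""
    gpus = POINTS[interactivity]
    fast = max(gpus, key=lambda g: gpus[g][0])   # highest tok/s/GPU
    cheap = min(gpus, key=lambda g: gpus[g][1])  # lowest $/M tok
    slow = next(g for g in gpus if g != fast)
    pricey = next(g for g in gpus if g != cheap)
    return {
        "faster": fast,
        "pct_more_throughput": pct_more(gpus[fast][0], gpus[slow][0]),
        "cheaper": cheap,
        "pct_cheaper": pct_cheaper(gpus[cheap][1], gpus[pricey][1]),
        "pct_more_expensive": pct_more(gpus[pricey][1], gpus[cheap][1]),
    }


if __name__ == "__main__":
    for target in POINTS:
        print(target, {k: (round(v) if isinstance(v, float) else v)
                       for k, v in compare(target).items()})
```

At 54 tok/s/user this gives roughly 49% more throughput for the H100, with the H100 about 37% cheaper per token, which is the same price gap as the H200 being about 60% more expensive.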

Values are interpolated from real benchmark data at each target interactivity.
Metric                   GPU    54 tok/s/user   81 tok/s/user   107 tok/s/user
Throughput (tok/s/GPU)   H100   697.4           232.6           202.9
                         H200   469.3           318.8           271.4
Cost ($/M tok)           H100   $0.522          $1.576          $1.785
                         H200   $0.833          $1.229          $1.446
tok/s/MW                 H100   403,103         134,426         117,258
                         H200   271,276         184,266         156,874
Concurrency              H100   ~58             ~12             ~8
                         H200   ~36             ~16             ~12
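The page says these values are interpolated from benchmark data but does not say how. As a rough illustration of the idea only, the sketch below linearly interpolates a metric between neighboring operating points. The linear method, the helper name `lerp_metric`, and the 94 tok/s/user target are all assumptions, not the site's actual procedure, and real throughput curves are generally not linear between points.

```python
from bisect import bisect_left


def lerp_metric(points, target):
    """Linearly interpolate y at `target` from (x, y) pairs sorted by x.

    Assumption: straight-line interpolation between neighbors; the
    benchmark site may use a different scheme.
    """
    xs = [x for x, _ in points]
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target outside measured range; refusing to extrapolate")
    i = bisect_left(xs, target)
    if xs[i] == target:
        return points[i][1]          # exact hit on a known point
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    t = (target - x0) / (x1 - x0)    # fractional position between neighbors
    return y0 + t * (y1 - y0)


# (interactivity tok/s/user, throughput tok/s/GPU) for H200, from the table.
H200_THROUGHPUT = [(54, 469.3), (81, 318.8), (107, 271.4)]

if __name__ == "__main__":
    # 94 tok/s/user is a hypothetical target midway between 81 and 107.
    print(f"{lerp_metric(H200_THROUGHPUT, 94):.1f} tok/s/GPU")
```

Refusing to extrapolate is a deliberate choice here: outside the measured range there is no data to bound the estimate.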

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.