
Qwen 3.5 397B-A17B — GB300 NVL72 vs H200

Head-to-head AI inference benchmark comparison of GB300 NVL72 (NVIDIA Blackwell) and H200 (NVIDIA Hopper) on Qwen 3.5 397B-A17B, covering latency, throughput, and cost across LLM workloads. The chart controls switch sequence length, precision, and metric, with the same interactions as the main inference chart.

H200 hits 476 tok/s/GPU for $0.82 per million tokens at 53 tok/s/user on Qwen 3.5 397B-A17B. No GB300 NVL72 data at this operating point.

H200: 323 tok/s/GPU, $1.22 per million tokens at 80 tok/s/user on Qwen 3.5 397B-A17B. GB300 NVL72 is unmeasured here.

At 106 tok/s/user on Qwen 3.5 397B-A17B, H200 delivers 272 tok/s/GPU at $1.44 per million tokens; GB300 NVL72 hasn't been benchmarked at this target. (Numbers reflect the default 1k/1k · fp8 selection for this URL — table and chart below update if you change sequence, precision, or model in the controls.)
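The cost and throughput figures quoted above are linked by a simple unit conversion, sketched here in Python. The page does not state which hourly GPU price it uses, so the function names and the back-derived price are assumptions for illustration, not the site's published method.

```python
def cost_per_million_tokens(tok_per_s_per_gpu, usd_per_gpu_hour):
    """USD to produce one million tokens on one GPU at a steady throughput."""
    tokens_per_hour = tok_per_s_per_gpu * 3600
    return usd_per_gpu_hour / tokens_per_hour * 1_000_000


def implied_gpu_hour_price(tok_per_s_per_gpu, usd_per_million_tokens):
    """Invert the relation: the hourly price that reproduces a quoted cost."""
    return usd_per_million_tokens * tok_per_s_per_gpu * 3600 / 1_000_000


# H200 operating points quoted above: (tok/s/GPU, $ per million tokens).
H200_QUOTED = [(476, 0.82), (323, 1.22), (272, 1.44)]

for throughput, cost in H200_QUOTED:
    price = implied_gpu_hour_price(throughput, cost)
    print(f"{throughput} tok/s/GPU at ${cost}/M tok implies ${price:.2f}/GPU-hour")
```

All three operating points imply roughly the same price of about $1.4 per GPU-hour, which suggests the cost column is throughput rescaled by a fixed hourly rate. That rate is inferred from the rounded figures here, not taken from the page.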


Interpolated from real benchmark data. Edit target interactivity values below to compare at different operating points.
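As a rough illustration of what interpolating between benchmark points can look like, the sketch below estimates H200 throughput at an intermediate interactivity target. It assumes piecewise-linear interpolation, which the page does not confirm, and it uses the three H200 operating points reported on this page as stand-ins for raw measurements. The function name is hypothetical.

```python
from bisect import bisect_left

# (interactivity in tok/s/user, throughput in tok/s/GPU) for H200,
# taken from the figures reported on this page.
H200_POINTS = [(53.0, 475.6), (80.0, 322.8), (106.0, 272.5)]


def interpolate_throughput(target, points=H200_POINTS):
    """Piecewise-linear estimate of tok/s/GPU at a target tok/s/user.

    Returns None outside the covered range instead of extrapolating,
    mirroring how the page reports no value where data is missing.
    """
    xs = [x for x, _ in points]
    if target < xs[0] or target > xs[-1]:
        return None
    i = bisect_left(xs, target)
    if xs[i] == target:
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (target - x0) / (x1 - x0)


print(interpolate_throughput(66.5))   # halfway between the 53 and 80 targets
print(interpolate_throughput(120.0))  # outside the covered range
```

Under the linear assumption, a target of 66.5 tok/s/user lands at about 399 tok/s/GPU. Returning no value outside the range is a deliberate choice: extrapolating a throughput curve past the last benchmarked point can be badly misleading.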
| Metric | GPU | At 53 tok/s/user | At 80 tok/s/user | At 106 tok/s/user |
| --- | --- | --- | --- | --- |
| Throughput (tok/s/GPU) | GB300 NVL72 | no data | no data | no data |
| Throughput (tok/s/GPU) | H200 | 475.6 | 322.8 | 272.5 |
| Cost ($/M tok) | GB300 NVL72 | no data | no data | no data |
| Cost ($/M tok) | H200 | $0.821 | $1.216 | $1.438 |
| tok/s/MW | GB300 NVL72 | no data | no data | no data |
| tok/s/MW | H200 | 274884 | 186573 | 157493 |
| Concurrency | GB300 NVL72 | no data | no data | no data |
| Concurrency | H200 | ~37 | ~17 | ~12 |
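The tok/s/MW figures track the per-GPU throughput figures closely, which hints at a fixed power budget per GPU. The sketch below back-derives that budget from the H200 values. The page does not publish its power assumption, so the helper name and the resulting wattage are inferences rather than stated facts.

```python
WATTS_PER_MEGAWATT = 1_000_000


def implied_watts_per_gpu(tok_per_s_per_gpu, tok_per_s_per_mw):
    """Power per GPU that makes the two throughput figures consistent."""
    gpus_per_mw = tok_per_s_per_mw / tok_per_s_per_gpu
    return WATTS_PER_MEGAWATT / gpus_per_mw


# H200 values reported above: (tok/s/GPU, tok/s/MW).
H200_ROWS = [(475.6, 274884), (322.8, 186573), (272.5, 157493)]

for per_gpu, per_mw in H200_ROWS:
    watts = implied_watts_per_gpu(per_gpu, per_mw)
    print(f"{per_gpu} tok/s/GPU and {per_mw} tok/s/MW imply {watts:.0f} W per GPU")
```

Each operating point works out to roughly 1.73 kW per GPU, so tok/s/MW appears to be tok/s/GPU multiplied by a constant number of GPUs per megawatt. That figure is well above a single accelerator's rated power, so it presumably includes host and facility overhead, though the page does not say so.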

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.
