Qwen 3.5 397B-A17B — GB300 NVL72 vs H200

Head-to-head AI inference benchmark comparison of GB300 NVL72 (NVIDIA Blackwell) and H200 (NVIDIA Hopper) on Qwen 3.5 397B-A17B, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequence lengths, precisions, and metrics; they behave the same as on the main inference chart.

H200 hits 476 tok/s/GPU for $0.82 per million tokens at 53 tok/s/user on Qwen 3.5 397B-A17B. No GB300 NVL72 data at this operating point.

At 80 tok/s/user on Qwen 3.5 397B-A17B, H200 delivers 323 tok/s/GPU at $1.22 per million tokens. GB300 NVL72 has not been measured at this operating point.

At 106 tok/s/user on Qwen 3.5 397B-A17B, H200 delivers 272 tok/s/GPU at $1.44 per million tokens; GB300 NVL72 hasn't been benchmarked at this target. (Numbers reflect the default 1k/1k · fp8 selection for this URL — table and chart below update if you change sequence, precision, or model in the controls.)
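
The cost and throughput figures quoted above can be cross-checked against each other. The sketch below assumes that cost per million tokens is a flat GPU-hour price divided by the tokens one GPU generates per hour. The page does not state its pricing formula, so both that relation and the helper name `implied_gpu_hour_price` are assumptions made for illustration.

```python
# H200 operating points quoted above (1k/1k, fp8):
# (interactivity tok/s/user, throughput tok/s/GPU, cost $ per million tokens)
H200_POINTS = [
    (53, 476, 0.82),
    (80, 323, 1.22),
    (106, 272, 1.44),
]


def implied_gpu_hour_price(throughput_tok_s, cost_per_mtok):
    """Back out a GPU-hour price, ASSUMING cost = hourly price / tokens per hour.

    This relation is an assumption; the page does not document how cost is computed.
    """
    mtok_per_hour = throughput_tok_s * 3600 / 1_000_000  # million tokens per GPU-hour
    return cost_per_mtok * mtok_per_hour


IMPLIED_PRICES = [implied_gpu_hour_price(t, c) for _, t, c in H200_POINTS]

if __name__ == "__main__":
    for (interactivity, _, _), price in zip(H200_POINTS, IMPLIED_PRICES):
        print(f"{interactivity} tok/s/user: ~${price:.2f} per GPU-hour")
```

Under that assumption, all three points back out to between $1.40 and $1.42 per GPU-hour, which suggests the cost figures are derived from throughput at a fixed hourly rate rather than measured independently.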


Interpolated from real benchmark data. Edit target interactivity values below to compare at different operating points.
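
As a rough illustration of how a value at an arbitrary interactivity target might be interpolated, the sketch below applies piecewise-linear interpolation to the three H200 points quoted above. The linear method and the helper name `interpolate_throughput` are assumptions: the page does not say which interpolation scheme it uses, and the real inputs would be the underlying measured benchmark points rather than these already-interpolated values.

```python
from bisect import bisect_right

# (interactivity tok/s/user, throughput tok/s/GPU), sorted by interactivity.
# Taken from the H200 figures quoted above; used here only as stand-in knots.
H200_KNOTS = [(53, 476.0), (80, 323.0), (106, 272.0)]


def interpolate_throughput(target, knots=H200_KNOTS):
    """Piecewise-linear interpolation of throughput at a target interactivity.

    Raises ValueError outside the covered range instead of extrapolating.
    """
    xs = [x for x, _ in knots]
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target outside the covered range; not extrapolating")
    # Index of the right-hand knot of the segment containing the target.
    i = min(bisect_right(xs, target), len(xs) - 1)
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    frac = (target - x0) / (x1 - x0)
    return y0 + frac * (y1 - y0)


if __name__ == "__main__":
    print(interpolate_throughput(66.5))
```

Refusing to extrapolate is a deliberate choice in this sketch: a target outside the covered range yields an error rather than a guessed value.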

Metric	GPU	53 tok/s/user	80 tok/s/user	106 tok/s/user
Throughput (tok/s/GPU)	GB300 NVL72	n/a	n/a	n/a
Throughput (tok/s/GPU)	H200	475.6	322.8	272.5
Cost ($/M tok)	GB300 NVL72	n/a	n/a	n/a
Cost ($/M tok)	H200	$0.821	$1.216	$1.438
tok/s/MW	GB300 NVL72	n/a	n/a	n/a
tok/s/MW	H200	274884	186573	157493
Concurrency	GB300 NVL72	n/a	n/a	n/a
Concurrency	H200	~37	~17	~12
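
The tok/s/MW row can be related to per-GPU throughput with a little arithmetic. The sketch below assumes that tok/s/MW equals tok/s/GPU multiplied by the number of GPUs that fit within one megawatt. The page does not document this, so the relation and the helper name `implied_kw_per_gpu` are assumptions.

```python
# H200 values copied from the table above (1k/1k, fp8):
# (throughput tok/s/GPU, tok/s/MW)
H200_ROWS = [
    (475.6, 274884),
    (322.8, 186573),
    (272.5, 157493),
]


def implied_kw_per_gpu(tok_s_per_gpu, tok_s_per_mw):
    """Back out power per GPU, ASSUMING tok/s/MW = tok/s/GPU * GPUs per MW."""
    gpus_per_mw = tok_s_per_mw / tok_s_per_gpu
    return 1000.0 / gpus_per_mw  # 1 MW = 1000 kW


IMPLIED_KW = [implied_kw_per_gpu(t, m) for t, m in H200_ROWS]

if __name__ == "__main__":
    for kw in IMPLIED_KW:
        print(f"~{kw:.2f} kW per GPU")
```

Under this assumption every column implies roughly 1.73 kW per GPU, so the tok/s/MW row appears to be a fixed multiple of the throughput row.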

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.

Chart controls: Model, ISL / OSL, Precision, Y-Axis Metric, GPU Config. Quick filters: Vendor, Aggregation, Spec Decoding.