Qwen 3.5 397B-A17B — B300 vs H100

Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and H100 (NVIDIA Hopper) on Qwen 3.5 397B-A17B, covering latency, throughput, and cost across LLM workloads.

At 63 tok/s/user on Qwen 3.5 397B-A17B, B300 posts 3154 tok/s/GPU at $0.20 per million tokens; H100 posts 517 tok/s/GPU at $0.73. B300 delivers 510% more tok/s/GPU and is about 72% cheaper per token (H100 costs roughly 3.5x as much).

At 97 tok/s/user, B300 hits 1815 tok/s/GPU and H100 hits 220, at $0.36 and $1.66 per million tokens respectively. B300 delivers 725% more tok/s/GPU and is about 78% cheaper per token (H100 costs roughly 4.6x as much).

At 132 tok/s/user, B300 and H100 reach 1162 and 85 tok/s/GPU at $0.56 and $4.12 per million tokens. B300 delivers 1272% more tok/s/GPU and is about 86% cheaper per token (H100 costs roughly 7.4x as much). Numbers reflect the default 1k/1k · fp8 selection.
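The relative figures above can be recomputed from the per-GPU throughput and cost values in the comparison table. The Python sketch below shows that arithmetic; the function and variable names are illustrative assumptions, not part of any tool on this page. Note that "percent more" and "percent cheaper" are both measured against the H100 value, so a cost saving can never exceed 100%.

```python
# Values copied from the comparison table (default 1k/1k, fp8 selection).
# Key: target interactivity in tok/s/user.
# Value: per-GPU (throughput in tok/s/GPU, cost in $ per million tokens).
POINTS = {
    63: {"B300": (3154.3, 0.205), "H100": (517.2, 0.726)},
    97: {"B300": (1815.3, 0.363), "H100": (220.0, 1.664)},
    132: {"B300": (1162.3, 0.558), "H100": (84.7, 4.121)},
}


def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """How much smaller a is than b, as a percentage of b."""
    return (1.0 - a / b) * 100.0


def compare(interactivity: int) -> dict:
    """Compare B300 against H100 at one interactivity target."""
    b_tput, b_cost = POINTS[interactivity]["B300"]
    h_tput, h_cost = POINTS[interactivity]["H100"]
    return {
        "throughput_gain_pct": pct_more(b_tput, h_tput),
        "cost_saving_pct": pct_less(b_cost, h_cost),
        "h100_cost_multiple": h_cost / b_cost,
    }


if __name__ == "__main__":
    for target in POINTS:
        r = compare(target)
        print(
            f"{target} tok/s/user: "
            f"+{r['throughput_gain_pct']:.0f}% tok/s/GPU, "
            f"{r['cost_saving_pct']:.0f}% cheaper, "
            f"H100 costs {r['h100_cost_multiple']:.1f}x"
        )
```

Keeping the cost multiple alongside the percentage saving avoids ambiguity: a 3.5x cost multiple and a 72% saving describe the same pair of prices.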

Interpolated from real benchmark data at three target interactivity values.
| Metric | GPU | 63 tok/s/user | 97 tok/s/user | 132 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B300 | 3154.3 | 1815.3 | 1162.3 |
| Throughput (tok/s/GPU) | H100 | 517.2 | 220.0 | 84.7 |
| Cost ($/M tok) | B300 | $0.205 | $0.363 | $0.558 |
| Cost ($/M tok) | H100 | $0.726 | $1.664 | $4.121 |
| tok/s/MW | B300 | 1453609 | 836559 | 535632 |
| tok/s/MW | H100 | 298955 | 127176 | 48976 |
| Concurrency | B300 | ~102 | ~39 | ~19 |
| Concurrency | H100 | ~39 | ~10 | ~3 |
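
Dividing tok/s/MW by tok/s/GPU gives the number of GPUs that the benchmark attributes to one megawatt, and from that an implied power budget per GPU. The Python sketch below recovers those values from the table figures. How the power numbers are derived is not stated here, so the result is an inference from the published values, not a specification; the function names are illustrative.

```python
# (tok/s/GPU, tok/s/MW) pairs copied from the comparison table,
# one pair per interactivity target (63, 97, 132 tok/s/user).
ROWS = {
    "B300": [(3154.3, 1453609), (1815.3, 836559), (1162.3, 535632)],
    "H100": [(517.2, 298955), (220.0, 127176), (84.7, 48976)],
}


def gpus_per_mw(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Number of GPUs implied per megawatt of power."""
    return tok_s_mw / tok_s_gpu


def implied_kw_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Implied power per GPU in kW (1 MW = 1000 kW)."""
    return 1000.0 / gpus_per_mw(tok_s_gpu, tok_s_mw)


if __name__ == "__main__":
    for gpu, pairs in ROWS.items():
        for tok_s_gpu, tok_s_mw in pairs:
            n = gpus_per_mw(tok_s_gpu, tok_s_mw)
            kw = implied_kw_per_gpu(tok_s_gpu, tok_s_mw)
            print(f"{gpu}: {n:.1f} GPUs/MW, {kw:.2f} kW per GPU")
```

With the table values this works out to roughly 461 GPUs per MW (about 2.17 kW each) for B300 and roughly 578 GPUs per MW (about 1.73 kW each) for H100. The ratio is the same at all three interactivity targets, which suggests tok/s/MW is a fixed per-GPU power figure applied to throughput.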

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.