Qwen 3.5 397B-A17B — B300 vs H100
Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and H100 (NVIDIA Hopper) on Qwen 3.5 397B-A17B. Latency, throughput, and cost across LLM workloads.
B300 posts 3154 tok/s/GPU for $0.20 per million tokens at 63 tok/s/user on Qwen 3.5 397B-A17B; H100 posts 517 tok/s/GPU for $0.73. B300 costs about 72% less per token (H100 is roughly 3.5x the price) and delivers 510% more tok/s/GPU (about 6.1x).
Throughput at 97 tok/s/user on Qwen 3.5 397B-A17B: B300 hits 1815 tok/s/GPU, H100 hits 220. Per-million costs land at $0.36 and $1.66 respectively. B300 costs about 78% less per token (H100 is roughly 4.6x the price) and delivers 725% more tok/s/GPU (about 8.3x).
B300 / H100 on Qwen 3.5 397B-A17B at 132 tok/s/user: 1162 / 85 tok/s/GPU, $0.56 / $4.12 per million tokens. B300 costs about 86% less per token (H100 is roughly 7.4x the price) and delivers 1272% more tok/s/GPU (about 13.7x). (Numbers reflect the default 1k/1k · fp8 selection for this URL.)
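The percentage comparisons in this section follow from simple ratios of the quoted throughput and cost figures. The sketch below recomputes them from the rounded values published on this page; the function and variable names (`percent_more`, `percent_less`, `ROWS`, `compare`) are illustrative and not part of any benchmark tooling. Because the inputs are rounded, the results may differ by about one percentage point from figures derived from unrounded data.

```python
def percent_more(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


def percent_less(a: float, b: float) -> float:
    """Return how much smaller a is than b, as a percentage of b."""
    return (1.0 - a / b) * 100.0


# Interactivity (tok/s/user) -> (tok/s/GPU, $ per million tokens) per GPU,
# copied from the rounded figures on this page (1k/1k, fp8).
ROWS = {
    63: {"B300": (3154.3, 0.205), "H100": (517.2, 0.726)},
    97: {"B300": (1815.3, 0.363), "H100": (220.0, 1.664)},
    132: {"B300": (1162.3, 0.558), "H100": (84.7, 4.121)},
}


def compare(level: int) -> dict:
    """Compare B300 against H100 at one interactivity level."""
    b_tput, b_cost = ROWS[level]["B300"]
    h_tput, h_cost = ROWS[level]["H100"]
    return {
        # B300 throughput advantage, e.g. "X% more tok/s/GPU"
        "b300_more_throughput_pct": percent_more(b_tput, h_tput),
        # H100 cost premium; this can exceed 100%
        "h100_more_cost_pct": percent_more(h_cost, b_cost),
        # B300 cost saving; this is always below 100%
        "b300_cost_saving_pct": percent_less(b_cost, h_cost),
    }


for level in ROWS:
    result = compare(level)
    print(level, {key: round(value, 1) for key, value in result.items()})
```

The two cost figures answer different questions: "H100 costs X% more" is measured against the B300 price and can exceed 100%, while "B300 costs Y% less" is measured against the H100 price and cannot.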
| Metric | GPU | 63 tok/s/user | 97 tok/s/user | 132 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B300 | 3154.3 | 1815.3 | 1162.3 |
| Throughput (tok/s/GPU) | H100 | 517.2 | 220.0 | 84.7 |
| Cost ($/M tok) | B300 | $0.205 | $0.363 | $0.558 |
| Cost ($/M tok) | H100 | $0.726 | $1.664 | $4.121 |
| tok/s/MW | B300 | 1453609 | 836559 | 535632 |
| tok/s/MW | H100 | 298955 | 127176 | 48976 |
| Concurrency | B300 | ~102 | ~39 | ~19 |
| Concurrency | H100 | ~39 | ~10 | ~3 |
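
One way to read the tok/s/MW row is as per-GPU throughput divided by the power attributed to each GPU. The page does not state that power figure, so the sketch below backs it out from the table as an assumption. The helper name `implied_kw_per_gpu` is hypothetical, and the resulting kW values are inferred, not published.

```python
def implied_kw_per_gpu(tok_per_s_per_gpu: float, tok_per_s_per_mw: float) -> float:
    """Back out the power per GPU (in kW) implied by the two throughput rows.

    Assumes tok/s/MW = (tok/s/GPU) / (MW per GPU); this relation is an
    inference from the table, not something the page states.
    """
    mw_per_gpu = tok_per_s_per_gpu / tok_per_s_per_mw
    return mw_per_gpu * 1000.0  # convert MW to kW


# (tok/s/GPU, tok/s/MW) pairs at 63, 97, and 132 tok/s/user, from the table.
TABLE = {
    "B300": [(3154.3, 1453609), (1815.3, 836559), (1162.3, 535632)],
    "H100": [(517.2, 298955), (220.0, 127176), (84.7, 48976)],
}

for gpu, pairs in TABLE.items():
    estimates = [round(implied_kw_per_gpu(tput, per_mw), 3) for tput, per_mw in pairs]
    print(gpu, estimates)
```

If the assumption holds, each GPU should yield roughly the same kW estimate at all three interactivity levels, which gives a quick consistency check on the table.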