Qwen 3.5 397B-A17B — B300 vs GB200 NVL72
Head-to-head AI inference benchmark comparison of B300 and GB200 NVL72 (both NVIDIA Blackwell) on Qwen 3.5 397B-A17B, covering latency, throughput, and cost across LLM workloads.
At 65 tok/s/user interactivity on Qwen 3.5 397B-A17B, B300 delivers 3055 tok/s/GPU at $0.21 per million tokens; GB200 NVL72 delivers 1198 tok/s/GPU at $0.52. B300's cost per token is about 60% lower (GB200 NVL72 costs roughly 2.5× as much), and B300 delivers 155% more tok/s/GPU at this point.
At 102 tok/s/user on Qwen 3.5 397B-A17B, B300 posts 1666 tok/s/GPU for $0.39 per million tokens; GB200 NVL72 posts 456 tok/s/GPU for $1.32. B300's cost per token is about 70% lower (GB200 NVL72 costs roughly 3.4× as much), and B300 delivers 266% more tok/s/GPU.
At 138 tok/s/user on Qwen 3.5 397B-A17B, B300 reaches 1101 tok/s/GPU and GB200 NVL72 reaches 201, with per-million-token costs of $0.59 and $3.06 respectively. B300's cost per token is about 81% lower (GB200 NVL72 costs roughly 5.2× as much), and B300 delivers 449% more tok/s/GPU. All figures on this page are for the 1k/1k sequence, fp8 precision configuration.
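The percentage comparisons above use two different baselines, which is easy to get wrong. A minimal sketch of the arithmetic, using the values from the table below; the helper names are hypothetical. A "percent more" figure divides by the smaller value and can exceed 100%, while a "percent cheaper" figure divides by the larger cost and can never reach 100%.

```python
# (interactivity in tok/s/user) -> system -> (tok/s/GPU, $ per million tokens),
# copied from the 1k/1k, fp8 table on this page.
POINTS = {
    65: {"B300": (3055.1, 0.211), "GB200 NVL72": (1198.2, 0.522)},
    102: {"B300": (1665.6, 0.393), "GB200 NVL72": (455.6, 1.323)},
    138: {"B300": (1100.9, 0.592), "GB200 NVL72": (200.6, 3.057)},
}


def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, in percent (relative to b)."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """How much smaller a is than b, in percent (relative to b); always < 100."""
    return (1.0 - a / b) * 100.0


def compare(interactivity: int) -> dict:
    b300_tput, b300_cost = POINTS[interactivity]["B300"]
    gb_tput, gb_cost = POINTS[interactivity]["GB200 NVL72"]
    return {
        # B300 throughput relative to GB200 NVL72 throughput
        "throughput_gain_pct": pct_more(b300_tput, gb_tput),
        # B300 cost relative to GB200 NVL72 cost
        "b300_cost_reduction_pct": pct_less(b300_cost, gb_cost),
        # GB200 NVL72 cost expressed as a multiple of B300 cost
        "gb200_cost_multiple": gb_cost / b300_cost,
    }


for level in sorted(POINTS):
    r = compare(level)
    print(
        f"{level} tok/s/user: +{r['throughput_gain_pct']:.0f}% tok/s/GPU, "
        f"B300 {r['b300_cost_reduction_pct']:.0f}% cheaper, "
        f"GB200 NVL72 cost {r['gb200_cost_multiple']:.1f}x"
    )
```

Run as-is, this reports throughput gains of 155%, 266% and 449%, and cost reductions of about 60%, 70% and 81% at the three interactivity points.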
| Metric | 65 tok/s/user | 102 tok/s/user | 138 tok/s/user |
|---|---|---|---|
| Throughput (tok/s/GPU) | B300: 3055.1; GB200 NVL72: 1198.2 | B300: 1665.6; GB200 NVL72: 455.6 | B300: 1100.9; GB200 NVL72: 200.6 |
| Cost ($/M tokens) | B300: $0.211; GB200 NVL72: $0.522 | B300: $0.393; GB200 NVL72: $1.323 | B300: $0.592; GB200 NVL72: $3.057 |
| Throughput per power (tok/s/MW) | B300: 1407873; GB200 NVL72: 570576 | B300: 767570; GB200 NVL72: 216965 | B300: 507336; GB200 NVL72: 95505 |
| Concurrency | B300: ~96; GB200 NVL72: ~100 | B300: ~34; GB200 NVL72: ~19 | B300: ~17; GB200 NVL72: ~6 |
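The tok/s/MW row can be cross-checked against the tok/s/GPU row. A minimal sketch, assuming (our assumption; the page does not state its formula) that tok/s/MW is per-GPU throughput divided by a fixed per-GPU power figure. Under that assumption the implied power should be the same in every column for a given system. Whether that figure is GPU TDP or all-in system power is not stated, and `implied_kw_per_gpu` is a hypothetical helper.

```python
# Rows at 65, 102 and 138 tok/s/user (1k/1k, fp8), copied from the table.
TABLE = {
    "B300": {
        "tok_s_gpu": [3055.1, 1665.6, 1100.9],
        "tok_s_mw": [1407873, 767570, 507336],
    },
    "GB200 NVL72": {
        "tok_s_gpu": [1198.2, 455.6, 200.6],
        "tok_s_mw": [570576, 216965, 95505],
    },
}


def implied_kw_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Back out the per-GPU power figure, in kW.

    ASSUMPTION: tok/s/MW = (tok/s/GPU) / (power per GPU in MW),
    so power per GPU in MW = (tok/s/GPU) / (tok/s/MW).
    """
    return tok_s_gpu / tok_s_mw * 1000.0  # MW -> kW


for system, rows in TABLE.items():
    kws = [implied_kw_per_gpu(t, m) for t, m in zip(rows["tok_s_gpu"], rows["tok_s_mw"])]
    print(system, [f"{kw:.2f} kW" for kw in kws])
```

Every B300 column works out to about 2.17 kW per GPU and every GB200 NVL72 column to about 2.10 kW, so under this assumption the tok/s/MW row carries the same ranking as tok/s/GPU, scaled by a near-constant power ratio.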