
GLM 5/5.1 — B300 vs GB300 NVL72

Head-to-head AI inference benchmark comparison of two NVIDIA Blackwell systems, B300 and GB300 NVL72, on GLM 5/5.1, covering latency, throughput, and cost across LLM workloads.

At 61 tok/s/user on GLM 5/5.1, B300 delivers 1603 tok/s/GPU at $0.41 per million tokens, while GB300 NVL72 delivers 7809 tok/s/GPU at $0.09. B300 costs 334% more per token (GB300 NVL72 is about 77% cheaper), and GB300 NVL72 delivers 387% more tok/s/GPU.

Near the middle of the 24–173 tok/s/user interactivity band, at 98 tok/s/user, B300 delivers 694 tok/s/GPU at $0.94 per million tokens and GB300 NVL72 delivers 2790 tok/s/GPU at $0.27. B300 costs 252% more per token (GB300 NVL72 is about 72% cheaper), and GB300 NVL72 delivers 302% more tok/s/GPU.

At a target of 136 tok/s/user, the ranking flips: B300 delivers 357 tok/s/GPU ($1.73 per million tokens) and GB300 NVL72 delivers 246 tok/s/GPU ($2.98). GB300 NVL72 costs 72% more per token (B300 is about 42% cheaper), and B300 delivers 45% more tok/s/GPU. All figures use the default 1k/1k sequence configuration and FP4 precision.
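Each comparison above states two relative differences that are easy to conflate. A minimal sketch of the standard formulas, applied to the 61 tok/s/user values from the table: "X% more" divides by the smaller value and is unbounded, while "X% cheaper" divides by the larger cost and stays below 100% for positive costs. The function names are illustrative, not part of any benchmark tooling.

```python
def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, as in '387% more tok/s/GPU'. Unbounded above."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Percent by which a falls short of b, as in '77% cheaper'. Below 100 for positive inputs."""
    return (1.0 - a / b) * 100.0


# Values taken from the table's 61 tok/s/user column.
b300_tps, gb300_tps = 1603.0, 7809.1    # tok/s/GPU
b300_cost, gb300_cost = 0.406, 0.094    # $ per million tokens

print(round(pct_more(gb300_tps, b300_tps)))    # 387: GB300 NVL72 throughput advantage
print(round(pct_less(gb300_cost, b300_cost)))  # 77: GB300 NVL72 cost per token is 77% lower
```

The same two helpers reproduce the other columns, for example `pct_more(357.1, 245.9)` for B300's 45% throughput lead at 136 tok/s/user.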


Values are interpolated from measured benchmark data at three target interactivity levels: 61, 98, and 136 tok/s/user.
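The page does not say how it interpolates between measured points. A minimal sketch, assuming simple piecewise-linear interpolation of a metric (here throughput) against interactivity; the function name `interpolate_at` and the sample points are hypothetical, not benchmark data. Other schemes, such as interpolating in log space, would give different values between measurements.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def interpolate_at(points: Sequence[Tuple[float, float]], target: float) -> float:
    """Linearly interpolate y at x == target from (x, y) measurements.

    Assumption: piecewise-linear interpolation; the benchmark's real scheme is unknown.
    """
    # Sort by x (interactivity) so neighbouring pairs bracket the target.
    pts = sorted(points)
    xs = [x for x, _ in pts]
    # Refuse to extrapolate beyond the measured interactivity band.
    if not xs[0] <= target <= xs[-1]:
        raise ValueError(f"target {target} outside measured range [{xs[0]}, {xs[-1]}]")
    # Index of the first measured x that is >= target.
    i = bisect_left(xs, target)
    if xs[i] == target:
        return pts[i][1]  # exact hit on a measured point
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    # Linear blend between the two bracketing measurements.
    frac = (target - x0) / (x1 - x0)
    return y0 + frac * (y1 - y0)


if __name__ == "__main__":
    # Hypothetical (tok/s/user, tok/s/GPU) pairs, NOT benchmark data.
    sweep = [(20.0, 2000.0), (40.0, 1000.0), (80.0, 400.0)]
    print(interpolate_at(sweep, 30.0))  # 1500.0
```

Raising on out-of-range targets mirrors the fact that the comparison is only reported inside the measured 24–173 tok/s/user band.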

Metric	GPU	61 tok/s/user	98 tok/s/user	136 tok/s/user
Throughput (tok/s/GPU)	B300	1603.0	693.7	357.1
Throughput (tok/s/GPU)	GB300 NVL72	7809.1	2790.4	245.9
Cost ($/M tokens)	B300	$0.406	$0.937	$1.733
Cost ($/M tokens)	GB300 NVL72	$0.094	$0.266	$2.978
tok/s/MW	B300	738704	319673	164565
tok/s/MW	GB300 NVL72	3718633	1328738	117099
Concurrency	B300	~54	~14	~5
Concurrency	GB300 NVL72	~1581	~472	~49
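The page does not document its cost or power model. The sketch below assumes the common definitions: cost per million tokens is an hourly GPU price divided by the tokens one GPU produces per hour, and tok/s/MW divides per-GPU throughput by per-GPU power expressed in megawatts. The hourly price and wattage are inputs you supply, and all function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600
TOKENS_PER_MILLION = 1_000_000
WATTS_PER_MEGAWATT = 1_000_000


def cost_per_million_tokens(gpu_hour_usd: float, tok_per_s_per_gpu: float) -> float:
    """$ per million tokens, assuming cost = hourly GPU price / tokens per hour."""
    # Tokens one GPU emits in an hour at this operating point.
    tokens_per_hour = tok_per_s_per_gpu * SECONDS_PER_HOUR
    return gpu_hour_usd / tokens_per_hour * TOKENS_PER_MILLION


def implied_gpu_hour_usd(cost_per_m_tokens: float, tok_per_s_per_gpu: float) -> float:
    """Inverse of the above: the hourly price consistent with a $/M-token figure."""
    return cost_per_m_tokens * tok_per_s_per_gpu * SECONDS_PER_HOUR / TOKENS_PER_MILLION


def tok_per_s_per_mw(tok_per_s_per_gpu: float, watts_per_gpu: float) -> float:
    """Throughput normalized by per-GPU power draw, converted from watts to megawatts."""
    return tok_per_s_per_gpu / (watts_per_gpu / WATTS_PER_MEGAWATT)
```

For example, a hypothetical $3.60 per GPU-hour at 1000 tok/s/GPU works out to about $1.00 per million tokens. `implied_gpu_hour_usd` runs the same relation backwards, so it can check which hourly price a $/M-token column is consistent with, provided the page's cost model has this form.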

Inference Performance

Chart: inference performance metrics across models, hardware configurations, and serving parameters, with selectors for model, scenario, precision, Y-axis metric, and GPU config, and quick filters for vendor, aggregation, and speculative decoding.