
gpt-oss 120B — GB200 NVL72 vs H100

Head-to-head AI inference benchmark comparison of GB200 NVL72 (NVIDIA Blackwell) and H100 (NVIDIA Hopper) on gpt-oss 120B, covering latency, throughput, and cost across LLM workloads. The chart controls switch sequences, precisions, and metrics, with the same interactions as the main inference chart.

At 117 tok/s/user interactivity on gpt-oss 120B, GB200 NVL72 delivers 18899 tok/s/GPU at $0.03 per million tokens; H100 delivers 2622 tok/s/GPU at $0.14. At this point H100 costs 328% more per token (about 4.3x), and GB200 NVL72 delivers 621% more tok/s/GPU (about 7.2x).

At 166 tok/s/user on gpt-oss 120B, GB200 NVL72 posts 16438 tok/s/GPU at $0.04 per million tokens; H100 posts 1379 tok/s/GPU at $0.26. H100 costs 591% more per token (about 6.9x), and GB200 NVL72 delivers 1092% more tok/s/GPU (about 11.9x).

At 216 tok/s/user on gpt-oss 120B, GB200 NVL72 hits 8533 tok/s/GPU and H100 hits 740. Per-million-token costs land at $0.07 and $0.49 respectively. H100 costs 593% more per token (about 6.9x), and GB200 NVL72 delivers 1054% more tok/s/GPU (about 11.5x). (Numbers reflect the default 1k/1k · fp4 selection for this URL; other sequence, precision, or model selections yield different values.)
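The percentage comparisons in these paragraphs come down to simple ratios. The sketch below recomputes them from the rounded figures in the comparison table, so the cost percentages may differ by a few points from the ones quoted on the page, which presumably use unrounded data. The helper names `percent_more` and `percent_less` are illustrative and not part of the page. Note that "X% more" and "X% less" are not interchangeable: a price that is several hundred percent higher corresponds to a saving that is always below 100%.

```python
def percent_more(value, baseline):
    """How much larger `value` is than `baseline`, in percent."""
    return (value / baseline - 1.0) * 100.0


def percent_less(value, baseline):
    """How much smaller `value` is than `baseline`, in percent (always < 100)."""
    return (1.0 - value / baseline) * 100.0


# Rounded values copied from the comparison table:
# (tok/s/user, GB200 tok/s/GPU, H100 tok/s/GPU, GB200 $/M tok, H100 $/M tok)
ROWS = [
    (117, 18898.6, 2621.5, 0.032, 0.139),
    (166, 16438.0, 1379.3, 0.038, 0.262),
    (216, 8533.4, 739.7, 0.070, 0.487),
]

for interactivity, gb_tput, h_tput, gb_cost, h_cost in ROWS:
    print(
        f"{interactivity} tok/s/user: "
        f"GB200 throughput +{percent_more(gb_tput, h_tput):.0f}%, "
        f"H100 cost +{percent_more(h_cost, gb_cost):.0f}%, "
        f"GB200 cost -{percent_less(gb_cost, h_cost):.0f}%"
    )
```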


Interpolated from real benchmark data. Columns are the target interactivity values (tok/s/user), which can be edited on the page to compare at different operating points.

| Metric | GPU | 117 tok/s/user | 166 tok/s/user | 216 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/gpu) | GB200 NVL72 | 18898.6 | 16438.0 | 8533.4 |
| Throughput (tok/s/gpu) | H100 | 2621.5 | 1379.3 | 739.7 |
| Cost ($/M tok) | GB200 NVL72 | $0.032 | $0.038 | $0.070 |
| Cost ($/M tok) | H100 | $0.139 | $0.262 | $0.487 |
| tok/s/MW | GB200 NVL72 | 8999325 | 7827639 | 4063521 |
| tok/s/MW | H100 | 1515329 | 797303 | 427578 |
| Concurrency | GB200 NVL72 | ~264 | ~2684 | ~102 |
| Concurrency | H100 | ~64 | ~17 | ~8 |
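The page says only that these values are interpolated from real benchmark data; it does not state the method. As a minimal sketch, assuming piecewise-linear interpolation between measured (interactivity, throughput) points, a value at a target interactivity could be estimated as follows. The function name `interpolate_at` and the sample points are hypothetical and are not taken from the benchmark.

```python
from bisect import bisect_left


def interpolate_at(points, target):
    """Linearly interpolate y at x == target.

    `points` is a list of (x, y) pairs sorted by ascending x, for example
    (tok/s/user, tok/s/GPU). Targets outside the measured range are
    rejected rather than extrapolated.
    """
    xs = [x for x, _ in points]
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target lies outside the measured range")
    i = bisect_left(xs, target)
    if xs[i] == target:
        return points[i][1]  # exact measured point, no interpolation needed
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    t = (target - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


# Hypothetical measured points, for illustration only.
measured = [(100.0, 20000.0), (150.0, 17000.0), (200.0, 10000.0)]
print(interpolate_at(measured, 125.0))
```

Rejecting out-of-range targets is a deliberate choice in this sketch: extrapolating beyond the measured operating points would report throughput that no benchmark run supports.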

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.
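Throughput per GPU and cost per million tokens are related if a flat hourly price per GPU is assumed. The page does not state its pricing inputs, so the sketch below is an assumption about how such a cost metric is commonly derived, not a description of the page's method. Both function names are hypothetical. Backing out the implied hourly price from a throughput and cost pair is a quick sanity check on whether the reported costs are consistent with a single rate per GPU type.

```python
def cost_per_million_tokens(gpu_hour_usd, tok_per_s_per_gpu):
    """Cost in USD per one million tokens, assuming a flat hourly GPU price."""
    tokens_per_hour = tok_per_s_per_gpu * 3600.0
    return gpu_hour_usd / tokens_per_hour * 1_000_000.0


def implied_gpu_hour_usd(cost_per_m_tok, tok_per_s_per_gpu):
    """Invert the relation: hourly GPU price implied by a cost and a throughput."""
    return cost_per_m_tok * tok_per_s_per_gpu * 3600.0 / 1_000_000.0


# Rounded throughput and cost pairs at 117 tok/s/user (default 1k/1k, fp4 selection).
gb200_rate = implied_gpu_hour_usd(0.032, 18898.6)
h100_rate = implied_gpu_hour_usd(0.139, 2621.5)
print(f"implied GB200 NVL72 price: ${gb200_rate:.2f}/GPU-hour")
print(f"implied H100 price: ${h100_rate:.2f}/GPU-hour")
```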