
gpt-oss 120B — H100 vs MI325X

Head-to-head AI inference benchmark comparison of H100 (NVIDIA Hopper) and MI325X (AMD CDNA 3) on gpt-oss 120B, covering latency, throughput, and cost across LLM workloads.

At 78 tok/s/user interactivity on gpt-oss 120B, H100 delivers 3838 tok/s/GPU at $0.09 per million tokens; MI325X delivers 1385 tok/s/GPU at $0.26. MI325X costs about 180% more per token (equivalently, H100 is roughly 64% cheaper), and H100 delivers 177% more tok/s/GPU at this point.

At 90 tok/s/user on gpt-oss 120B, H100 posts 3493 tok/s/GPU at $0.10 per million tokens; MI325X posts 1126 tok/s/GPU at $0.32. MI325X costs about 210% more per token (H100 is roughly 68% cheaper), and H100 delivers 210% more tok/s/GPU.

At 102 tok/s/user on gpt-oss 120B, H100 hits 3130 tok/s/GPU and MI325X hits 771 tok/s/GPU. Per-million-token costs land at $0.12 and $0.46 respectively. MI325X costs about 299% more per token (H100 is roughly 75% cheaper), and H100 delivers 306% more tok/s/GPU. (Numbers reflect the default 1k/1k sequence setting and fp4 precision for this comparison.)
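The percentage comparisons above depend on which value serves as the baseline: "X% more" can exceed 100%, while "X% cheaper" cannot. The following is a minimal sketch of that arithmetic using the rounded 78 tok/s/user figures from the table on this page. The helper names `pct_more` and `pct_less` are illustrative, not part of any tool. Because the inputs are rounded, the results can differ slightly from the headline percentages quoted in the text.

```python
def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """How much smaller a is than b, as a percentage of b (at most 100)."""
    return (1.0 - a / b) * 100.0


# Rounded values at 78 tok/s/user, taken from the table on this page.
h100_tput, mi325x_tput = 3838.4, 1385.3   # tok/s/GPU
h100_cost, mi325x_cost = 0.093, 0.262     # $ per million tokens

throughput_gain = pct_more(h100_tput, mi325x_tput)  # H100 throughput advantage
cost_premium = pct_more(mi325x_cost, h100_cost)     # extra cost of MI325X
cost_saving = pct_less(h100_cost, mi325x_cost)      # H100 saving vs MI325X

print(f"H100 delivers {throughput_gain:.0f}% more tok/s/GPU")
print(f"MI325X costs {cost_premium:.0f}% more per token")
print(f"H100 is {cost_saving:.0f}% cheaper per token")
```

The same ratio therefore reads as roughly 180% when stated as a premium and roughly 64% when stated as a saving.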


The values below are interpolated from real benchmark data at three target interactivity levels: 78, 90, and 102 tok/s/user.
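The page does not say which interpolation method it uses. As a rough illustration only, the sketch below assumes plain linear interpolation between the two measured points that bracket the target interactivity. The function name `interpolate` and the sample points are hypothetical placeholders, not benchmark results.

```python
from bisect import bisect_left


def interpolate(points: list[tuple[float, float]], x: float) -> float:
    """Linearly interpolate y at x from (x, y) points sorted by ascending x."""
    xs = [p[0] for p in points]
    if not xs[0] <= x <= xs[-1]:
        # Refuse to extrapolate beyond the measured range.
        raise ValueError("target lies outside the measured range")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return points[i][1]  # exact hit on a measured point
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


# HYPOTHETICAL (interactivity tok/s/user, throughput tok/s/GPU) pairs,
# made up for illustration; they are not measurements from this page.
sample_points = [(60.0, 4200.0), (80.0, 3800.0), (120.0, 2600.0)]

estimate = interpolate(sample_points, 90.0)
print(estimate)
```

Refusing to extrapolate is a deliberate choice in this sketch: a target outside the measured range has no bracketing benchmark points to support it.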

Metric	GPU	78 tok/s/user	90 tok/s/user	102 tok/s/user
Throughput (tok/s/GPU)	H100	3838.4	3492.5	3129.9
Throughput (tok/s/GPU)	MI325X	1385.3	1126.4	770.7
Cost ($/M tok)	H100	$0.093	$0.102	$0.116
Cost ($/M tok)	MI325X	$0.262	$0.316	$0.461
tok/s/MW	H100	2218717	2018806	1809214
tok/s/MW	MI325X	635442	516699	353511
Concurrency	H100	~64	~64	~64
Concurrency	MI325X	~23	~28	~16
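The cost and tok/s/MW rows are both tied to per-GPU throughput. The sketch below shows one plausible relationship, under two assumptions this page does not confirm: that cost per million tokens is an hourly GPU price divided by hourly token output, and that tok/s/MW is per-GPU throughput divided by an all-in power draw per GPU. The function names are illustrative. The implied price and power figures are back-calculated from the rounded 78 tok/s/user values, so treat them as estimates rather than published inputs.

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_tokens(price_per_gpu_hour: float, tok_per_s: float) -> float:
    """Assumed model: dollars per GPU-hour spread over tokens produced per hour."""
    return price_per_gpu_hour / (tok_per_s * SECONDS_PER_HOUR) * 1e6


def implied_price_per_gpu_hour(cost_per_m_tok: float, tok_per_s: float) -> float:
    """Invert the assumed cost model to recover an hourly GPU price."""
    return cost_per_m_tok * tok_per_s * SECONDS_PER_HOUR / 1e6


def implied_kw_per_gpu(tok_per_s: float, tok_per_s_per_mw: float) -> float:
    """Assumed model: per-GPU power (kW) = throughput / (tok/s per MW) * 1000."""
    return tok_per_s / tok_per_s_per_mw * 1000.0


# Rounded 78 tok/s/user values from the table on this page.
h100_price = implied_price_per_gpu_hour(0.093, 3838.4)
mi325x_price = implied_price_per_gpu_hour(0.262, 1385.3)
h100_kw = implied_kw_per_gpu(3838.4, 2218717)
mi325x_kw = implied_kw_per_gpu(1385.3, 635442)

print(f"Implied $/GPU-hour: H100 {h100_price:.2f}, MI325X {mi325x_price:.2f}")
print(f"Implied kW per GPU: H100 {h100_kw:.2f}, MI325X {mi325x_kw:.2f}")
```

Under these assumptions the two GPUs come out at similar implied hourly prices, which would mean the cost gap in the table is driven almost entirely by the throughput gap.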

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.
