gpt-oss 120B — H100 vs MI325X

A head-to-head AI inference benchmark comparison of H100 (NVIDIA Hopper) and MI325X (AMD CDNA 3) on gpt-oss 120B, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; the interactions are the same as on the main inference chart.

At 78 tok/s/user interactivity on gpt-oss 120B, H100 delivers 3838 tok/s/GPU at $0.09 per million tokens; MI325X delivers 1385 tok/s/GPU at $0.26. MI325X costs 180% more per token (equivalently, H100 is about 64% cheaper), and H100 delivers 177% more tok/s/GPU at this point.

H100 posts 3493 tok/s/GPU for $0.10 per million tokens at 90 tok/s/user on gpt-oss 120B; MI325X posts 1126 tok/s/GPU for $0.32. MI325X costs 210% more per token (equivalently, H100 is about 68% cheaper), and H100 delivers 210% more tok/s/GPU.

Throughput at 102 tok/s/user on gpt-oss 120B: H100 hits 3130 tok/s/GPU, MI325X hits 771. Per-million-token costs land at $0.12 and $0.46 respectively. MI325X costs 299% more per token (equivalently, H100 is about 75% cheaper), and H100 delivers 306% more tok/s/GPU. (Numbers reflect the default 1k/1k · fp4 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
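The percentage comparisons in these summaries are relative differences, and the base matters: a "percent more" figure divides by the smaller value, while a "percent cheaper" figure divides by the larger value and can never exceed 100%. The sketch below recomputes both from the rounded values shown in the table on this page. The helper names `pct_more` and `pct_less` are illustrative, not part of any tool behind this page, and because the inputs are rounded, the results may differ by a point or two from figures derived from unrounded data.

```python
# Rounded values copied from the comparison table on this page.
# Each tuple is (H100, MI325X) at the given interactivity (tok/s/user).
POINTS = {
    78: {"tput": (3838.4, 1385.3), "cost": (0.093, 0.262)},
    90: {"tput": (3492.5, 1126.4), "cost": (0.102, 0.316)},
    102: {"tput": (3129.9, 770.7), "cost": (0.116, 0.461)},
}


def pct_more(larger: float, smaller: float) -> float:
    """How much bigger `larger` is, as a percentage of `smaller`."""
    return (larger / smaller - 1.0) * 100.0


def pct_less(smaller: float, larger: float) -> float:
    """How much lower `smaller` is, as a percentage of `larger` (always < 100)."""
    return (1.0 - smaller / larger) * 100.0


for interactivity, row in POINTS.items():
    h_tput, m_tput = row["tput"]
    h_cost, m_cost = row["cost"]
    print(
        f"{interactivity} tok/s/user: "
        f"H100 throughput +{pct_more(h_tput, m_tput):.0f}%, "
        f"MI325X cost +{pct_more(m_cost, h_cost):.0f}%, "
        f"H100 cost -{pct_less(h_cost, m_cost):.0f}%"
    )
```

Reading the two cost figures side by side makes the distinction concrete: a GPU whose per-token cost is roughly a third of the other's is about 65% cheaper, even though the other GPU costs roughly 180% more.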

Interpolated from real benchmark data. Edit target interactivity values below to compare at different operating points.
| Metric | GPU | 78 tok/s/user | 90 tok/s/user | 102 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | H100 | 3838.4 | 3492.5 | 3129.9 |
| Throughput (tok/s/GPU) | MI325X | 1385.3 | 1126.4 | 770.7 |
| Cost ($/M tok) | H100 | $0.093 | $0.102 | $0.116 |
| Cost ($/M tok) | MI325X | $0.262 | $0.316 | $0.461 |
| tok/s/MW | H100 | 2218717 | 2018806 | 1809214 |
| tok/s/MW | MI325X | 635442 | 516699 | 353511 |
| Concurrency | H100 | ~64 | ~64 | ~64 |
| Concurrency | MI325X | ~23 | ~28 | ~16 |
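The page says these values are interpolated from benchmark data but does not say how. A minimal sketch, assuming plain linear interpolation between the two nearest measured points, is shown below. The function name `interpolate` is hypothetical, and the sample points simply reuse the H100 throughput figures from this page as stand-ins for raw measurements.

```python
from bisect import bisect_left


def interpolate(points, target):
    """Linearly interpolate y at `target` from (x, y) pairs sorted by x.

    Assumption: linear interpolation between neighbouring points. The
    method actually used to produce the page's numbers is not stated.
    """
    xs = [x for x, _ in points]
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target is outside the measured range; not extrapolating")
    i = bisect_left(xs, target)
    if xs[i] == target:
        # Exact hit on a known point: return it unchanged.
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (target - x0) / (x1 - x0)


# Stand-in data: (interactivity in tok/s/user, throughput in tok/s/GPU) for H100.
h100_throughput = [(78, 3838.4), (90, 3492.5), (102, 3129.9)]

print(f"H100 at 84 tok/s/user: {interpolate(h100_throughput, 84):.1f} tok/s/GPU")
```

Refusing to extrapolate is a deliberate choice in this sketch: a target interactivity outside the benchmarked range has no measured neighbours on one side, so any value returned there would be a guess.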

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.