
gpt-oss 120B: H100 vs MI325X

Head-to-head AI inference benchmark comparison of the NVIDIA H100 (Hopper) and the AMD MI325X (CDNA 3) on gpt-oss 120B, covering latency, throughput, and cost per token.

At 78 tok/s/user interactivity on gpt-oss 120B, H100 delivers 3838 tok/s/GPU at $0.09 per million tokens, while MI325X delivers 1486 tok/s/GPU at $0.24. At this point H100 delivers 158% more tok/s/GPU, and MI325X costs 162% more per token (equivalently, H100 is about 62% cheaper).

At 90 tok/s/user on gpt-oss 120B, H100 posts 3493 tok/s/GPU for $0.10 per million tokens, while MI325X posts 1019 tok/s/GPU for $0.34. H100 delivers 243% more tok/s/GPU, and MI325X costs 230% more per token (H100 is about 70% cheaper).

At 101 tok/s/user on gpt-oss 120B, H100 reaches 3161 tok/s/GPU and MI325X reaches 771, with costs of $0.11 and $0.46 per million tokens respectively. H100 delivers 310% more tok/s/GPU, and MI325X costs 303% more per token (H100 is about 75% cheaper). All figures use the default 1k/1k sequence, fp4 precision configuration.
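The comparisons above use two different percentage conventions: "X% more" is measured against the smaller value and can exceed 100%, while "X% cheaper" is measured against the larger value and never can. A minimal sketch that recomputes both from the rounded per-point values listed on this page (the constant names and helper functions are hypothetical, introduced only for illustration):

```python
# Rounded values reported on this page, one entry per target interactivity point.
INTERACTIVITY = [78, 90, 101]  # tok/s/user
THROUGHPUT = {  # tok/s/GPU
    "H100": [3838.4, 3492.5, 3161.1],
    "MI325X": [1485.6, 1018.7, 770.8],
}
COST = {  # $ per million tokens
    "H100": [0.093, 0.102, 0.114],
    "MI325X": [0.245, 0.337, 0.461],
}


def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b (unbounded)."""
    return (a / b - 1.0) * 100.0


def pct_cheaper(low: float, high: float) -> float:
    """How much cheaper `low` is than `high`, as a percentage of `high` (at most 100)."""
    return (1.0 - low / high) * 100.0


def compare():
    """Return (interactivity, H100 extra tok/s/GPU %, MI325X extra cost %, H100 saving %)."""
    rows = []
    for i, target in enumerate(INTERACTIVITY):
        h_tp, m_tp = THROUGHPUT["H100"][i], THROUGHPUT["MI325X"][i]
        h_cost, m_cost = COST["H100"][i], COST["MI325X"][i]
        rows.append((
            target,
            round(pct_more(h_tp, m_tp)),          # H100 throughput advantage
            round(pct_more(m_cost, h_cost)),      # MI325X cost premium
            round(pct_cheaper(h_cost, m_cost)),   # H100 per-token saving
        ))
    return rows


for target, tp_gain, cost_premium, saving in compare():
    print(f"{target} tok/s/user: H100 +{tp_gain}% tok/s/GPU, "
          f"MI325X +{cost_premium}% $/M tok, H100 {saving}% cheaper")
```

Because the costs here are rounded to three decimals, the cost premiums come out at 163% and 304% rather than the 162% and 303% quoted in the text, which were presumably computed from unrounded data.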


Values are interpolated from measured benchmark data at three target interactivity levels.
| Metric | GPU | 78 tok/s/user | 90 tok/s/user | 101 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | H100 | 3838.4 | 3492.5 | 3161.1 |
| Throughput (tok/s/GPU) | MI325X | 1485.6 | 1018.7 | 770.8 |
| Cost ($/M tokens) | H100 | $0.093 | $0.102 | $0.114 |
| Cost ($/M tokens) | MI325X | $0.245 | $0.337 | $0.461 |
| Throughput per megawatt (tok/s/MW) | H100 | 2,218,717 | 2,018,806 | 1,827,229 |
| Throughput per megawatt (tok/s/MW) | MI325X | 681,469 | 467,284 | 353,573 |
| Concurrency | H100 | ~64 | ~64 | ~64 |
| Concurrency | MI325X | ~42 | ~23 | ~16 |
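The page does not say how values are interpolated between measured operating points. A minimal sketch, assuming piecewise-linear interpolation of a metric against interactivity (the `interpolate` helper is hypothetical, and the three H100 throughput values listed for this page stand in for the underlying measured points, which are not shown here):

```python
from bisect import bisect_left


def interpolate(xs, ys, x):
    """Piecewise-linear estimate of y at x; xs must be sorted ascending."""
    if not xs[0] <= x <= xs[-1]:
        # Outside the measured range we would be extrapolating, so refuse.
        raise ValueError("target interactivity outside measured range")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]  # exact hit on a sample point
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    t = (x - x0) / (x1 - x0)  # fraction of the way from x0 to x1
    return y0 + t * (y1 - y0)


# Illustrative use: H100 tok/s/GPU at a target of 84 tok/s/user.
h100_interactivity = [78, 90, 101]
h100_throughput = [3838.4, 3492.5, 3161.1]
print(round(interpolate(h100_interactivity, h100_throughput, 84), 2))
```

Under this assumption, 84 tok/s/user sits halfway between the 78 and 90 points, so the estimate is about 3665 tok/s/GPU. The real page may interpolate differently, for example along the measured latency-throughput Pareto frontier.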
