gpt-oss 120B — H100 vs MI325X
Head-to-head AI inference benchmark comparison of H100 (NVIDIA Hopper) and MI325X (AMD CDNA 3) on gpt-oss 120B, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; the interactions are the same as on the main inference chart.
At 78 tok/s/user interactivity on gpt-oss 120B, H100 delivers 3838 tok/s/GPU at $0.09 per million tokens; MI325X delivers 1385 tok/s/GPU at $0.26. MI325X costs about 180% more per token (equivalently, H100 is roughly 64% cheaper); H100 delivers 177% more tok/s/GPU at this point.
H100 posts 3493 tok/s/GPU for $0.10 per million tokens at 90 tok/s/user on gpt-oss 120B; MI325X posts 1126 tok/s/GPU for $0.32. MI325X costs about 210% more per token (equivalently, H100 is roughly 68% cheaper); H100 delivers 210% more tok/s/GPU.
Throughput at 102 tok/s/user on gpt-oss 120B: H100 hits 3130 tok/s/GPU, MI325X hits 771. Per-million-token costs land at $0.12 and $0.46 respectively. MI325X costs about 299% more per token (equivalently, H100 is roughly 75% cheaper); H100 delivers 306% more tok/s/GPU. (Numbers reflect the default 1k/1k · fp4 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
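The relative figures in these summaries can be recomputed from the table values. The following is a minimal sketch, not the site's own calculation: the `POINTS` dictionary and the `compare` helper are illustrative names, and the inputs are the table's rounded values, so results may differ by a point or two from percentages computed on unrounded data. Because a price cannot drop by more than 100%, the sketch reports the cost gap two ways: the MI325X premium over H100, and the H100 saving relative to MI325X.

```python
# Values copied from the comparison table (default 1k/1k, fp4 selection).
# Keys are interactivity levels in tok/s/user; each tuple is
# (throughput in tok/s/GPU, cost in $ per million tokens).
POINTS = {
    78: {"H100": (3838.4, 0.093), "MI325X": (1385.3, 0.262)},
    90: {"H100": (3492.5, 0.102), "MI325X": (1126.4, 0.316)},
    102: {"H100": (3129.9, 0.116), "MI325X": (770.7, 0.461)},
}


def compare(point):
    """Return relative throughput and cost figures for one interactivity level."""
    h_tput, h_cost = point["H100"]
    m_tput, m_cost = point["MI325X"]
    return {
        # How much more throughput H100 delivers, relative to MI325X.
        "h100_throughput_gain_pct": (h_tput / m_tput - 1) * 100,
        # How much more MI325X costs per token, relative to H100.
        "mi325x_cost_premium_pct": (m_cost / h_cost - 1) * 100,
        # How much cheaper H100 is per token, relative to MI325X (always < 100).
        "h100_cost_saving_pct": (1 - h_cost / m_cost) * 100,
    }


for interactivity, point in POINTS.items():
    result = compare(point)
    print(
        f"{interactivity} tok/s/user: "
        f"throughput +{result['h100_throughput_gain_pct']:.0f}%, "
        f"MI325X premium +{result['mi325x_cost_premium_pct']:.0f}%, "
        f"H100 saving {result['h100_cost_saving_pct']:.0f}%"
    )
```

The premium and the saving describe the same gap from opposite baselines, which is why a premium near 180% corresponds to a saving near 64%.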
| Metric | GPU | 78 tok/s/user | 90 tok/s/user | 102 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | H100 | 3838.4 | 3492.5 | 3129.9 |
| Throughput (tok/s/GPU) | MI325X | 1385.3 | 1126.4 | 770.7 |
| Cost ($/M tok) | H100 | $0.093 | $0.102 | $0.116 |
| Cost ($/M tok) | MI325X | $0.262 | $0.316 | $0.461 |
| tok/s/MW | H100 | 2218717 | 2018806 | 1809214 |
| tok/s/MW | MI325X | 635442 | 516699 | 353511 |
| Concurrency | H100 | ~64 | ~64 | ~64 |
| Concurrency | MI325X | ~23 | ~28 | ~16 |
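The tok/s/MW row can be cross-checked against the per-GPU throughput row. The sketch below assumes, without confirmation from the page, that tok/s/MW is the per-GPU throughput multiplied by a fixed number of GPUs per megawatt; under that assumption, dividing one row by the other gives an implied GPU count per MW and an implied all-in power draw per GPU. The names `TABLE` and `implied_gpus_per_mw` are illustrative only.

```python
# Values copied from the comparison table, in column order
# (78, 90, 102 tok/s/user).
TABLE = {
    "H100": {
        "tok_s_gpu": [3838.4, 3492.5, 3129.9],
        "tok_s_mw": [2218717, 2018806, 1809214],
    },
    "MI325X": {
        "tok_s_gpu": [1385.3, 1126.4, 770.7],
        "tok_s_mw": [635442, 516699, 353511],
    },
}


def implied_gpus_per_mw(tok_s_mw, tok_s_gpu):
    """GPUs per megawatt, ASSUMING tok/s/MW = tok/s/GPU * GPUs per MW."""
    return tok_s_mw / tok_s_gpu


for gpu, rows in TABLE.items():
    for per_mw, per_gpu in zip(rows["tok_s_mw"], rows["tok_s_gpu"]):
        n = implied_gpus_per_mw(per_mw, per_gpu)
        # 1 MW = 1000 kW, so the implied per-GPU power is 1000 / n kW.
        print(f"{gpu}: {n:.1f} GPUs/MW, about {1000 / n:.2f} kW per GPU")
```

With the table's values the ratio is nearly constant within each GPU (around 578 per MW for H100 and around 459 for MI325X), which is consistent with the fixed-power assumption but does not confirm how the page actually derives the figure.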
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.