gpt-oss 120B — H100 vs MI325X
Head-to-head AI inference benchmark comparison of the H100 (NVIDIA Hopper) and the MI325X (AMD CDNA 3) on gpt-oss 120B, covering latency, throughput, and cost across LLM workloads.
At 78 tok/s/user interactivity on gpt-oss 120B, H100 delivers 3838 tok/s/GPU at $0.09 per million tokens; MI325X delivers 1486 tok/s/GPU at $0.24. At this point H100 delivers 158% more tok/s/GPU, and MI325X costs 162% more per token (equivalently, H100 is about 62% cheaper).
H100 posts 3493 tok/s/GPU for $0.10 per million tokens at 90 tok/s/user on gpt-oss 120B; MI325X posts 1019 tok/s/GPU for $0.34. H100 delivers 243% more tok/s/GPU, and MI325X costs 230% more per token (equivalently, H100 is about 70% cheaper).
Throughput at 101 tok/s/user on gpt-oss 120B: H100 hits 3161 tok/s/GPU, MI325X hits 771. Per-million-token costs land at $0.11 and $0.46 respectively. H100 delivers 310% more tok/s/GPU, and MI325X costs 303% more per token (equivalently, H100 is about 75% cheaper). (Numbers reflect the default 1k/1k sequence and fp4 precision selection.)
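The percentage comparisons above depend on which GPU is taken as the baseline, so "X% more expensive" and "Y% cheaper" are different numbers for the same pair of prices. The following is a minimal sketch of that arithmetic, using the per-GPU throughput and cost figures reported for each interactivity level. The helper names `pct_more`, `pct_less`, and `compare` are illustrative and not part of any benchmark tooling. Because the inputs are rounded to three decimals, the results may differ by about a point from the percentages quoted in the prose.

```python
from typing import Dict, List, Tuple


def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, with b as the baseline."""
    return (a / b - 1.0) * 100.0


def pct_less(a: float, b: float) -> float:
    """Percent by which a falls below b, with b as the baseline."""
    return (1.0 - a / b) * 100.0


# interactivity (tok/s/user) -> GPU -> (throughput tok/s/GPU, cost $/M tok)
POINTS: Dict[int, Dict[str, Tuple[float, float]]] = {
    78: {"H100": (3838.4, 0.093), "MI325X": (1485.6, 0.245)},
    90: {"H100": (3492.5, 0.102), "MI325X": (1018.7, 0.337)},
    101: {"H100": (3161.1, 0.114), "MI325X": (770.8, 0.461)},
}


def compare(points: Dict[int, Dict[str, Tuple[float, float]]] = POINTS) -> List[dict]:
    """Return one comparison row per interactivity level."""
    rows = []
    for level, gpus in points.items():
        h_tput, h_cost = gpus["H100"]
        m_tput, m_cost = gpus["MI325X"]
        rows.append({
            "interactivity": level,
            # H100 throughput advantage, MI325X as baseline
            "h100_tput_more_pct": pct_more(h_tput, m_tput),
            # MI325X cost premium, H100 as baseline
            "mi325x_cost_more_pct": pct_more(m_cost, h_cost),
            # H100 cost saving, MI325X as baseline
            "h100_cost_less_pct": pct_less(h_cost, m_cost),
        })
    return rows


if __name__ == "__main__":
    for row in compare():
        print(
            f"{row['interactivity']} tok/s/user: "
            f"H100 +{row['h100_tput_more_pct']:.0f}% throughput, "
            f"MI325X +{row['mi325x_cost_more_pct']:.0f}% cost, "
            f"H100 -{row['h100_cost_less_pct']:.0f}% cost"
        )
```

A saving expressed against the more expensive GPU can never exceed 100%, which is why the cost gap is stated as an MI325X premium when the figure is above 100%.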
| Metric | GPU | 78 tok/s/user | 90 tok/s/user | 101 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | H100 | 3838.4 | 3492.5 | 3161.1 |
| Throughput (tok/s/GPU) | MI325X | 1485.6 | 1018.7 | 770.8 |
| Cost ($/M tok) | H100 | $0.093 | $0.102 | $0.114 |
| Cost ($/M tok) | MI325X | $0.245 | $0.337 | $0.461 |
| tok/s/MW | H100 | 2218717 | 2018806 | 1827229 |
| tok/s/MW | MI325X | 681469 | 467284 | 353573 |
| Concurrency | H100 | ~64 | ~64 | ~64 |
| Concurrency | MI325X | ~42 | ~23 | ~16 |
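The table's cost and tok/s/MW rows can be related back to per-GPU throughput with simple unit conversions. The sketch below is a rough cross-check, not the benchmark's own method. It assumes that cost per million tokens is an hourly GPU price divided by tokens produced per hour, and that tok/s/MW is per-GPU throughput divided by an all-in power draw per GPU. The source states neither the hourly price nor the power figure, so the function names and both formulas are assumptions.

```python
def implied_gpu_hourly_cost(tok_per_s_per_gpu: float, usd_per_million_tok: float) -> float:
    """Hourly price per GPU implied by throughput and $/M tok.

    Assumption: $/M tok = hourly price / (millions of tokens per hour).
    """
    million_tokens_per_hour = tok_per_s_per_gpu * 3600.0 / 1e6
    return usd_per_million_tok * million_tokens_per_hour


def implied_kw_per_gpu(tok_per_s_per_gpu: float, tok_per_s_per_mw: float) -> float:
    """Power per GPU in kW implied by throughput and tok/s/MW.

    Assumption: tok/s/MW = per-GPU throughput / per-GPU power in MW.
    """
    megawatts_per_gpu = tok_per_s_per_gpu / tok_per_s_per_mw
    return megawatts_per_gpu * 1000.0


if __name__ == "__main__":
    # Values copied from the 78 tok/s/user column of the table.
    for name, tput, cost, per_mw in [
        ("H100", 3838.4, 0.093, 2218717),
        ("MI325X", 1485.6, 0.245, 681469),
    ]:
        print(
            f"{name}: ~${implied_gpu_hourly_cost(tput, cost):.2f}/GPU-hour, "
            f"~{implied_kw_per_gpu(tput, per_mw):.2f} kW/GPU"
        )
```

Under these assumptions the table implies roughly $1.2 to $1.3 per GPU-hour for both GPUs, and about 1.73 kW per H100 versus about 2.18 kW per MI325X. This would mean the cost gap is driven almost entirely by the throughput gap, not by a price difference.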
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.