gpt-oss 120B — H200 vs MI355X
Head-to-head AI inference benchmark comparison of the H200 (NVIDIA Hopper) and MI355X (AMD CDNA 4) on gpt-oss 120B, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequences, precisions, and metrics; the interactions are the same as on the main inference chart.
Near the low end of the 63–273 tok/s/user interactivity band, at 115 tok/s/user on gpt-oss 120B, H200 runs 2864 tok/s/GPU at $0.14/M tokens and MI355X runs 8345 tok/s/GPU at $0.05/M tokens. H200 costs 177% more per token (equivalently, MI355X is about 64% cheaper), and MI355X delivers 191% more tok/s/GPU.
With 168 tok/s/user as the target on gpt-oss 120B, H200 produces 1441 tok/s/GPU ($0.27 per million tokens) and MI355X produces 2642 tok/s/GPU ($0.16 per million tokens). H200 costs 75% more per token (equivalently, MI355X is about 43% cheaper), and MI355X delivers 83% more tok/s/GPU.
At 221 tok/s/user interactivity on gpt-oss 120B, H200 delivers 736 tok/s/GPU at $0.53 per million tokens; MI355X delivers 1684 tok/s/GPU at $0.24 per million tokens. H200 costs 116% more per token (equivalently, MI355X is about 54% cheaper), and MI355X delivers 129% more tok/s/GPU at this point. (Numbers reflect the default 1k/1k · fp4 selection for this URL; the table and chart below update if you change sequence, precision, or model in the controls.)
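The relative figures for the three interactivity points can be expressed two ways, depending on which GPU is taken as the baseline. The following is a minimal sketch of that arithmetic, using the throughput and cost values as reported in this section's table; the helper names `pct_more` and `pct_less` are illustrative, not part of any benchmark tooling. Because the inputs are rounded to three decimals, the cost percentages can differ by a few points from the figures quoted in the text.

```python
def pct_more(value, baseline):
    """How much larger `value` is than `baseline`, in percent."""
    return (value / baseline - 1) * 100


def pct_less(value, baseline):
    """How much smaller `value` is than `baseline`, in percent (at most 100)."""
    return (1 - value / baseline) * 100


# Values per interactivity point (tok/s/user), 1k/1k, fp4.
# Each entry: (H200 tok/s/GPU, MI355X tok/s/GPU, H200 $/M tok, MI355X $/M tok)
POINTS = {
    115: (2863.7, 8344.5, 0.137, 0.049),
    168: (1440.9, 2641.7, 0.274, 0.156),
    221: (735.9, 1684.4, 0.527, 0.244),
}

for interactivity, (h_tput, m_tput, h_cost, m_cost) in POINTS.items():
    print(
        f"{interactivity} tok/s/user: "
        f"MI355X throughput +{pct_more(m_tput, h_tput):.0f}%, "
        f"H200 cost premium +{pct_more(h_cost, m_cost):.0f}%, "
        f"MI355X cost saving {pct_less(m_cost, h_cost):.0f}%"
    )
```

A cost saving measured against the more expensive GPU can never exceed 100%, whereas a cost premium measured against the cheaper GPU can, which is why the two conventions give such different numbers for the same pair of prices.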
| Metric | Interactivity: 115 tok/s/user | Interactivity: 168 tok/s/user | Interactivity: 221 tok/s/user |
|---|---|---|---|
| Throughput (tok/s/GPU) | H200: 2863.7; MI355X: 8344.5 | H200: 1440.9; MI355X: 2641.7 | H200: 735.9; MI355X: 1684.4 |
| Cost ($/M tok) | H200: $0.137; MI355X: $0.049 | H200: $0.274; MI355X: $0.156 | H200: $0.527; MI355X: $0.244 |
| Throughput per megawatt (tok/s/MW) | H200: 1655340; MI355X: 3148877 | H200: 832892; MI355X: 996865 | H200: 425378; MI355X: 635613 |
| Concurrency | H200: ~64; MI355X: ~37 | H200: ~20; MI355X: ~32 | H200: ~14; MI355X: ~32 |
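The table's metrics are related to one another, and a short sketch can make that relationship concrete. The page does not state its cost or power model, so the following rests on two assumptions: that cost per million tokens equals an hourly per-GPU price divided by tokens generated per hour, and that tok/s/MW equals tok/s/GPU divided by a fixed per-GPU power draw. The function names are hypothetical. Under those assumptions, the per-GPU hourly price and power draw implied by each row can be backed out:

```python
def implied_usd_per_gpu_hour(tok_s_gpu, usd_per_mtok):
    """Hourly GPU price implied by throughput and $/M tokens (assumed cost model)."""
    tokens_per_hour = tok_s_gpu * 3600
    return usd_per_mtok * tokens_per_hour / 1e6


def implied_kw_per_gpu(tok_s_gpu, tok_s_mw):
    """Per-GPU power draw in kW implied by tok/s/GPU and tok/s/MW (assumed power model)."""
    return tok_s_gpu / tok_s_mw * 1000


# Values at 115 tok/s/user, taken from the table.
ROWS = {
    "H200": {"tok_s_gpu": 2863.7, "usd_per_mtok": 0.137, "tok_s_mw": 1655340},
    "MI355X": {"tok_s_gpu": 8344.5, "usd_per_mtok": 0.049, "tok_s_mw": 3148877},
}

for gpu, row in ROWS.items():
    usd_hr = implied_usd_per_gpu_hour(row["tok_s_gpu"], row["usd_per_mtok"])
    kw = implied_kw_per_gpu(row["tok_s_gpu"], row["tok_s_mw"])
    print(f"{gpu}: ~${usd_hr:.2f}/GPU-hour, ~{kw:.2f} kW/GPU (implied)")
```

If the assumed models hold, the implied values should come out roughly constant across the three interactivity columns for each GPU, which is a quick consistency check on the table.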
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.