gpt-oss 120B — H200 vs MI355X

A head-to-head AI inference benchmark comparison of H200 (NVIDIA Hopper) and MI355X (AMD CDNA 4) on gpt-oss 120B, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequence lengths, precisions, and metrics; they work the same way as on the main inference chart.

At 115 tok/s/user, toward the low end of the 63–273 tok/s/user interactivity band on gpt-oss 120B, H200 delivers 2864 tok/s/GPU at $0.14 per million tokens and MI355X delivers 8345 tok/s/GPU at $0.05. MI355X is about 64% cheaper per token (equivalently, H200 costs 177% more) and delivers 191% more tok/s/GPU.

At a target of 168 tok/s/user on gpt-oss 120B, H200 delivers 1441 tok/s/GPU at $0.27 per million tokens and MI355X delivers 2642 tok/s/GPU at $0.16. MI355X is about 43% cheaper per token (equivalently, H200 costs 75% more) and delivers 83% more tok/s/GPU.

At 221 tok/s/user on gpt-oss 120B, H200 delivers 736 tok/s/GPU at $0.53 per million tokens and MI355X delivers 1684 tok/s/GPU at $0.24. MI355X is about 54% cheaper per token (equivalently, H200 costs 116% more) and delivers 129% more tok/s/GPU at this point. (Numbers reflect the default 1k/1k · fp4 selection for this URL; the table and chart below update if you change the sequence length, precision, or model in the controls.)
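The percentage comparisons above depend on which GPU is used as the baseline, and the cost gap in particular can be stated two ways: how much more H200 costs relative to MI355X, or how much less MI355X costs relative to H200. The following is a minimal sketch of that arithmetic using the three-decimal values from the table on this page. The helper names (`pct_more`, `pct_less`, `compare`) are hypothetical and introduced only for illustration, and because the inputs are rounded table values, the results can differ from the page's own percentages by a point or so.

```python
# (throughput tok/s/GPU, cost $/M tok), copied from the comparison table
# on this page (default 1k/1k, fp4 selection), keyed by tok/s/user target.
POINTS = {
    115: {"H200": (2863.7, 0.137), "MI355X": (8344.5, 0.049)},
    168: {"H200": (1440.9, 0.274), "MI355X": (2641.7, 0.156)},
    221: {"H200": (735.9, 0.527), "MI355X": (1684.4, 0.244)},
}


def pct_more(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


def pct_less(a, b):
    """How much smaller a is than b, as a percentage of b."""
    return (1.0 - a / b) * 100.0


def compare(points=POINTS):
    """Return one summary row per interactivity target."""
    rows = []
    for target, gpus in sorted(points.items()):
        h_tput, h_cost = gpus["H200"]
        m_tput, m_cost = gpus["MI355X"]
        rows.append({
            "target": target,
            # Throughput advantage, with H200 as the baseline.
            "mi355x_more_throughput_pct": round(pct_more(m_tput, h_tput)),
            # Cost gap with MI355X as the baseline (can exceed 100%).
            "h200_costs_more_pct": round(pct_more(h_cost, m_cost)),
            # Cost gap with H200 as the baseline (always below 100%).
            "mi355x_costs_less_pct": round(pct_less(m_cost, h_cost)),
        })
    return rows


if __name__ == "__main__":
    for row in compare():
        print(row)
```

A "percent cheaper" figure is bounded by 100%, so any cost gap above 100% is necessarily a "costs more" ratio measured against the cheaper GPU.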

Interpolated from real benchmark data. Edit target interactivity values below to compare at different operating points.
Metric                   GPU      115 tok/s/user   168 tok/s/user   221 tok/s/user
Throughput (tok/s/GPU)   H200     2863.7           1440.9           735.9
                         MI355X   8344.5           2641.7           1684.4
Cost ($/M tok)           H200     $0.137           $0.274           $0.527
                         MI355X   $0.049           $0.156           $0.244
tok/s/MW                 H200     1655340          832892           425378
                         MI355X   3148877          996865           635613
Concurrency              H200     ~64              ~20              ~14
                         MI355X   ~37              ~32              ~32
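
The table values are described as interpolated from real benchmark data, but the page does not say which interpolation method is used. As a rough illustration only, the sketch below assumes piecewise linear interpolation between measured (interactivity, throughput) points. The function name `interpolate` and the sample points in `EXAMPLE_POINTS` are hypothetical placeholders, not benchmark measurements.

```python
from bisect import bisect_left


def interpolate(points, target):
    """Linearly interpolate y at x = target from (x, y) pairs.

    Assumption: piecewise linear interpolation. The source page does not
    state which method it actually uses.
    """
    pts = sorted(points)
    xs = [x for x, _ in pts]
    # Refuse to extrapolate outside the measured interactivity range.
    if not xs[0] <= target <= xs[-1]:
        raise ValueError("target is outside the measured range")
    i = bisect_left(xs, target)
    if xs[i] == target:
        # The target coincides with a measured point.
        return pts[i][1]
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    t = (target - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


# Hypothetical (tok/s/user, tok/s/GPU) points for illustration only;
# these are NOT values from the benchmark.
EXAMPLE_POINTS = [(100.0, 3000.0), (150.0, 2000.0), (200.0, 1000.0)]

if __name__ == "__main__":
    print(interpolate(EXAMPLE_POINTS, 125.0))
```

Rejecting targets outside the measured range mirrors the idea of a bounded interactivity band (63–273 tok/s/user on this page); whether the page clamps or extrapolates beyond it is not stated.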

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.