GLM 5/5.1 — B200 vs MI325X
Head-to-head AI inference benchmark comparison of B200 (NVIDIA Blackwell) and MI325X (AMD CDNA 3) on GLM 5/5.1, covering latency, throughput, and cost across LLM workloads. Use the chart controls below to switch sequence lengths, precisions, and metrics; they work the same way as on the main inference chart.
At a target of 21 tok/s/user on GLM 5/5.1, B200 delivers 1987 tok/s/GPU at $0.27 per million tokens, while MI325X delivers 173 tok/s/GPU at $2.08. MI325X costs 664% more per token (about 7.6x), so B200 is roughly 87% cheaper; B200 also delivers 1050% more tok/s/GPU (about 11.5x).
At 26 tok/s/user interactivity on GLM 5/5.1, B200 delivers 948 tok/s/GPU at $0.57 per million tokens, while MI325X delivers 101 tok/s/GPU at $3.53. MI325X costs 519% more per token (about 6.2x), so B200 is roughly 84% cheaper; B200 also delivers 836% more tok/s/GPU (about 9.4x).
At 30 tok/s/user on GLM 5/5.1, B200 delivers 683 tok/s/GPU at $0.80 per million tokens, while MI325X delivers 62 tok/s/GPU at $5.71. MI325X costs 612% more per token (about 7.1x), so B200 is roughly 86% cheaper; B200 also delivers 1010% more tok/s/GPU (about 11.1x). (Numbers reflect the default 1k/1k · fp8 selection for this URL; the table and chart below update if you change the sequence length, precision, or model in the controls.)
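A percentage above 100% can only describe how much more one value is than another, because a price cannot fall by more than 100%. The sketch below shows both conventions using the rounded figures for the 21 tok/s/user point; the function names `pct_more` and `pct_cheaper` are illustrative and are not part of the benchmark tooling.

```python
def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, e.g. a = 2 * b gives 100.0."""
    return (a / b - 1.0) * 100.0


def pct_cheaper(low: float, high: float) -> float:
    """Percent saved by paying `low` instead of `high`; always below 100."""
    return (1.0 - low / high) * 100.0


# Rounded values for the 21 tok/s/user operating point.
b200_tput, mi325x_tput = 1987.1, 172.9   # tok/s/GPU
b200_cost, mi325x_cost = 0.273, 2.081    # $ per million tokens

print(f"B200 throughput advantage: {pct_more(b200_tput, mi325x_tput):.0f}% more")
print(f"MI325X cost premium:       {pct_more(mi325x_cost, b200_cost):.0f}% more")
print(f"B200 cost saving:          {pct_cheaper(b200_cost, mi325x_cost):.0f}% cheaper")
```

With these rounded inputs the cost premium comes out near 662% rather than the quoted 664%; the gap is small enough to be explained by rounding, assuming the quoted percentages were computed from unrounded measurements.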
| Metric | GPU | Interactivity: 21 tok/s/user | Interactivity: 26 tok/s/user | Interactivity: 30 tok/s/user |
|---|---|---|---|---|
| Throughput (tok/s/GPU) | B200 | 1987.1 | 948.0 | 683.3 |
| Throughput (tok/s/GPU) | MI325X | 172.9 | 101.3 | 61.6 |
| Cost ($/M tok) | B200 | $0.273 | $0.571 | $0.802 |
| Cost ($/M tok) | MI325X | $2.081 | $3.534 | $5.710 |
| tok/s/MW | B200 | 915694 | 436844 | 314867 |
| tok/s/MW | MI325X | 79293 | 46466 | 28234 |
| Concurrency | B200 | ~1254 | ~396 | ~518 |
| Concurrency | MI325X | ~34 | ~16 | ~9 |
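The cost and tok/s/MW rows appear to be derived from throughput. As a rough consistency check, the sketch below backs out the hourly GPU cost and per-GPU power draw that the table implies. It assumes that cost per million tokens equals hourly cost divided by tokens produced per hour, and that tok/s/MW equals tok/s/GPU divided by megawatts per GPU; neither formula is confirmed by the source, and the function names are hypothetical.

```python
def implied_hourly_cost(tok_s_gpu: float, usd_per_mtok: float) -> float:
    """Hourly GPU cost in USD implied by throughput and $/M tokens (assumed formula)."""
    tokens_per_hour = tok_s_gpu * 3600
    return usd_per_mtok * tokens_per_hour / 1e6


def implied_kw_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Power per GPU in kW implied by tok/s/GPU and tok/s/MW (assumed formula)."""
    return tok_s_gpu / tok_s_mw * 1000.0


# (tok/s/GPU, $/M tok, tok/s/MW) at 21, 26, and 30 tok/s/user, copied from the table.
rows = {
    "B200":   [(1987.1, 0.273, 915694), (948.0, 0.571, 436844), (683.3, 0.802, 314867)],
    "MI325X": [(172.9, 2.081, 79293), (101.3, 3.534, 46466), (61.6, 5.710, 28234)],
}

for gpu, points in rows.items():
    for tput, cost, per_mw in points:
        print(f"{gpu}: ${implied_hourly_cost(tput, cost):.2f}/hour, "
              f"{implied_kw_per_gpu(tput, per_mw):.2f} kW per GPU")
```

Under these assumptions the table implies about $1.95 to $1.97 per GPU-hour for B200 and about $1.27 to $1.30 for MI325X, with roughly 2.17 to 2.18 kW per GPU for both. Treat these as back-of-envelope values recovered from rounded numbers, not as published prices or power specifications.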
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.