MI355X FP8: MTP vs Off Speculative Decoding
Speculative decoding comparison of MTP versus Off on MI355X FP8 (AMD CDNA 4) running GLM 5/5.1. Throughput, cost, and interactivity differences across LLM workloads. Use the chart controls below to switch sequences and metrics — same interactions as the main inference chart.
MTP acceptance-rate implementations differ across inference engines. Points from different engines are not directly comparable on the same curve — throughput and cost at matched interactivity may reflect engine-level differences rather than pure speculative decoding gains. Interpret cross-engine comparisons with caution.
MTP posts 726 tok/s/GPU for $0.56 per million tokens at 42 tok/s/user on GLM 5/5.1 (MI355X FP8); Off posts 193 tok/s/GPU for $2.11. MTP is 274% cheaper per token; MTP delivers 275% more tok/s/GPU. Draft-token acceptance rates determine whether speculative decoding helps or hurts at a given concurrency level.
Throughput at 47 tok/s/user on GLM 5/5.1 (MI355X FP8): MTP hits 595 tok/s/GPU, Off hits 169. Per-million costs land at $0.70 and $2.46 respectively. MTP is 252% cheaper per token; MTP delivers 251% more tok/s/GPU. Speculative decoding trades extra compute on draft tokens for fewer decoding steps — the payoff depends on sequence length and batch size.
Toward the upper edge of the 37–56 tok/s/user interactivity band, at 52 tok/s/user on GLM 5/5.1 (MI355X FP8): MTP runs 467 tok/s/GPU at $0.89/M tokens, Off runs 90 at $4.60/M. MTP is 419% cheaper per token; MTP delivers 421% more tok/s/GPU. Gains from speculative decoding vary by workload; short-output prompts tend to benefit less. (Numbers reflect this URL's pinned 1k/1k · fp8 workload — changing sequence or model updates both the table and chart; the table stays pinned to this page's precision, so precision toggles in the controls affect the chart only.)

| Metric | Interactivity (tok/s/user) | Interactivity (tok/s/user) | Interactivity (tok/s/user) |
|---|---|---|---|
| Throughput (tok/s/gpu) | MTP:725.7Off:193.5 | MTP:594.8Off:169.4 | MTP:467.3Off:89.7 |
| Cost ($/M tok) | MTP:$0.564Off:$2.112 | MTP:$0.699Off:$2.460 | MTP:$0.885Off:$4.595 |
| tok/s/MW | MTP:273865Off:73008 | MTP:224470Off:63925 | MTP:176336Off:33834 |
| Concurrency | MTP:~47Off:~16 | MTP:~25Off:~6 | MTP:~18Off:~4 |
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.