MI355X FP4: MTP vs Off Speculative Decoding
This page compares speculative decoding with MTP (multi-token prediction) against speculative decoding turned off ("Off") on MI355X FP4 (AMD CDNA 4) running Qwen 3.5 397B-A17B. It covers differences in throughput, cost, and interactivity across LLM workloads.
MTP acceptance-rate implementations differ across inference engines. Points from different engines are not directly comparable on the same curve — throughput and cost at matched interactivity may reflect engine-level differences rather than pure speculative decoding gains. Interpret cross-engine comparisons with caution.
At 44 tok/s/user on Qwen 3.5 397B-A17B (MI355X FP4), MTP posts 2222 tok/s/GPU at $0.19 per million tokens; Off posts 1836 tok/s/GPU at $0.22. MTP delivers 21% more tok/s/GPU and is 17% cheaper per token (equivalently, Off costs 20% more). Draft-token acceptance rates determine whether speculative decoding helps or hurts at a given concurrency level.
At 64 tok/s/user on Qwen 3.5 397B-A17B (MI355X FP4), MTP hits 1573 tok/s/GPU and Off hits 1061. Per-million-token costs land at $0.26 and $0.39 respectively. MTP delivers 48% more tok/s/GPU and is 33% cheaper per token (equivalently, Off costs 50% more). Speculative decoding trades extra compute on draft tokens for fewer decoding steps; the payoff depends on sequence length and batch size.
Toward the upper edge of the 25–104 tok/s/user interactivity band, at 84 tok/s/user on Qwen 3.5 397B-A17B (MI355X FP4), MTP runs 1279 tok/s/GPU at $0.32 per million tokens and Off runs 562 tok/s/GPU at $0.73. MTP delivers 127% more tok/s/GPU and is 56% cheaper per token (Off costs about 2.3 times as much). Gains from speculative decoding vary by workload; short-output prompts tend to benefit less. All figures on this page reflect the 1k/1k sequence, FP4 workload.
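The trade-off between draft-token compute and decoding steps can be illustrated with a small Python sketch. It is not taken from this page's benchmark data: it assumes each draft token is accepted independently with the same probability, and the names `accept_prob` and `num_draft` are hypothetical parameters chosen for illustration. Under that simplifying assumption, the expected number of tokens emitted per verification step is the geometric sum 1 + p + p² + … + pᵏ.

```python
def expected_tokens_per_step(accept_prob: float, num_draft: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Simplifying assumptions (not from the benchmark):
      * each of the num_draft draft tokens is accepted independently
        with probability accept_prob;
      * verification stops at the first rejected draft token;
      * every step yields one token from the target model itself,
        so the result is never below 1.
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be in [0, 1]")
    if num_draft < 0:
        raise ValueError("num_draft must be non-negative")
    if accept_prob == 1.0:
        # Every draft token is accepted, plus the target model's own token.
        return float(num_draft + 1)
    # Closed form of sum_{i=0}^{num_draft} accept_prob**i.
    return (1.0 - accept_prob ** (num_draft + 1)) / (1.0 - accept_prob)


if __name__ == "__main__":
    # Hypothetical acceptance rates, for illustration only.
    for p in (0.5, 0.7, 0.9):
        print(f"accept_prob={p}: {expected_tokens_per_step(p, 3):.2f} tokens/step")
```

More tokens per step means fewer decoding steps per sequence, but each step also costs more because the draft tokens must be produced and verified. Whether that nets out to higher throughput depends on how expensive a step is at a given batch size, which is why measured gains differ across the interactivity levels reported here.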

| Metric | 44 tok/s/user | 64 tok/s/user | 84 tok/s/user |
|---|---|---|---|
| Throughput (tok/s/GPU) | MTP: 2221.9, Off: 1835.8 | MTP: 1572.7, Off: 1061.5 | MTP: 1278.7, Off: 562.4 |
| Cost ($/M tok) | MTP: $0.186, Off: $0.224 | MTP: $0.258, Off: $0.386 | MTP: $0.322, Off: $0.726 |
| tok/s/MW | MTP: 838467, Off: 692757 | MTP: 593481, Off: 400563 | MTP: 482510, Off: 212215 |
| Concurrency | MTP: ~51, Off: ~42 | MTP: ~25, Off: ~16 | MTP: ~16, Off: ~14 |
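
The relative differences between MTP and Off can be recomputed from the table with a few lines of Python. This is a minimal sketch: the function and variable names are illustrative, and both percentages are measured relative to the Off baseline, which is one reasonable convention rather than a confirmed description of how the page derives its figures.

```python
# Values copied from the table (1k/1k, FP4, MI355X).
# Keys are interactivity levels in tok/s/user; tuples are (MTP, Off).
THROUGHPUT = {44: (2221.9, 1835.8), 64: (1572.7, 1061.5), 84: (1278.7, 562.4)}  # tok/s/GPU
COST = {44: (0.186, 0.224), 64: (0.258, 0.386), 84: (0.322, 0.726)}  # $/M tok


def throughput_gain_pct(mtp: float, off: float) -> float:
    """Percent more throughput MTP delivers, relative to Off."""
    return (mtp / off - 1.0) * 100.0


def cost_saving_pct(mtp: float, off: float) -> float:
    """Percent by which MTP's cost is below Off's, relative to Off."""
    return (1.0 - mtp / off) * 100.0


def summarize() -> dict:
    """Map each interactivity level to (throughput gain %, cost saving %)."""
    rows = {}
    for level in sorted(THROUGHPUT):
        gain = throughput_gain_pct(*THROUGHPUT[level])
        saving = cost_saving_pct(*COST[level])
        rows[level] = (round(gain), round(saving))
    return rows


if __name__ == "__main__":
    for level, (gain, saving) in summarize().items():
        print(f"{level} tok/s/user: +{gain}% tok/s/GPU, {saving}% cheaper per token")
```

Measuring the saving against Off's cost keeps the result below 100%. Dividing by MTP's cost instead answers a different question, namely how much more Off costs than MTP.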