MI355X BF16: MTP vs Off Speculative Decoding
This page compares speculative decoding with MTP (multi-token prediction) enabled versus disabled (Off) on MI355X BF16 (AMD CDNA 4) running Qwen 3.5 397B-A17B. It covers throughput, cost, and interactivity differences across LLM workloads. Use the chart controls below to switch sequences and metrics; they behave the same way as on the main inference chart.
MTP acceptance-rate implementations differ across inference engines. Points from different engines are not directly comparable on the same curve — throughput and cost at matched interactivity may reflect engine-level differences rather than pure speculative decoding gains. Interpret cross-engine comparisons with caution.
At 78 tok/s/user on Qwen 3.5 397B-A17B (MI355X BF16), MTP delivers 786 tok/s/GPU at $0.52 per million tokens; Off delivers 339 tok/s/GPU at $1.19. MTP is 56% cheaper per token (Off costs 129% more) and delivers 132% more tok/s/GPU. Speculative decoding lets the model verify several draft tokens per decoding step, which reduces per-token latency when the drafts are accepted; gains vary by workload and prompt distribution.
MTP posts 732 tok/s/GPU for $0.56 per million tokens at 86 tok/s/user on Qwen 3.5 397B-A17B (MI355X BF16); Off posts 254 tok/s/GPU for $1.60. MTP is 65% cheaper per token (Off costs 187% more) and delivers 188% more tok/s/GPU. Draft-token acceptance rates determine whether speculative decoding helps or hurts at a given concurrency level.
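
To make the role of the acceptance rate concrete, here is a minimal sketch of the textbook speculative-decoding estimate for tokens produced per verification step. It assumes each draft token is accepted independently with the same probability, which is a simplification; the names `acceptance_rate` and `draft_len` are illustrative and are not values measured for this benchmark or parameters of any specific engine.

```python
def expected_tokens_per_step(acceptance_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification step.

    Simplified model (an assumption, not this benchmark's methodology):
    each of the `draft_len` draft tokens is accepted independently with
    probability `acceptance_rate`, drafts after the first rejection are
    discarded, and the verifying pass always contributes one token.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if acceptance_rate == 1.0:
        # Every draft is accepted, plus the one token from verification.
        return float(draft_len + 1)
    # Geometric series: 1 + a + a^2 + ... + a^draft_len
    return (1.0 - acceptance_rate ** (draft_len + 1)) / (1.0 - acceptance_rate)


if __name__ == "__main__":
    for rate in (0.3, 0.6, 0.9):
        print(rate, expected_tokens_per_step(rate, draft_len=3))
```

Under this model a low acceptance rate leaves the result close to one token per step, so the extra draft work buys little, while a high rate approaches `draft_len + 1` tokens per step.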
Throughput at 95 tok/s/user on Qwen 3.5 397B-A17B (MI355X BF16): MTP hits 648 tok/s/GPU, Off hits 164. Per-million-token costs land at $0.64 and $2.56 respectively. MTP is 75% cheaper per token (Off costs 301% more) and delivers 295% more tok/s/GPU. Speculative decoding trades extra compute on draft tokens for fewer decoding steps, so the payoff depends on sequence length and batch size. (Numbers reflect this URL's pinned 1k/1k · bf16 workload. Changing the sequence or model updates both the table and the chart; the table stays pinned to this page's precision, so precision toggles in the controls affect the chart only.)
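
The relative figures at the three interactivity levels can be reproduced from the unrounded throughput and cost values. The sketch below is only arithmetic on this page's numbers; the function and key names are illustrative. It separates "how much cheaper MTP is" (a saving, which cannot exceed 100%) from "how much more Off costs" (a premium, which can).

```python
# (tok/s/GPU, $ per million tokens) for each interactivity level, from this page.
POINTS = {
    78: {"mtp": (785.7, 0.518), "off": (339.1, 1.188)},
    86: {"mtp": (731.6, 0.556), "off": (253.8, 1.596)},
    95: {"mtp": (647.6, 0.638), "off": (164.1, 2.560)},
}


def compare(mtp: tuple[float, float], off: tuple[float, float]) -> dict[str, float]:
    """Return relative differences between MTP and Off, in percent."""
    mtp_tput, mtp_cost = mtp
    off_tput, off_cost = off
    return {
        # How much more throughput MTP delivers than Off.
        "throughput_gain_pct": (mtp_tput / off_tput - 1.0) * 100.0,
        # How much cheaper MTP is than Off (bounded by 100%).
        "mtp_saving_pct": (1.0 - mtp_cost / off_cost) * 100.0,
        # How much more Off costs than MTP (unbounded).
        "off_premium_pct": (off_cost / mtp_cost - 1.0) * 100.0,
    }


if __name__ == "__main__":
    for interactivity, point in POINTS.items():
        result = compare(point["mtp"], point["off"])
        print(interactivity, {k: round(v) for k, v in result.items()})
```

Rounded to whole percentages, this gives a throughput gain of 132%, 188%, and 295%, an MTP saving of 56%, 65%, and 75%, and an Off premium of 129%, 187%, and 301% at 78, 86, and 95 tok/s/user.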

| Metric | 78 tok/s/user | 86 tok/s/user | 95 tok/s/user |
|---|---|---|---|
| Throughput (tok/s/GPU) | MTP: 785.7, Off: 339.1 | MTP: 731.6, Off: 253.8 | MTP: 647.6, Off: 164.1 |
| Cost ($/M tok) | MTP: $0.518, Off: $1.188 | MTP: $0.556, Off: $1.596 | MTP: $0.638, Off: $2.560 |
| tok/s/MW | MTP: 296494, Off: 127955 | MTP: 276072, Off: 95789 | MTP: 244392, Off: 61913 |
| Concurrency | MTP: ~87, Off: ~18 | MTP: ~50, Off: ~12 | MTP: ~29, Off: ~7 |
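
The tok/s/MW row tracks the throughput row closely. As a rough check, the sketch below divides each tok/s/GPU value by the matching tok/s/MW value to back out an implied power draw per GPU. This is an inference from the table's numbers, not a documented methodology, and the function name is illustrative.

```python
# (tok/s/GPU, tok/s/MW) pairs taken from the table, MTP and Off at each level.
ROWS = [
    (785.7, 296494), (339.1, 127955),  # 78 tok/s/user
    (731.6, 276072), (253.8, 95789),   # 86 tok/s/user
    (647.6, 244392), (164.1, 61913),   # 95 tok/s/user
]


def implied_watts_per_gpu(tok_s_per_gpu: float, tok_s_per_mw: float) -> float:
    """Power per GPU implied by the two metrics, in watts.

    tok/s/MW = tok/s/GPU * (GPUs per MW), so GPUs per MW is the ratio of
    the two rows and watts per GPU is 1,000,000 divided by that ratio.
    """
    return tok_s_per_gpu / tok_s_per_mw * 1_000_000


if __name__ == "__main__":
    for tput, per_mw in ROWS:
        print(round(implied_watts_per_gpu(tput, per_mw), 1))
```

Every pair works out to roughly 2.65 kW per GPU, which suggests the tok/s/MW row is the throughput row scaled by a fixed per-GPU power figure, so it adds no ranking information beyond tok/s/GPU for this single-hardware comparison.
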
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.