B300 FP8: MTP vs Off Speculative Decoding
This page compares speculative decoding with MTP (multi-token prediction) against speculative decoding turned off (Off) on B300 FP8 (NVIDIA Blackwell) running Qwen 3.5 397B-A17B. It covers throughput, cost, and interactivity differences across LLM workloads. Use the chart controls below to switch sequences and metrics; the interactions are the same as on the main inference chart.
MTP acceptance-rate implementations differ across inference engines. Points from different engines are not directly comparable on the same curve — throughput and cost at matched interactivity may reflect engine-level differences rather than pure speculative decoding gains. Interpret cross-engine comparisons with caution.
At 67 tok/s/user on Qwen 3.5 397B-A17B (B300 FP8), MTP posts 2963 tok/s/GPU at $0.22 per million tokens; Off posts 1169 tok/s/GPU at $0.56. MTP delivers 154% more tok/s/GPU, and Off costs 156% more per token (equivalently, MTP is about 61% cheaper). Draft-token acceptance rates determine whether speculative decoding helps or hurts at a given concurrency level.
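
To illustrate why acceptance rate decides whether speculative decoding helps or hurts, here is a minimal sketch of a common textbook model. It is not how this page's numbers were measured. It assumes each of `k` draft tokens is accepted independently with probability `alpha`, and it uses a hypothetical `draft_cost` parameter: the cost of producing one draft token relative to one target-model step. Real engines, and MTP heads in particular, may behave differently.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification step.

    Assumption: each of the k draft tokens is accepted independently with
    probability alpha, and the verification step always yields one token
    of its own (the correction or the bonus token).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def net_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Tokens per unit of compute relative to plain decoding.

    draft_cost is a hypothetical parameter: the cost of one draft token
    as a fraction of one target-model decoding step.
    """
    return expected_tokens_per_step(alpha, k) / (1.0 + k * draft_cost)


if __name__ == "__main__":
    for alpha in (0.2, 0.5, 0.8):
        speedup = net_speedup(alpha, k=3, draft_cost=0.1)
        verdict = "helps" if speedup > 1.0 else "hurts"
        print(f"alpha={alpha:.1f}: {speedup:.2f}x ({verdict})")
```

Under this model, a low acceptance rate leaves the drafting overhead unpaid, so the speedup drops below 1x; a high acceptance rate amortizes it over several accepted tokens.
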
At 94 tok/s/user on Qwen 3.5 397B-A17B (B300 FP8), MTP hits 1913 tok/s/GPU and Off hits 731. Per-million-token costs land at $0.34 and $0.89 respectively. MTP delivers 162% more tok/s/GPU, and Off costs 158% more per token (equivalently, MTP is about 61% cheaper). Speculative decoding trades extra compute on draft tokens for fewer decoding steps; the payoff depends on sequence length and batch size.
Toward the upper edge of the 40–147 tok/s/user interactivity band, at 120 tok/s/user on Qwen 3.5 397B-A17B (B300 FP8), MTP runs 1299 tok/s/GPU at $0.49/M tokens and Off runs 484 tok/s/GPU at $1.34/M. MTP delivers 168% more tok/s/GPU, and Off costs 171% more per token (equivalently, MTP is about 63% cheaper). Gains from speculative decoding vary by workload; short-output prompts tend to benefit less. (Numbers reflect this URL's pinned 1k/1k · fp8 workload. Changing the sequence or model updates both the table and chart; the table stays pinned to this page's precision, so precision toggles in the controls affect only the chart.)
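
The relative differences quoted above can be reproduced from the table values. The sketch below uses two hypothetical helper names, `pct_more` and `pct_cheaper`. A cost reduction can never exceed 100%, so any cost figure above 100% describes how much more the pricier option costs, not how much cheaper the other one is.

```python
def pct_more(value: float, baseline: float) -> float:
    """Percentage by which value exceeds baseline."""
    return (value / baseline - 1.0) * 100.0


def pct_cheaper(cost: float, baseline_cost: float) -> float:
    """Percentage saved relative to baseline_cost (at most 100)."""
    return (1.0 - cost / baseline_cost) * 100.0


# Table values per interactivity level (tok/s/user):
# (MTP tok/s/GPU, Off tok/s/GPU, MTP $/M tok, Off $/M tok)
points = {
    67: (2962.5, 1168.5, 0.217, 0.557),
    94: (1913.1, 731.3, 0.344, 0.889),
    120: (1299.5, 484.3, 0.493, 1.337),
}

for level, (mtp_tput, off_tput, mtp_cost, off_cost) in points.items():
    print(
        f"{level} tok/s/user: "
        f"MTP throughput +{pct_more(mtp_tput, off_tput):.0f}%, "
        f"MTP cost -{pct_cheaper(mtp_cost, off_cost):.0f}%"
    )
```

Small discrepancies against the quoted percentages are expected, since the table values are themselves rounded.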

| Metric | 67 tok/s/user | 94 tok/s/user | 120 tok/s/user |
|---|---|---|---|
| Throughput (tok/s/GPU) | MTP: 2962.5; Off: 1168.5 | MTP: 1913.1; Off: 731.3 | MTP: 1299.5; Off: 484.3 |
| Cost ($/M tok) | MTP: $0.217; Off: $0.557 | MTP: $0.344; Off: $0.889 | MTP: $0.493; Off: $1.337 |
| tok/s/MW | MTP: 1365222; Off: 538500 | MTP: 881623; Off: 337022 | MTP: 598835; Off: 223202 |
| Concurrency | MTP: ~90; Off: ~36 | MTP: ~42; Off: ~16 | MTP: ~23; Off: ~8 |
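
As a rough consistency check on the table, dividing the tok/s/MW row by the throughput row gives the number of GPUs per megawatt that the figures imply. The page does not state its power model, so this is only arithmetic on the published values; the name `gpus_per_mw` is hypothetical.

```python
def gpus_per_mw(tok_s_per_mw: float, tok_s_per_gpu: float) -> float:
    """GPUs per megawatt implied by the two throughput figures."""
    return tok_s_per_mw / tok_s_per_gpu


# (tok/s/MW, tok/s/GPU) pairs taken from the table above.
pairs = {
    "MTP @ 67": (1365222, 2962.5),
    "Off @ 67": (538500, 1168.5),
    "MTP @ 94": (881623, 1913.1),
    "Off @ 94": (337022, 731.3),
    "MTP @ 120": (598835, 1299.5),
    "Off @ 120": (223202, 484.3),
}

for label, (per_mw, per_gpu) in pairs.items():
    n = gpus_per_mw(per_mw, per_gpu)
    # 1 MW = 1,000,000 W, so the implied all-in power per GPU is 1e6 / n.
    print(f"{label}: {n:.1f} GPUs/MW, about {1e6 / n:.0f} W per GPU")
```

The ratio comes out nearly identical for every cell, which suggests tok/s/MW is derived from tok/s/GPU using a fixed per-GPU power figure and not measured separately.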
Inference Performance
Inference performance metrics across different models, hardware configurations, and serving parameters.