MiniMax M2.5/M2.7 · MI355X · Precision Comparison

MI355X: FP4 vs FP8 Precision Comparison

How FP4 and FP8 precision affect MiniMax M2.5/M2.7 inference on MI355X (AMD CDNA 4). Throughput, latency, and cost across LLM workloads. Use the chart controls below to switch sequences and metrics — same interactions as the main inference chart.

FP4 posts 5005 tok/s/GPU for $0.08 per million tokens at 42 tok/s/user on MiniMax M2.5/M2.7 (MI355X); FP8 posts 3690 tok/s/GPU for $0.11. FP4 is 36% cheaper per token; FP4 delivers 36% more tok/s/GPU. Quantization-level accuracy differences are tracked on the evaluation tab.

Throughput at 68 tok/s/user on MiniMax M2.5/M2.7 (MI355X): FP4 hits 2238 tok/s/GPU, FP8 hits 1158. Per-million costs land at $0.18 and $0.36 respectively. FP4 is 93% cheaper per token; FP4 delivers 93% more tok/s/GPU. The cost-throughput tradeoff from lower precision is only part of the picture — see the evaluation page for accuracy data.

Toward the upper edge of the 16–119 tok/s/user interactivity band, at 93 tok/s/user on MiniMax M2.5/M2.7 (MI355X): FP4 runs 1214 tok/s/GPU at $0.34/M tokens, FP8 runs 568 at $0.70/M. FP4 is 109% cheaper per token; FP4 delivers 114% more tok/s/GPU. Precision changes affect both inference speed and model quality — consult the evaluation tab for accuracy benchmarks. (Numbers reflect the default 1k/1k selection for this URL — table and chart below update if you change sequence or model in the controls. Each side uses the best available serving configuration for that precision, which may include speculative decoding such as MTP where recipes exist — the same convention as the other comparison pages.)

MiniMax M2.5/M2.7: FP4 versus FP8 throughput and cost at matched interactivity levels on MI355X
FP4 versus FP8 throughput and cost per million tokens on MI355X for this comparison's canonical default workload.
Interpolated from real benchmark data. Edit target interactivity values below to compare at different operating points.
Metric
Interactivity (tok/s/user)
Interactivity (tok/s/user)
Interactivity (tok/s/user)
Throughput (tok/s/gpu)
FP4:5004.5FP8:3689.9
FP4:2238.4FP8:1158.1
FP4:1214.1FP8:567.7
Cost ($/M tok)
FP4:$0.082FP8:$0.111
FP4:$0.184FP8:$0.357
FP4:$0.336FP8:$0.703
tok/s/MW
FP4:1888500FP8:1392418
FP4:844677FP8:437012
FP4:458133FP8:214237
Concurrency
FP4:~64FP8:~256
FP4:~34FP8:~17
FP4:~12FP8:~6

Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.

Vendor:
Aggregation:
Spec Decoding: