Inference Performance

Inference performance metrics across different models, hardware configurations, and serving parameters.

Features: Multi-head Latent Attention, Auxiliary-loss-free Load Balancing, Multi-Token Prediction

Released by DeepSeek on May 28, 2025