InferenceX by SemiAnalysis

Articles

Insights on AI inference benchmarking, GPU performance, and ML infrastructure.

May 2, 2026·5 min read

SGLang 0.5.6 on B200 DeepSeek R1 FP4: Up to 1.8x at Low Concurrency

Piecewise CUDA graphs for DeepSeek V3, a unified event loop, and JIT kernels push 8k/1k throughput from 508 to 907 tok/s/GPU on the same 16-GPU B200 pool

benchmark, inference, gpu, nvidia, b200, deepseek, sglang, fp4
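The headline "up to 1.8x" figure follows directly from the two quoted throughput numbers; a quick arithmetic check (variable names here are illustrative, not from the benchmark harness):

```python
# Sanity check of the headline speedup from the quoted 8k/1k throughput numbers.
baseline_tok_s_gpu = 508   # before the SGLang 0.5.6 changes, same 16-GPU B200 pool
improved_tok_s_gpu = 907   # with piecewise CUDA graphs, unified event loop, JIT kernels

speedup = improved_tok_s_gpu / baseline_tok_s_gpu
print(f"{speedup:.2f}x")   # ~1.79x, rounding to the quoted "up to 1.8x"
```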

Continuous open-source inference benchmarking: real-world, reproducible, auditable performance data trusted by trillion-dollar AI infrastructure operators such as OpenAI, Meta, Oracle, and Microsoft.


If this data helps your work, consider starring us on GitHub or sharing with your network.

© 2026 semianalysis.com. All rights reserved.