
Kimi K2.5/K2.6/K2.7-Code 1T — B300 vs H200

Head-to-head AI inference benchmark comparison of B300 (NVIDIA Blackwell) and H200 (NVIDIA Hopper) on Kimi K2.5/K2.6/K2.7-Code 1T, covering latency, throughput, and cost across LLM workloads.

At 51 tok/s/user interactivity on Kimi K2.5/K2.6/K2.7-Code 1T, B300 delivers 464 tok/s/GPU at $1.43 per million tokens; H200 delivers 310 tok/s/GPU at $1.24. H200 is 14% cheaper per token (equivalently, B300 costs 16% more); B300 delivers 49% more tok/s/GPU at this point.

B300 posts 304 tok/s/GPU for $2.13 per million tokens at 69 tok/s/user on Kimi K2.5/K2.6/K2.7-Code 1T; H200 posts 229 tok/s/GPU for $1.70. H200 is 20% cheaper per token (equivalently, B300 costs 25% more); B300 delivers 33% more tok/s/GPU.

Throughput at 88 tok/s/user on Kimi K2.5/K2.6/K2.7-Code 1T: B300 hits 211 tok/s/GPU, H200 hits 158. Per-million-token costs land at $3.02 and $2.49 respectively. H200 is 18% cheaper per token (equivalently, B300 costs 21% more); B300 delivers 33% more tok/s/GPU. (All figures are for the 1k/1k sequence, int4 precision configuration.)
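The cost percentages in these paragraphs depend on which GPU is taken as the base: "B300 costs X% more" divides by the H200 price, while "H200 is Y% cheaper" divides by the B300 price. A minimal sketch, using the rounded figures from the comparison table on this page, that computes both alongside the throughput advantage. The helper names (`pct_more`, `pct_cheaper`, `summarize`) are hypothetical and for illustration only.

```python
# Figures copied from the comparison table: (B300, H200) at each
# interactivity target in tok/s/user.
POINTS = {
    51: {"tput": (464.0, 310.5), "cost": (1.433, 1.238)},
    69: {"tput": (304.2, 229.0), "cost": (2.135, 1.704)},
    88: {"tput": (210.7, 158.1), "cost": (3.024, 2.493)},
}


def pct_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, using b as the base."""
    return (a / b - 1.0) * 100.0


def pct_cheaper(cheap: float, pricey: float) -> float:
    """Percent saved by choosing `cheap` over `pricey`, using `pricey` as the base."""
    return (1.0 - cheap / pricey) * 100.0


def summarize(points):
    """Return (target, throughput advantage, B300 premium, H200 saving) rows."""
    rows = []
    for target, d in sorted(points.items()):
        b300_tput, h200_tput = d["tput"]
        b300_cost, h200_cost = d["cost"]
        rows.append((
            target,
            round(pct_more(b300_tput, h200_tput)),     # B300 extra tok/s/GPU
            round(pct_more(b300_cost, h200_cost)),     # B300 cost premium (H200 base)
            round(pct_cheaper(h200_cost, b300_cost)),  # H200 saving (B300 base)
        ))
    return rows


if __name__ == "__main__":
    for target, tput_adv, premium, saving in summarize(POINTS):
        print(f"{target} tok/s/user: B300 +{tput_adv}% tok/s/GPU, "
              f"B300 costs {premium}% more, H200 is {saving}% cheaper")
```

At 51 tok/s/user, for example, the same $1.433 vs $1.238 gap reads as a 16% B300 premium or a 14% H200 saving, depending on the base.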


All values below are interpolated from real benchmark data at each target interactivity.
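A minimal sketch of what interpolating at a target interactivity can look like, assuming piecewise-linear interpolation between measured points on a throughput-vs-interactivity sweep. The page does not state which method it actually uses. `interpolate_at` is a hypothetical helper, and the `SWEEP` points are toy values, not benchmark measurements.

```python
from bisect import bisect_left

# Toy (interactivity tok/s/user, throughput tok/s/GPU) points; NOT real data.
SWEEP = [(30.0, 600.0), (50.0, 480.0), (70.0, 300.0), (90.0, 200.0)]


def interpolate_at(points, target):
    """Linearly interpolate y at x == target from (x, y) sample points.

    Returns None when target lies outside the sampled x range, because
    extrapolating a benchmark curve is not safe.
    """
    pts = sorted(points)
    xs = [x for x, _ in pts]
    if not xs or target < xs[0] or target > xs[-1]:
        return None
    i = bisect_left(xs, target)  # first index with xs[i] >= target
    if xs[i] == target:
        return pts[i][1]         # exact sample, no interpolation needed
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    t = (target - x0) / (x1 - x0)  # fractional position between neighbours
    return y0 + t * (y1 - y0)


if __name__ == "__main__":
    print(interpolate_at(SWEEP, 60.0))
```

Returning `None` outside the sampled range is a deliberate design choice: a target beyond the fastest or slowest measured point cannot be read off the curve without extrapolating.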

Metric	GPU	51 tok/s/user	69 tok/s/user	88 tok/s/user
Throughput (tok/s/GPU)	B300	464.0	304.2	210.7
Throughput (tok/s/GPU)	H200	310.5	229.0	158.1
Cost ($/M tokens)	B300	$1.433	$2.135	$3.024
Cost ($/M tokens)	H200	$1.238	$1.704	$2.493
Throughput per MW (tok/s/MW)	B300	213,815	140,189	97,081
Throughput per MW (tok/s/MW)	H200	179,479	132,358	91,376
Concurrency	B300	~16	~12	~4
Concurrency	H200	~25	~13	~7
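The tok/s/MW row is consistent with a fixed power budget per GPU: dividing tok/s/GPU by tok/s/MW gives roughly 2.17 kW per B300 and 1.73 kW per H200 at every operating point. The page does not say what that budget includes (GPU alone, or server, networking, and cooling overhead), so treat these wattages as values back-derived from the table, not published specifications. A minimal sketch of that relation; the helper names are hypothetical.

```python
# Rows copied from the table: (tok/s/GPU, tok/s/MW) per operating point.
TABLE = {
    "B300": [(464.0, 213815), (304.2, 140189), (210.7, 97081)],
    "H200": [(310.5, 179479), (229.0, 132358), (158.1, 91376)],
}


def tok_s_per_mw(tok_s_gpu: float, watts_per_gpu: float) -> float:
    """Throughput per megawatt: GPUs per MW (1e6 / watts) times tok/s/GPU."""
    return tok_s_gpu * 1e6 / watts_per_gpu


def implied_watts_per_gpu(tok_s_gpu: float, tok_s_mw: float) -> float:
    """Invert tok_s_per_mw to recover the per-GPU power the table implies."""
    return tok_s_gpu * 1e6 / tok_s_mw


if __name__ == "__main__":
    for gpu, rows in TABLE.items():
        watts = [round(implied_watts_per_gpu(t, m)) for t, m in rows]
        print(gpu, watts)
```

Because the implied power is flat across operating points, the tok/s/MW ratio between the two GPUs is the throughput ratio scaled by 1730/2170, which is why B300's per-MW lead is much smaller than its per-GPU lead.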

