Open Source Continuous Inference Benchmark trusted by Operators of Trillion Dollar GigaWatt Scale Token Factories
As the world progresses exponentially towards AGI, software development and model releases move at the speed of light. Existing benchmarks rapidly become obsolete due to their static nature, and participants often submit software images purpose-built for the benchmark itself that do not reflect real-world performance.
InferenceX™ (formerly InferenceMAX) is our independent, vendor-neutral, reproducible benchmark that addresses these issues by continuously benchmarking inference software across a wide range of AI accelerators actually available to the ML community.
Our open data and insights are widely used by the ML community, by capacity-planning and strategy teams at trillion-dollar token factories and AI labs, and at multiple billion-dollar NeoClouds. Learn more in our articles: v1, v2.
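What "continuous" means in practice: every configuration is re-run as the software stack moves, rather than frozen at submission time. Below is a minimal sketch of such a loop in Python; the helper names (run_benchmark, publish) and the framework/model lists are illustrative stand-ins, not InferenceX™'s actual harness.

```python
# Illustrative sketch of a continuous benchmark loop. run_benchmark and
# publish are hypothetical stand-ins for the real harness; the point is that
# every (accelerator, framework, model) cell is re-run nightly against the
# latest software release, so results track real-world improvements.
from itertools import product

ACCELERATORS = ["H100", "H200", "B200", "B300", "GB200-NVL72", "GB300-NVL72",
                "MI300X", "MI325X", "MI355X"]
FRAMEWORKS = ["vllm", "sglang", "trt-llm"]       # pulled at latest each night
MODELS = ["model-a", "model-b"]                  # placeholder model list

def nightly_run(run_benchmark, publish):
    for gpu, framework, model in product(ACCELERATORS, FRAMEWORKS, MODELS):
        result = run_benchmark(gpu=gpu, framework=framework, model=model)
        publish(result)  # results land in the open, reproducible dataset
```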
GPU Specifications
Hardware specifications for GPUs used in InferenceX™ benchmarks, including compute performance, memory bandwidth, and interconnect details.
| GPU | Memory | Mem BW | FP4 TFLOP/s¹ | FP8 TFLOP/s¹ | BF16 TFLOP/s¹ | Scale Up | Scale Up BW | World Size | Scale Up Domain Memory | Scale Up Domain Mem BW | Scale Up Switch | Scale Out BW per GPU | Scale Out Tech | Scale Out Switch | NIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NVIDIA H100 SXM | 80 GB | 3.35 TB/s | — | 1,979 | 989 | NVLink 4.0 | 450 GB/s | 8 | 0.64 TB | 26.8 TB/s | 7.2 Tbit/s NVSwitch Gen 3.0 | 400 Gbit/s | RoCEv2 Ethernet | 25.6T Arista Tomahawk4 7060DX5-64S | ConnectX-7 2x200GbE |
| NVIDIA H200 SXM | 141 GB | 4.8 TB/s | — | 1,979 | 989 | NVLink 4.0 | 450 GB/s | 8 | 1.13 TB | 38.4 TB/s | 7.2 Tbit/s NVSwitch Gen 3.0 | 400 Gbit/s | InfiniBand NDR | 25.6T NVIDIA Quantum-2 QM9790 | ConnectX-7 400G |
| NVIDIA B200 SXM | 180 GB | 8 TB/s | 9,000 | 4,500 | 2,250 | NVLink 5.0 | 900 GB/s | 8 | 1.44 TB | 64 TB/s | 28.8 Tbit/s NVSwitch Gen 4.0 | 400 Gbit/s | gIB RoCEv2 Ethernet | 12.8T Whitebox Leaf Tomahawk3 & 25.6T Whitebox Tomahawk4 | ConnectX-7 400GbE |
| NVIDIA B300 SXM | 268 GB | 8 TB/s | 13,500 | 4,500 | 2,250 | NVLink 5.0 | 900 GB/s | 8 | 2.14 TB | 64 TB/s | 28.8 Tbit/s NVSwitch Gen 4.0 | 800 Gbit/s | RoCEv2 Ethernet | 51.2T NVIDIA Spectrum-X SN5600 | ConnectX-8 2x400GbE |
| NVIDIA GB200 NVL72 | 192 GB | 8 TB/s | 10,000 | 5,000 | 2,500 | NVLink 5.0 | 900 GB/s | 72 | 13.82 TB | 576 TB/s | 28.8 Tbit/s NVSwitch Gen 4.0 | N/A² | N/A² | N/A² | N/A² |
| NVIDIA GB300 NVL72 | 288 GB | 8 TB/s | 15,000 | 5,000 | 2,500 | NVLink 5.0 | 900 GB/s | 72 | 20.74 TB | 576 TB/s | 28.8 Tbit/s NVSwitch Gen 4.0 | N/A² | N/A² | N/A² | N/A² |
| AMD MI300X | 192 GB | 5.3 TB/s | — | 2,615 | 1,307 | Infinity Fabric | 448 GB/s | 8 | 1.54 TB | 42.4 TB/s | — | 400 Gbit/s | RoCEv2 Ethernet | 51.2T Tomahawk5 | Pollara 400GbE |
| AMD MI325X | 256 GB | 6 TB/s | — | 2,615 | 1,307 | Infinity Fabric | 448 GB/s | 8 | 2.05 TB | 48 TB/s | — | 400 Gbit/s | RoCEv2 Ethernet | 51.2T Tomahawk5 | Pollara 400GbE |
| AMD MI355X | 288 GB | 8 TB/s | 10,066 | 5,033 | 2,516 | 5th Gen Infinity Fabric | 576 GB/s | 8 | 2.3 TB | 64 TB/s | — | 400 Gbit/s | RoCEv2 Ethernet | 51.2T Arista Tomahawk5 DCS-7060X6-64PE | Pollara 400GbE |
¹ Dense tensor core peak TFLOP/s (without sparsity).
² Scale-out networking is not used in InferenceX™ for the rack-scale NVL72 systems.
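Several of the Scale Up Domain columns are simple products of the per-GPU figures and the scale-up world size. A quick Python sanity check against a few rows of the table above:

```python
# The "Scale Up Domain Memory" and "Scale Up Domain Mem BW" columns are the
# per-GPU HBM capacity and bandwidth multiplied by the scale-up world size.
specs = {
    # name: (HBM per GPU in GB, mem BW per GPU in TB/s, scale-up world size)
    "H100 SXM":    (80,  3.35, 8),
    "B300 SXM":    (268, 8.0,  8),
    "GB200 NVL72": (192, 8.0,  72),
    "MI355X":      (288, 8.0,  8),
}

for name, (hbm_gb, bw_tbs, world) in specs.items():
    domain_mem_tb = hbm_gb * world / 1000   # table uses decimal TB
    domain_bw_tbs = bw_tbs * world
    print(f"{name}: {domain_mem_tb:.2f} TB, {domain_bw_tbs:.1f} TB/s")

# H100 SXM: 0.64 TB, 26.8 TB/s       -> matches the table
# GB200 NVL72: 13.82 TB, 576.0 TB/s  -> matches the table
```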
Scale-Out Topology Diagrams
Per-server scale-out network topology for each GPU SKU, showing GPU → NIC → leaf switch connectivity.
- H100 SXM: 8-rail optimized · RoCEv2 Ethernet
- H200 SXM: 8-rail optimized · InfiniBand NDR
- B200 SXM: 4-rail optimized · gIB RoCEv2 Ethernet
- B300 SXM: 8-rail optimized · RoCEv2 Ethernet
- MI300X: 8-rail optimized · RoCEv2 Ethernet
- MI325X: 8-rail optimized · RoCEv2 Ethernet
- MI355X: 8-rail optimized · RoCEv2 Ethernet
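In an N-rail-optimized fabric, NIC r of every server connects to the same leaf ("rail") switch, so same-indexed GPUs on different servers are one leaf hop apart. Below is a toy Python mapping of GPU index to rail; treating the 4-rail B200 case as two GPUs sharing a rail is our assumption for illustration, not a confirmed topology detail.

```python
# Toy rail-optimized placement: in an N-rail fabric, NIC r of every server
# plugs into leaf switch r, so GPU i on one server reaches GPU i on another
# in a single leaf hop. Two GPUs per rail for the 4-rail case is an
# assumption for illustration.
def rail_of(gpu_index: int, num_rails: int, gpus_per_server: int = 8) -> int:
    """Index of the leaf ("rail") switch a GPU's NIC connects to."""
    gpus_per_rail = gpus_per_server // num_rails
    return gpu_index // gpus_per_rail

# 8-rail (H100, H200, B300, MI300X, ...): one GPU per rail
assert [rail_of(g, 8) for g in range(8)] == [0, 1, 2, 3, 4, 5, 6, 7]
# 4-rail (B200): pairs of GPUs share a rail
assert [rail_of(g, 4) for g in range(8)] == [0, 0, 1, 1, 2, 2, 3, 3]
```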
Scale-Up Topology Diagrams
Intra-node scale-up interconnect topology for each GPU SKU, showing GPU → NVSwitch or direct GPU-to-GPU connectivity.
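The spec table implies two scale-up patterns: NVIDIA nodes go through NVSwitch, so any GPU can reach any single peer at the full per-GPU figure, while the switchless AMD nodes connect every GPU pair directly over Infinity Fabric. A sketch of the per-peer arithmetic, assuming the Infinity Fabric aggregate splits evenly across the 7 peer links of an 8-GPU node (a simplification):

```python
# Per-peer scale-up bandwidth under the two intra-node patterns.
# Switched (NVSwitch): each GPU can reach any single peer at its full
# scale-up bandwidth. Full mesh (Infinity Fabric): the aggregate splits
# across the 7 direct peer links; the even split is our assumption.
def per_peer_bw_gbs(total_gbs: float, world_size: int, switched: bool) -> float:
    return total_gbs if switched else total_gbs / (world_size - 1)

print(per_peer_bw_gbs(900, 8, switched=True))    # B200 via NVSwitch: 900.0
print(per_peer_bw_gbs(448, 8, switched=False))   # MI300X full mesh: 64.0 per peer
```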
