InferenceX

(formerly InferenceMAX)

By SemiAnalysis

Open Source Continuous Inference Benchmark trusted by Operators of Trillion-Dollar Gigawatt-Scale Token Factories

As the world progresses exponentially towards AGI, software development and model releases move at the speed of light. Existing benchmarks rapidly become obsolete because they are static, and participants often submit software images purpose-built for the benchmark itself, which do not reflect real-world performance.

InferenceX™ (formerly InferenceMAX) is our independent, vendor-neutral, reproducible benchmark that addresses these issues by continuously benchmarking inference software across a wide range of AI accelerators that are actually available to the ML community.

Our open data and insights are widely adopted across the ML community, by capacity-planning and strategy teams at trillion-dollar token factories and AI labs, and at multiple billion-dollar NeoClouds. Learn more in our articles: v1, v2.

GPU Specifications

Hardware specifications for GPUs used in InferenceX™ benchmarks, including compute performance, memory bandwidth, and interconnect details.

Compute & Memory

| GPU | Vendor | Memory | Mem BW | FP4 TFLOP/s¹ | FP8 TFLOP/s¹ | BF16 TFLOP/s¹ |
| --- | --- | --- | --- | --- | --- | --- |
| H100 SXM | NVIDIA | 80 GB | 3.35 TB/s | N/A | 1,979 | 989 |
| H200 SXM | NVIDIA | 141 GB | 4.8 TB/s | N/A | 1,979 | 989 |
| B200 SXM | NVIDIA | 180 GB | 8 TB/s | 9,000 | 4,500 | 2,250 |
| B300 SXM | NVIDIA | 268 GB | 8 TB/s | 13,500 | 4,500 | 2,250 |
| GB200 NVL72 | NVIDIA | 192 GB | 8 TB/s | 10,000 | 5,000 | 2,500 |
| GB300 NVL72 | NVIDIA | 288 GB | 8 TB/s | 15,000 | 5,000 | 2,500 |
| MI300X | AMD | 192 GB | 5.3 TB/s | N/A | 2,615 | 1,307 |
| MI325X | AMD | 256 GB | 6 TB/s | N/A | 2,615 | 1,307 |
| MI355X | AMD | 288 GB | 8 TB/s | 10,066 | 5,033 | 2,516 |

Scale-Up Interconnect

| GPU | Interconnect | BW per GPU | World Size | Domain Memory | Domain Mem BW | Switch |
| --- | --- | --- | --- | --- | --- | --- |
| H100 SXM | NVLink 4.0 | 450 GB/s | 8 | 0.64 TB | 26.8 TB/s | 7.2 Tbit/s NVSwitch Gen 3.0 |
| H200 SXM | NVLink 4.0 | 450 GB/s | 8 | 1.13 TB | 38.4 TB/s | 7.2 Tbit/s NVSwitch Gen 3.0 |
| B200 SXM | NVLink 5.0 | 900 GB/s | 8 | 1.44 TB | 64 TB/s | 28.8 Tbit/s NVSwitch Gen 4.0 |
| B300 SXM | NVLink 5.0 | 900 GB/s | 8 | 2.14 TB | 64 TB/s | 28.8 Tbit/s NVSwitch Gen 4.0 |
| GB200 NVL72 | NVLink 5.0 | 900 GB/s | 72 | 13.82 TB | 576 TB/s | 28.8 Tbit/s NVSwitch Gen 4.0 |
| GB300 NVL72 | NVLink 5.0 | 900 GB/s | 72 | 20.74 TB | 576 TB/s | 28.8 Tbit/s NVSwitch Gen 4.0 |
| MI300X | Infinity Fabric | 448 GB/s | 8 | 1.54 TB | 42.4 TB/s | N/A (full mesh) |
| MI325X | Infinity Fabric | 448 GB/s | 8 | 2.05 TB | 48 TB/s | N/A (full mesh) |
| MI355X | 5th Gen Infinity Fabric | 576 GB/s | 8 | 2.3 TB | 64 TB/s | N/A (full mesh) |

Scale-Out Network

| GPU | BW per GPU | Tech | Switch | NIC |
| --- | --- | --- | --- | --- |
| H100 SXM | 400 Gbit/s | RoCEv2 Ethernet | 25.6T Arista Tomahawk4 7060DX5-64S | ConnectX-7 2x200GbE |
| H200 SXM | 400 Gbit/s | InfiniBand NDR | 25.6T NVIDIA Quantum-2 QM9790 | ConnectX-7 400G |
| B200 SXM | 400 Gbit/s | gIB RoCEv2 Ethernet | 12.8T Whitebox Leaf Tomahawk3 & 25.6T Whitebox Tomahawk4 | ConnectX-7 400GbE |
| B300 SXM | 800 Gbit/s | RoCEv2 Ethernet | 51.2T NVIDIA Spectrum-X SN5600 | ConnectX-8 2x400GbE |
| GB200 NVL72 | N/A² | N/A² | N/A² | N/A² |
| GB300 NVL72 | N/A² | N/A² | N/A² | N/A² |
| MI300X | 400 Gbit/s | RoCEv2 Ethernet | 51.2T Tomahawk5 | Pollara 400GbE |
| MI325X | 400 Gbit/s | RoCEv2 Ethernet | 51.2T Tomahawk5 | Pollara 400GbE |
| MI355X | 400 Gbit/s | RoCEv2 Ethernet | 51.2T Arista Tomahawk5 DCS-7060X6-64P | Pollara 400GbE |

¹ Dense tensor-core peak TFLOP/s (without sparsity).

² Scale-out networking is not used in InferenceX™ for rack-scale systems.
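The scale-up domain figures above are simple products of the per-GPU numbers and the world size. A quick sanity check of that arithmetic (illustrative only, not InferenceX tooling; values copied from the table, using decimal GB to TB conversion):

```python
# Check: domain memory = per-GPU memory x world size,
#        domain mem BW = per-GPU mem BW x world size.
# Tuples: (GPU, memory GB, mem BW TB/s, world size, domain TB, domain TB/s)
specs = [
    ("H100 SXM",    80,  3.35, 8,  0.64,  26.8),
    ("H200 SXM",    141, 4.8,  8,  1.13,  38.4),
    ("B200 SXM",    180, 8.0,  8,  1.44,  64.0),
    ("B300 SXM",    268, 8.0,  8,  2.14,  64.0),
    ("GB200 NVL72", 192, 8.0,  72, 13.82, 576.0),
    ("GB300 NVL72", 288, 8.0,  72, 20.74, 576.0),
    ("MI300X",      192, 5.3,  8,  1.54,  42.4),
    ("MI325X",      256, 6.0,  8,  2.05,  48.0),
    ("MI355X",      288, 8.0,  8,  2.30,  64.0),
]

for gpu, mem_gb, bw_tbs, world, domain_tb, domain_bw in specs:
    assert round(mem_gb * world / 1000, 2) == domain_tb, gpu  # decimal GB -> TB
    assert round(bw_tbs * world, 1) == round(domain_bw, 1), gpu
```

Every row checks out to two decimal places, which is a useful consistency test when new SKUs are added.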

Scale-Out Topology Diagrams

Per-server scale-out network topology for each GPU SKU, showing GPU → NIC → leaf switch connectivity.

H100 SXM: 8-rail optimized · RoCEv2 Ethernet

H200 SXM: 8-rail optimized · InfiniBand NDR

B200 SXM: 4-rail optimized · gIB RoCEv2 Ethernet

B300 SXM: 8-rail optimized · RoCEv2 Ethernet

MI300X: 8-rail optimized · RoCEv2 Ethernet

MI325X: 8-rail optimized · RoCEv2 Ethernet

MI355X: 8-rail optimized · RoCEv2 Ethernet
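The "N-rail optimized" captions above describe a placement rule: NIC i on every server attaches to the same leaf switch, so traffic between same-indexed GPUs on different servers crosses exactly one leaf. A minimal sketch of that rule (illustrative only, not InferenceX code; the function name is ours):

```python
# Illustrative sketch: in an N-rail-optimized scale-out fabric,
# NIC i on every server attaches to leaf switch i.
def leaf_for(server: int, nic: int, rails: int = 8) -> int:
    """Rail-optimized placement: the leaf is picked by NIC index alone."""
    assert 0 <= nic < rails, "NIC index must fall within the rail count"
    return nic

# NIC 3 on any two servers shares leaf 3, so rank-aligned collective
# traffic (GPU i talking to GPU i) needs only a single leaf hop.
assert leaf_for(server=0, nic=3) == leaf_for(server=17, nic=3) == 3
```

This is why rail-optimized designs suit inference collectives: the common GPU-i-to-GPU-i traffic pattern never has to traverse the spine.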

Scale-Up Topology Diagrams

Intra-node scale-up interconnect topology for each GPU SKU, showing GPU → NVSwitch or direct GPU-to-GPU connectivity.

H100 SXM: Switched 4-rail optimized · NVLink 4.0

H200 SXM: Switched 4-rail optimized · NVLink 4.0

B200 SXM: Switched 2-rail optimized · NVLink 5.0

B300 SXM: Switched 2-rail optimized · NVLink 5.0

GB200 NVL72: Switched 18-rail optimized · NVLink 5.0

GB300 NVL72: Switched 18-rail optimized · NVLink 5.0

MI300X: Full mesh · Infinity Fabric

MI325X: Full mesh · Infinity Fabric

MI355X: Full mesh · 5th Gen Infinity Fabric
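The practical difference between the two scale-up topologies is how per-GPU bandwidth is shared. A full mesh of n GPUs wires every pair directly, so each GPU's bandwidth is divided across n-1 peers, while a switched design can steer the full per-GPU bandwidth at any single peer. A back-of-the-envelope sketch (illustrative only; it assumes the table's 448 GB/s per MI300X GPU is split evenly across its 7 peers):

```python
# Full mesh of n GPUs: every pair gets a direct link (n choose 2).
def full_mesh_links(n: int) -> int:
    return n * (n - 1) // 2

assert full_mesh_links(8) == 28  # 8-GPU MI300X/MI325X/MI355X node

# Assumption: 448 GB/s of Infinity Fabric bandwidth split evenly
# across 7 peers gives each direct link:
per_peer = 448 / 7
assert per_peer == 64.0  # GB/s per peer
# A switched topology (NVLink + NVSwitch) can instead direct the full
# per-GPU bandwidth (450 or 900 GB/s) toward any single peer.
```

This is also why a 72-GPU world size is only practical with switches: a full mesh at that scale would need 2,556 direct links.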