GPU Specifications
Hardware specifications for GPUs used in InferenceX™ benchmarks, including compute performance, memory bandwidth, and interconnect details.
| GPU | Memory | Mem BW | FP4 TFLOP/s¹ | FP8 TFLOP/s¹ | BF16 TFLOP/s¹ | Scale Up | Scale Up BW | World Size | Scale Up Domain Memory | Scale Up Domain Mem BW | Scale Up Topology | Scale Up Switch | Scale Out BW per GPU | Scale Out Tech | Scale Out Switch | Scale Out Topology | NIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H100 SXM (NVIDIA) | 80 GB | 3.35 TB/s | — | 1,979 | 989 | NVLink 4.0 | 450 GB/s | 8 | 0.64 TB | 26.8 TB/s | | 7.2 Tbit/s NVSwitch Gen 3.0 | 400 Gbit/s | RoCEv2 Ethernet | 25.6T Arista Tomahawk4 7060DX5-64S | 8-rail optimized | ConnectX-7 2x200GbE |
| H200 SXM (NVIDIA) | 141 GB | 4.8 TB/s | — | 1,979 | 989 | NVLink 4.0 | 450 GB/s | 8 | 1.13 TB | 38.4 TB/s | | 7.2 Tbit/s NVSwitch Gen 3.0 | 400 Gbit/s | InfiniBand NDR | 25.6T NVIDIA Quantum-2 QM9790 | 8-rail optimized | ConnectX-7 400G |
| B200 SXM (NVIDIA) | 180 GB | 8 TB/s | 9,000 | 4,500 | 2,250 | NVLink 5.0 | 900 GB/s | 8 | 1.44 TB | 64 TB/s | | 28.8 Tbit/s NVSwitch Gen 4.0 | 400 Gbit/s | gIB RoCEv2 Ethernet | 12.8T Whitebox Leaf Tomahawk3 & 25.6T Whitebox Tomahawk4 | 4-rail optimized | ConnectX-7 400GbE |
| B300 SXM (NVIDIA) | 268 GB | 8 TB/s | 13,500 | 4,500 | 2,250 | NVLink 5.0 | 900 GB/s | 8 | 2.14 TB | 64 TB/s | | 28.8 Tbit/s NVSwitch Gen 4.0 | 800 Gbit/s | RoCEv2 Ethernet | 51.2T NVIDIA Spectrum-X SN5600 | 8-rail optimized | ConnectX-8 2x400GbE |
| GB200 NVL72 (NVIDIA) | 192 GB | 8 TB/s | 10,000 | 5,000 | 2,500 | NVLink 5.0 | 900 GB/s | 72 | 13.82 TB | 576 TB/s | | 28.8 Tbit/s NVSwitch Gen 4.0 | N/A² | N/A² | N/A² | N/A² | N/A² |
| GB300 NVL72 (NVIDIA) | 288 GB | 8 TB/s | 15,000 | 5,000 | 2,500 | NVLink 5.0 | 900 GB/s | 72 | 20.74 TB | 576 TB/s | | 28.8 Tbit/s NVSwitch Gen 4.0 | N/A² | N/A² | N/A² | N/A² | N/A² |
| MI300X (AMD) | 192 GB | 5.3 TB/s | — | 2,615 | 1,307 | Infinity Fabric | 448 GB/s | 8 | 1.54 TB | 42.4 TB/s | | — | 400 Gbit/s | RoCEv2 Ethernet | 51.2T Tomahawk5 | 8-rail optimized | Pollara 400GbE |
| MI325X (AMD) | 256 GB | 6 TB/s | — | 2,615 | 1,307 | Infinity Fabric | 448 GB/s | 8 | 2.05 TB | 48 TB/s | | — | 400 Gbit/s | RoCEv2 Ethernet | 51.2T Tomahawk5 | 8-rail optimized | Pollara 400GbE |
| MI355X (AMD) | 288 GB | 8 TB/s | 10,066 | 5,033 | 2,516 | 5th Gen Infinity Fabric | 576 GB/s | 8 | 2.3 TB | 64 TB/s | | — | 400 Gbit/s | RoCEv2 Ethernet | 51.2T Arista Tomahawk5 DCS-7060X6-64PE | 8-rail optimized | Pollara 400GbE |

¹ Dense tensor core peak TFLOP/s (without sparsity).

² Scale-out is not used in InferenceX™ for rack-scale systems.
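The two "Scale Up Domain" columns appear to be derived rather than measured: in every row they match the per-GPU figure multiplied by the world size, using decimal units (1 TB = 1000 GB). A minimal sketch that reproduces them from the per-GPU columns follows; the `SPECS` dictionary and the `domain_totals` helper are illustrative names, not part of InferenceX™.

```python
# Per-GPU figures copied from the table: (memory in GB, memory bandwidth in TB/s, world size).
SPECS = {
    "H100 SXM": (80, 3.35, 8),
    "H200 SXM": (141, 4.8, 8),
    "B200 SXM": (180, 8.0, 8),
    "B300 SXM": (268, 8.0, 8),
    "GB200 NVL72": (192, 8.0, 72),
    "GB300 NVL72": (288, 8.0, 72),
    "MI300X": (192, 5.3, 8),
    "MI325X": (256, 6.0, 8),
    "MI355X": (288, 8.0, 8),
}


def domain_totals(mem_gb, mem_bw_tbs, world_size):
    """Return (domain memory in TB, domain memory bandwidth in TB/s).

    Assumes the domain figures are simple sums over all GPUs in the
    scale-up domain, with decimal units (1 TB = 1000 GB).
    """
    return mem_gb * world_size / 1000, mem_bw_tbs * world_size


for gpu, spec in SPECS.items():
    mem_tb, bw_tbs = domain_totals(*spec)
    print(f"{gpu:12s} {mem_tb:6.2f} TB  {bw_tbs:6.1f} TB/s")
```

For example, the GB200 NVL72 row works out to 192 GB × 72 = 13,824 GB, which the table rounds to 13.82 TB.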
Scale-Out Topology Diagrams
Per-server scale-out network topology for each GPU SKU, showing GPU → NIC → leaf switch connectivity.
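As a rough illustration of the GPU → NIC → leaf switch path, the sketch below assumes the common meaning of "rail-optimized": the GPU at a given local index in every server attaches, through its NIC, to the same leaf switch (its rail). The function names are hypothetical, and the grouping used when there are fewer rails than GPUs (consecutive GPUs sharing a rail) is an assumption, not something the diagrams confirm.

```python
from collections import defaultdict


def rail_for_gpu(local_gpu_index, gpus_per_server=8, num_rails=8):
    """Map a GPU's index within its server to a rail (leaf switch) index.

    Assumption: GPUs are spread evenly over rails, and consecutive GPUs
    share a rail when num_rails < gpus_per_server.
    """
    if gpus_per_server % num_rails != 0:
        raise ValueError("gpus_per_server must be a multiple of num_rails")
    if not 0 <= local_gpu_index < gpus_per_server:
        raise ValueError("local_gpu_index out of range")
    gpus_per_rail = gpus_per_server // num_rails
    return local_gpu_index // gpus_per_rail


def build_rails(num_servers, gpus_per_server=8, num_rails=8):
    """Return {rail index: [(server, local GPU index), ...]} for a cluster."""
    rails = defaultdict(list)
    for server in range(num_servers):
        for gpu in range(gpus_per_server):
            rail = rail_for_gpu(gpu, gpus_per_server, num_rails)
            rails[rail].append((server, gpu))
    return dict(rails)


# Two servers, 8 rails: rail 3 carries GPU 3 of each server.
print(build_rails(2)[3])
```

Under this layout, traffic between GPUs that share a local index stays on a single leaf switch, while traffic between GPUs on different rails has to cross another switch tier or first move over the scale-up fabric.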
H100 SXM
8-rail optimized · RoCEv2 Ethernet

H200 SXM
8-rail optimized · InfiniBand NDR

B200 SXM
4-rail optimized · gIB RoCEv2 Ethernet

B300 SXM
8-rail optimized · RoCEv2 Ethernet

MI300X
8-rail optimized · RoCEv2 Ethernet

MI325X
8-rail optimized · RoCEv2 Ethernet

MI355X
8-rail optimized · RoCEv2 Ethernet

Scale-Up Topology Diagrams
Intra-node scale-up interconnect topology for each GPU SKU, showing GPU → NVSwitch or direct GPU-to-GPU connectivity.
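The difference between the two connectivity styles can be sketched with a simple idealized model. It assumes that a switched domain lets a GPU direct its full scale-up bandwidth at any single peer, while a direct-connected domain is a full mesh whose links split that bandwidth evenly across the other GPUs. Both are simplifying assumptions rather than measured behavior, and `per_peer_bandwidth` is a hypothetical helper.

```python
def per_peer_bandwidth(scale_up_bw_gbs, world_size, switched):
    """Idealized peak bandwidth (GB/s) from one GPU to one peer.

    switched=True : all links reach a switch, so the full per-GPU
                    bandwidth can target a single peer.
    switched=False: assumed full mesh, bandwidth split evenly across
                    the (world_size - 1) directly attached peers.
    """
    if world_size < 2:
        raise ValueError("world_size must be at least 2")
    if switched:
        return scale_up_bw_gbs
    return scale_up_bw_gbs / (world_size - 1)


# Scale Up BW values taken from the specification table.
print(per_peer_bandwidth(450, 8, switched=True))   # NVSwitch-style domain
print(per_peer_bandwidth(448, 8, switched=False))  # assumed full-mesh domain
```

In this model the aggregate bandwidth per GPU is the same either way. The topology only changes how much of it a single pairwise transfer can use, which matters for point-to-point patterns more than for all-to-all collectives.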