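GPU Specifications and Comparison
This page compares GPU specifications: memory capacity, memory bandwidth, FLOPS, interconnect topology, and power for accelerators from NVIDIA, AMD, and other vendors.
Model, GPU, framework, and metric names in the charts follow the English names commonly used in the industry.
GPU Specifications
Hardware specifications of the GPUs used in the InferenceX™ benchmarks, including compute performance, memory bandwidth, and interconnect details.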
| GPU | Memory | Memory bandwidth | FP4 TFLOP/s¹ | FP8 TFLOP/s¹ | BF16 TFLOP/s¹ | Scale-up technology | Scale-up bandwidth | GPUs per domain | Scale-up domain memory | Scale-up domain memory bandwidth | Scale-up topology | Scale-up switch | Scale-out bandwidth per GPU | Scale-out technology | Scale-out switch | Scale-out topology | NIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H100 SXM (NVIDIA) | 80 GB | 3.35 TB/s | — | 1,979 | 989 | NVLink 4.0 | 450 GB/s | 8 | 0.64 TB | 26.8 TB/s | | 7.2 Tbit/s NVSwitch Gen 3.0 | 400 Gbit/s | RoCEv2 Ethernet | 25.6T Arista Tomahawk4 7060DX5-64S | | ConnectX-7 2x200GbE |
| H200 SXM (NVIDIA) | 141 GB | 4.8 TB/s | — | 1,979 | 989 | NVLink 4.0 | 450 GB/s | 8 | 1.13 TB | 38.4 TB/s | | 7.2 Tbit/s NVSwitch Gen 3.0 | 400 Gbit/s | InfiniBand NDR | 25.6T NVIDIA Quantum-2 QM9790 | | ConnectX-7 400G |
| B200 SXM (NVIDIA) | 180 GB | 8 TB/s | 9,000 | 4,500 | 2,250 | NVLink 5.0 | 900 GB/s | 8 | 1.44 TB | 64 TB/s | | 28.8 Tbit/s NVSwitch Gen 4.0 | 400 Gbit/s | gIB RoCEv2 Ethernet | 12.8T Whitebox Leaf Tomahawk3 & 25.6T Whitebox Tomahawk4 | | ConnectX-7 400GbE |
| B300 SXM (NVIDIA) | 268 GB | 8 TB/s | 13,500 | 4,500 | 2,250 | NVLink 5.0 | 900 GB/s | 8 | 2.14 TB | 64 TB/s | | 28.8 Tbit/s NVSwitch Gen 4.0 | 800 Gbit/s | RoCEv2 Ethernet | 51.2T NVIDIA Spectrum-X SN5600 | | ConnectX-8 2x400GbE |
| GB200 NVL72 (NVIDIA) | 192 GB | 8 TB/s | 10,000 | 5,000 | 2,500 | NVLink 5.0 | 900 GB/s | 72 | 13.82 TB | 576 TB/s | | 28.8 Tbit/s NVSwitch Gen 4.0 | N/A² | N/A² | N/A² | N/A² | N/A² |
| GB300 NVL72 (NVIDIA) | 288 GB | 8 TB/s | 15,000 | 5,000 | 2,500 | NVLink 5.0 | 900 GB/s | 72 | 20.74 TB | 576 TB/s | | 28.8 Tbit/s NVSwitch Gen 4.0 | N/A² | N/A² | N/A² | N/A² | N/A² |
| MI300X (AMD) | 192 GB | 5.3 TB/s | — | 2,615 | 1,307 | Infinity Fabric | 448 GB/s | 8 | 1.54 TB | 42.4 TB/s | | — | 400 Gbit/s | RoCEv2 Ethernet | 51.2T Tomahawk5 | | Pollara 400GbE |
| MI325X (AMD) | 256 GB | 6 TB/s | — | 2,615 | 1,307 | Infinity Fabric | 448 GB/s | 8 | 2.05 TB | 48 TB/s | | — | 400 Gbit/s | RoCEv2 Ethernet | 51.2T Tomahawk5 | | Pollara 400GbE |
| MI355X (AMD) | 288 GB | 8 TB/s | 10,066 | 5,033 | 2,516 | 5th Gen Infinity Fabric | 576 GB/s | 8 | 2.3 TB | 64 TB/s | | — | 400 Gbit/s | RoCEv2 Ethernet | 51.2T Arista Tomahawk5 DCS-7060X6-64PE | | Pollara 400GbE |

¹ Peak dense Tensor Core TFLOP/s (without sparsity acceleration).
² InferenceX™ rack-scale tests do not use scale-out networking.
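
The two scale-up domain columns in the table above appear to be simple aggregates: per-GPU memory and per-GPU memory bandwidth multiplied by the number of GPUs in the domain, with 1 TB taken as 1,000 GB. The following is a minimal sketch of that arithmetic, not the page's own tooling; the `Gpu` class and its field names are hypothetical, and only three rows from the table are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """Hypothetical record holding a few per-GPU values copied from the table."""
    name: str
    memory_gb: float    # per-GPU memory capacity in GB
    mem_bw_tbs: float   # per-GPU memory bandwidth in TB/s
    domain_gpus: int    # GPUs per scale-up domain

    def domain_memory_tb(self) -> float:
        # Assumption: decimal units, 1 TB = 1,000 GB.
        return self.memory_gb * self.domain_gpus / 1000

    def domain_mem_bw_tbs(self) -> float:
        return self.mem_bw_tbs * self.domain_gpus


GPUS = [
    Gpu("H100 SXM", 80, 3.35, 8),
    Gpu("GB200 NVL72", 192, 8, 72),
    Gpu("MI355X", 288, 8, 8),
]

for gpu in GPUS:
    print(
        f"{gpu.name}: {gpu.domain_memory_tb():.2f} TB, "
        f"{gpu.domain_mem_bw_tbs():.1f} TB/s"
    )
```

For the H100 SXM row this gives 8 × 80 GB = 0.64 TB and 8 × 3.35 TB/s = 26.8 TB/s, which matches the table; the 72-GPU GB200 NVL72 domain works out the same way (72 × 192 GB ≈ 13.82 TB).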
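Scale-Out Topology Diagrams
Scale-out network topology for each server, showing how GPUs connect through NICs to leaf switches (GPU → NIC → leaf switch).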
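
To make the GPU → NIC → leaf switch path concrete, here is a minimal sketch of how a rail-optimized layout is commonly wired: the NIC at a given position in every server connects to the same leaf switch (the "rail"). The function names are hypothetical, and two things are assumptions rather than facts from this page: that each GPU has its own NIC, and that when there are fewer rails than GPUs (for example 4 rails for 8 GPUs), adjacent GPUs share a rail. The actual cabling on the benchmarked clusters may differ.

```python
def leaf_for_gpu(local_gpu: int, gpus_per_server: int = 8, rails: int = 8) -> int:
    """Return the rail (leaf switch index) that a GPU's NIC connects to.

    Assumption: GPUs are split into equal contiguous groups, one group per rail.
    """
    if gpus_per_server % rails != 0:
        raise ValueError("gpus_per_server must be a multiple of rails")
    if not 0 <= local_gpu < gpus_per_server:
        raise ValueError("local_gpu out of range")
    gpus_per_rail = gpus_per_server // rails
    return local_gpu // gpus_per_rail


def server_paths(server: int, gpus_per_server: int = 8, rails: int = 8) -> list[str]:
    """List the GPU -> NIC -> leaf path for every GPU in one server."""
    return [
        f"server{server}/gpu{g} -> nic{g} -> "
        f"leaf{leaf_for_gpu(g, gpus_per_server, rails)}"
        for g in range(gpus_per_server)
    ]


# 8-rail layout: GPU i of every server lands on leaf i.
for path in server_paths(server=0):
    print(path)

# 4-rail layout (assumed grouping): GPUs 0-1 share leaf 0, GPUs 2-3 share leaf 1, ...
print([leaf_for_gpu(g, rails=4) for g in range(8)])
```

Under this mapping, GPUs with the same local index on different servers reach each other through a single leaf switch, which is the usual motivation for rail-optimized designs.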
H100 SXM: 8-rail optimized · RoCEv2 Ethernet
H200 SXM: 8-rail optimized · InfiniBand NDR
B200 SXM: 4-rail optimized · gIB RoCEv2 Ethernet
B300 SXM: 8-rail optimized · RoCEv2 Ethernet
MI300X: 8-rail optimized · RoCEv2 Ethernet
MI325X: 8-rail optimized · RoCEv2 Ethernet
MI355X: 8-rail optimized · RoCEv2 Ethernet
Scale-Up Topology Diagrams
Intra-node scale-up interconnect topology, showing whether GPUs connect through an NVSwitch (GPU → NVSwitch) or directly to each other.