GPU Architecture NVIDIA Ampere
NVIDIA Tensor Cores 512 third-generation Tensor Cores per GPU
NVIDIA CUDA Cores 8192 FP32 CUDA Cores per GPU
Double-Precision Performance FP64: 9.7 TFLOPS; FP64 Tensor Core: 19.5 TFLOPS
Single-Precision Performance FP32: 19.5 TFLOPS; Tensor Float 32 (TF32): 156 TFLOPS, 312 TFLOPS*
Half-Precision Performance 312 TFLOPS, 624 TFLOPS*
Bfloat16 312 TFLOPS, 624 TFLOPS*
Integer Performance INT8: 624 TOPS, 1,248 TOPS*; INT4: 1,248 TOPS, 2,496 TOPS*
GPU Memory 80 GB HBM2
Memory Bandwidth 1,935 GB/s
ECC Yes
Interconnect Bandwidth NVLink: 400 GB/s; PCIe: 64 GB/s
System Interface PCIe Gen 4, x16 lanes
Form Factor PCIe full height/length, double width
Multi-Instance GPU (MIG) Up to 7 GPU instances, 10 GB each
Max Power Consumption 300 W
Thermal Solution Passive
Compute APIs CUDA, DirectCompute, OpenCL, OpenACC
* With structural sparsity enabled
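
The peak throughput and memory bandwidth figures above can be combined into a rough roofline estimate: the arithmetic intensity (FLOPs per byte moved) a kernel needs before it is limited by compute rather than by memory. The sketch below is illustrative only. The function names are hypothetical, the numbers are the dense (non-sparse) peaks copied from the table, and real kernels typically reach only a fraction of these peaks.

```python
# Dense peak throughput in TFLOPS, copied from the spec table above.
PEAK_TFLOPS = {
    "FP64": 9.7,
    "FP32": 19.5,
    "TF32": 156.0,
    "FP16 (half precision)": 312.0,
}

# Memory bandwidth in GB/s, copied from the spec table above.
MEMORY_BANDWIDTH_GBPS = 1935.0


def ridge_point(peak_tflops, bandwidth_gbps=MEMORY_BANDWIDTH_GBPS):
    """Return the FLOPs-per-byte intensity at which the compute peak is reached.

    Hypothetical helper: below this intensity a kernel is memory-bound
    under the simple roofline model.
    """
    return (peak_tflops * 1e12) / (bandwidth_gbps * 1e9)


def attainable_tflops(intensity, peak_tflops, bandwidth_gbps=MEMORY_BANDWIDTH_GBPS):
    """Return the roofline upper bound in TFLOPS for a given intensity.

    The bound is the smaller of the compute peak and the rate at which
    memory can feed the kernel (intensity * bandwidth).
    """
    memory_bound_tflops = intensity * bandwidth_gbps * 1e9 / 1e12
    return min(peak_tflops, memory_bound_tflops)


if __name__ == "__main__":
    for precision, peak in PEAK_TFLOPS.items():
        print(f"{precision}: ridge point ~{ridge_point(peak):.1f} FLOPs/byte")
```

Under this model, a kernel that performs 1 FLOP per byte of memory traffic is capped by bandwidth at roughly 1.9 TFLOPS regardless of precision, which is why the higher Tensor Core peaks matter mainly for workloads with high data reuse.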