NVIDIA® Data Center GPUs
The world's most powerful accelerators


Hopper Architecture
Exascale High-Performance Computing
The NVIDIA® data center platform consistently delivers performance gains beyond Moore’s Law. And H100’s new breakthrough AI capabilities further amplify the power of HPC+AI to accelerate time to discovery for scientists and researchers working on solving the world’s most important challenges.
H100 triples the floating-point operations per second (FLOPS) of double-precision Tensor Cores, delivering 60 teraFLOPS of FP64 computing for HPC. AI-fused HPC applications can leverage H100’s TF32 precision to achieve one petaFLOP of throughput for single-precision matrix-multiply operations, with zero code changes.
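To make the “zero code changes” claim concrete, here is a minimal sketch in PyTorch (a common framework choice, not something the text above specifies): tensors stay in ordinary FP32, and a single backend flag opts matmuls into TF32 Tensor Core execution.

```python
# Minimal sketch: exercising TF32 Tensor Core matmuls from PyTorch.
# Assumes a CUDA-capable Ampere-or-later GPU and a CUDA build of PyTorch.
import torch

# Opt ordinary FP32 matmuls into TF32 execution: FP32 storage,
# reduced-precision multiply on Tensor Cores, FP32 accumulation.
torch.backends.cuda.matmul.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")  # stored as plain FP32
b = torch.randn(4096, 4096, device="cuda")
c = a @ b  # runs on Tensor Cores; no changes to the model code itself
```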
H100 also features DPX instructions that deliver 7X higher performance over NVIDIA® A100 Tensor Core GPUs and 40X speedups over traditional dual-socket CPU-only servers on dynamic programming algorithms, such as Smith-Waterman for DNA sequence alignment.
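For readers unfamiliar with the workload, the sketch below is a plain, unaccelerated Python reference of the Smith-Waterman recurrence; the scoring constants are illustrative assumptions, not NVIDIA’s. DPX instructions accelerate exactly this kind of inner max-over-neighbors update.

```python
# Reference (CPU-only) Smith-Waterman local alignment score, showing the
# dynamic-programming pattern that DPX instructions speed up.
def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # DP score matrix
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Each cell is a max over its neighbors -- the min/max-heavy
            # update that DPX instructions are designed to accelerate.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("GATTACA", "GCATGCU"))
```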
Real-Time Deep Learning Inference
AI solves a wide array of business challenges using an equally wide array of neural networks. A great AI inference accelerator has to deliver not only the highest performance but also the versatility to accelerate these networks.
H100 further extends NVIDIA®’s market leadership in inference with several advancements that accelerate inference by up to 30X and deliver the lowest latency. Fourth-generation Tensor Cores speed up all precisions, including FP64, TF32, FP32, FP16, and INT8, and the Transformer Engine uses FP8 and FP16 together to reduce memory usage and increase performance while maintaining accuracy for large language models.
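As a hedged sketch of how the Transformer Engine is driven in practice, the snippet below uses the transformer_engine package’s PyTorch API; module and recipe names are taken from its published bindings, but treat the details as assumptions that may differ across versions.

```python
# Hedged sketch: running a linear layer under FP8 autocast with NVIDIA's
# transformer_engine package (verify names against your installed version).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID keeps E4M3 for forward activations/weights and E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096).cuda()
x = torch.randn(16, 4096, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # GEMM runs in FP8 with scaling; output stays higher precision
```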

Ampere Architecture
Accelerating the Most Important Work of Our Time
The NVIDIA® A100 Tensor Core GPU delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing (HPC) to tackle the world’s toughest computing challenges. As the engine of the NVIDIA® data center platform, A100 can efficiently scale to thousands of GPUs or, with NVIDIA® Multi-Instance GPU (MIG) technology, be partitioned into seven GPU instances to accelerate workloads of all sizes. And third-generation Tensor Cores accelerate every precision for diverse workloads, speeding time to insight and time to market.
Deep Learning Training
AI models are exploding in complexity as they take on next-level challenges such as accurate conversational AI and deep recommender systems. Training them requires massive compute power and scalability.
NVIDIA® A100’s third-generation Tensor Cores with Tensor Float 32 (TF32) precision provide up to 20X higher performance over the prior generation with zero code changes and an additional 2X boost with automatic mixed precision and FP16. When combined with third-generation NVIDIA® NVLink®, NVIDIA® NVSwitch™, PCIe Gen4, NVIDIA® Mellanox InfiniBand, and the NVIDIA® Magnum IO™ software SDK, it’s possible to scale to thousands of A100 GPUs. This means that large AI models like BERT can be trained in just 37 minutes on a cluster of 1,024 A100s, offering unprecedented performance and scalability. NVIDIA®’s training leadership was demonstrated in MLPerf 0.6, the industry-wide benchmark for AI training.
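The automatic mixed precision mentioned above is a framework feature. As one concrete example (a generic PyTorch pattern, assumed here rather than taken from this document), FP16 training is enabled through autocast plus a gradient scaler:

```python
# Minimal PyTorch automatic mixed precision (AMP) training loop.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # scales losses so FP16 grads don't underflow

for _ in range(10):
    x = torch.randn(64, 1024, device="cuda")
    opt.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():   # eligible ops run in FP16 on Tensor Cores
        loss = model(x).square().mean()
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()
```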
Deep Learning Inference
A100 introduces groundbreaking new features to optimize inference workloads. It brings unprecedented versatility by accelerating a full range of precisions, from FP32 to FP16 to INT8 and all the way down to INT4. Multi-Instance GPU (MIG) technology allows multiple networks to operate simultaneously on a single A100 GPU for optimal utilization of compute resources. And structural sparsity support delivers up to 2X more performance on top of A100’s other inference performance gains.
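Structural sparsity here means the 2:4 pattern: two zeros in every group of four weights. The sketch below shows one way to exercise it from recent PyTorch builds via torch.sparse.to_sparse_semi_structured; treat the API availability and hardware dispatch details as assumptions about your environment.

```python
# Hedged sketch: 2:4 semi-structured sparsity in recent PyTorch releases.
import torch
from torch.sparse import to_sparse_semi_structured

# A weight matrix that already satisfies the 2:4 pattern: two zeros in
# every contiguous group of four values along each row.
w = torch.tensor([0.0, 0.0, 1.0, 1.0],
                 dtype=torch.float16, device="cuda").tile(4096, 1024)

w_sparse = to_sparse_semi_structured(w)  # compressed 2:4 representation
x = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
y = torch.mm(w_sparse, x)  # can dispatch to sparse Tensor Core kernels
```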

A30 High-Performance Computing
To unlock next-generation discoveries, scientists use simulations to better understand the world around us.
NVIDIA® A30 features FP64 NVIDIA® Ampere architecture Tensor Cores that deliver the biggest leap in HPC performance since the introduction of GPUs. Combined with 24 gigabytes (GB) of GPU memory and a bandwidth of 933 gigabytes per second (GB/s), researchers can rapidly solve double-precision calculations. HPC applications can also leverage TF32 to achieve higher throughput for single-precision, dense matrix-multiply operations.
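A brief, assumed illustration of what double-precision work looks like in code: a plain FP64 matrix multiply, which the underlying cuBLAS library can route to FP64 Tensor Cores on Ampere-class parts without any special API.

```python
# FP64 GEMM: no opt-in needed; on A30/A100-class GPUs the underlying
# cuBLAS DGEMM can use FP64 Tensor Cores automatically.
import torch

a = torch.randn(2048, 2048, device="cuda", dtype=torch.float64)
b = torch.randn(2048, 2048, device="cuda", dtype=torch.float64)
c = a @ b
```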
The combination of FP64 Tensor Cores and MIG empowers research institutions to securely partition the GPU so multiple researchers can access compute resources with guaranteed quality of service (QoS) and maximum GPU utilization. Enterprises deploying AI can use A30’s inference capabilities during peak demand periods and then repurpose the same compute servers for HPC and AI training workloads during off-peak periods.
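Partitioning itself is an administrator action (via nvidia-smi or NVML), but the resulting GPU instances are visible programmatically. Below is a hedged sketch using the pynvml bindings; it assumes they match your driver’s NVML version.

```python
# Hedged sketch: inspecting MIG partitions with NVML via pynvml
# (call names per the nvidia-ml-py bindings; verify against your version).
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

current, pending = pynvml.nvmlDeviceGetMigMode(gpu)
print("MIG enabled:", current == pynvml.NVML_DEVICE_MIG_ENABLE)

# Enumerate the GPU instances exposed as separate MIG devices.
for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
    try:
        mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
    except pynvml.NVMLError:
        continue  # slot not populated
    mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
    print(f"MIG device {i}: {mem.total / 1e9:.1f} GB")

pynvml.nvmlShutdown()
```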
High-Performance Data Analytics
Data scientists need to be able to analyze, visualize, and turn massive datasets into insights. But scale-out solutions are often bogged down by datasets scattered across multiple servers.
Accelerated servers with A30 provide the needed compute power—along with large HBM2 memory, 933 GB/s of memory bandwidth, and scalability with NVLink—to tackle these workloads. Combined with NVIDIA® InfiniBand, NVIDIA® Magnum IO™, and the RAPIDS™ suite of open-source libraries, including the RAPIDS Accelerator for Apache Spark, the NVIDIA® data center platform accelerates these huge workloads at unprecedented levels of performance and efficiency.
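As a small, assumed illustration of the RAPIDS side of that stack, cuDF exposes a pandas-style API that runs on the GPU; the file path and column names below are invented for the example.

```python
# Hedged sketch: a pandas-style aggregation on the GPU with RAPIDS cuDF.
import cudf

df = cudf.read_parquet("transactions.parquet")  # decoded into GPU memory
summary = (
    df.groupby("customer_id")["amount"]
      .agg(["count", "sum", "mean"])            # computed on the GPU
      .sort_values("sum", ascending=False)
)
print(summary.head())
```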

Form Factor | H100 SXM | H100 PCIe | A100 SXM | A100 PCIe | A30 PCIe |
---|---|---|---|---|---|
Server Options | NVIDIA HGX™ H100 partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs; NVIDIA DGX™ H100 with 8 GPUs | Partner and NVIDIA-Certified Systems with 1–8 GPUs | NVIDIA HGX™ A100 partner and NVIDIA-Certified Systems with 4, 8, or 16 GPUs; NVIDIA DGX™ A100 with 8 GPUs | Partner and NVIDIA-Certified Systems™ with 1–8 GPUs | |
Interconnect | NVLink: 900GB/s; PCIe Gen5: 128GB/s | NVLink: 600GB/s; PCIe Gen5: 128GB/s | NVLink: 600GB/s; PCIe Gen4: 64GB/s | NVIDIA® NVLink® Bridge for 2 GPUs: 600GB/s; PCIe Gen4: 64GB/s | |
Multi-Instance GPU (MIG) | Up to 7 MIGs @ 10GB each | Up to 7 MIGs @ 10GB each | Up to 7 MIGs @ 10GB each | Up to 7 MIGs @ 10GB each | 4 GPU instances @ 6GB each; 2 GPU instances @ 12GB each; 1 GPU instance @ 24GB |
Max Thermal Design Power (TDP) | 700W | 350W | 400W | 300W | 165W |
Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG | | | |
GPU Memory Bandwidth | 3TB/s | 3TB/s | 2,039GB/s | 1,935GB/s | 933GB/s |
GPU Memory | 80GB | 80GB | 80GB | 80GB | 24GB |
INT8 Tensor Core | 4,000 TOPS | 3,200 TOPS | 624 TOPS | 624 TOPS | 330 TOPS |
FP8 Tensor Core | 4,000 teraFLOPS | 3,200 teraFLOPS | N/A | N/A | N/A |
FP16 Tensor Core | 2,000 teraFLOPS | 1,600 teraFLOPS | 312 teraFLOPS | 312 teraFLOPS | 165 teraFLOPS |
BFLOAT16 Tensor Core | 2,000 teraFLOPS | 1,600 teraFLOPS | 312 teraFLOPS | 312 teraFLOPS | 165 teraFLOPS |
TF32 Tensor Core | 1,000 teraFLOPS | 800 teraFLOPS | 156 teraFLOPS | 156 teraFLOPS | 82 teraFLOPS |
FP32 | 60 teraFLOPS | 48 teraFLOPS | 19.5 teraFLOPS | 19.5 teraFLOPS | 10.3 teraFLOPS |
FP64 Tensor Core | 60 teraFLOPS | 48 teraFLOPS | 19.5 teraFLOPS | 19.5 teraFLOPS | 10.3 teraFLOPS |
FP64 | 30 teraFLOPS | 24 teraFLOPS | 9.7 teraFLOPS | 9.7 teraFLOPS | 5.2 teraFLOPS |