NVIDIA® A100 TENSOR CORE GPU

The World's Most Electrifying Accelerator

Accelerating the Most Important Work of Our Time

The NVIDIA® A100 Tensor Core GPU delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing (HPC) to tackle the world’s toughest computing challenges. As the engine of the NVIDIA data center platform, A100 can efficiently scale to thousands of GPUs or, with NVIDIA® Multi-Instance GPU (MIG) technology, be partitioned into seven GPU instances to accelerate workloads of all sizes. And third-generation Tensor Cores accelerate every precision for diverse workloads, speeding time to insight and time to market.

The Most Powerful End-to-End AI and HPC Data Center Platform

Deep Learning Training


AI models are exploding in complexity as they take on next-level challenges such as accurate conversational AI and deep recommender systems. Training them requires massive compute power and scalability.

NVIDIA® A100's third-generation Tensor Cores with Tensor Float 32 (TF32) precision provide up to 20X higher performance over the prior generation with zero code changes and an additional 2X boost with automatic mixed precision and FP16. When combined with third-generation NVIDIA® NVLink®, NVIDIA NVSwitch™, PCIe Gen4, NVIDIA® Mellanox InfiniBand, and the NVIDIA Magnum IO™ software SDK, it's possible to scale to thousands of A100 GPUs. This means that large AI models like BERT can be trained in just 37 minutes on a cluster of 1,024 A100s, offering unprecedented performance and scalability.
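As an illustration of the zero-code-change and mixed-precision claims above, the minimal PyTorch sketch below shows the standard automatic-mixed-precision training pattern. It is not from the datasheet; the model, batch size, and training loop are placeholders. On an A100, the FP32 matrix math in this loop can use TF32 Tensor Cores automatically, while autocast adds FP16 on top.

```python
# Minimal sketch of mixed-precision training in PyTorch (placeholder model and data).
# On A100 (Ampere), FP32 matmuls/convolutions may run on TF32 Tensor Cores with no
# code changes; autocast + GradScaler adds FP16 mixed precision on top of that.
import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()               # scales the loss to avoid FP16 underflow
loss_fn = nn.CrossEntropyLoss()

for step in range(100):                            # placeholder training loop
    x = torch.randn(256, 1024, device=device)      # synthetic batch
    y = torch.randint(0, 10, (256,), device=device)
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():                # eligible ops run in FP16
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```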

NVIDIA's training leadership was demonstrated in MLPerf 0.6, the first industry-wide benchmark for AI training.

A100 is part of the complete NVIDIA® data center solution that incorporates building blocks across hardware, networking, software, libraries, and optimized AI models and applications from NGC™. Representing the most powerful end-to-end AI and HPC platform for data centers, it allows researchers to deliver real-world results and deploy solutions into production at scale.

Up to 6X Higher Out-of-the-Box Performance with TF32 for AI Training (BERT Training)

BERT pre-training throughput using PyTorch, including (2/3) Phase 1 and (1/3) Phase 2 | Phase 1 Seq Len = 128, Phase 2 Seq Len = 512 | V100: NVIDIA DGX-1™ server with 8x V100 using FP32 precision | A100: DGX A100 server with 8x A100 using TF32 precision.

Deep Learning Inference

Up to 7X Higher Performance with Multi-Instance GPU (MIG) for AI Inference (BERT Large Inference)


A100 introduces groundbreaking new features to optimize inference workloads. It brings unprecedented versatility by accelerating a full range of precisions, from FP32 to FP16 to INT8 and all the way down to INT4. Multi-Instance GPU (MIG) technology allows multiple networks to operate simultaneously on a single A100 GPU for optimal utilization of compute resources. And structural sparsity support delivers up to 2X more performance on top of A100’s other inference performance gains.
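For a rough sense of how reduced precision is used at inference time, the sketch below runs a placeholder PyTorch model in FP16 under autocast. It is not the benchmark configuration cited on this page, and production INT8/INT4 deployment on A100 typically goes through NVIDIA TensorRT, which is not shown here.

```python
# Minimal sketch of FP16 inference in PyTorch (placeholder model and data);
# INT8/INT4 and structured-sparsity paths normally go through TensorRT instead.
import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 2)).to(device).eval()

batch = torch.randn(64, 1024, device=device)       # synthetic input batch
with torch.inference_mode(), torch.cuda.amp.autocast():
    logits = model(batch)                          # matmuls execute in FP16 on Tensor Cores
print(logits.dtype)                                # torch.float16
```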

NVIDIA® already delivers market-leading inference performance, as demonstrated in an across-the-board sweep of MLPerf Inference 0.5, the first industry-wide benchmark for inference. A100 brings 20X more performance to further extend that leadership.

BERT Large Inference | NVIDIA T4 Tensor Core GPU: NVIDIA TensorRT™ (TRT) 7.1, precision = INT8, batch size = 256 | V100: TRT 7.1, precision = FP16, batch size = 256 | A100 with 7 MIG instances of 1g.5gb: pre-production TRT, batch size = 94, precision = INT8 with sparsity.

High-Performance Computing


To unlock next-generation discoveries, scientists look to simulations to better understand complex molecules for drug discovery, physics for potential new sources of energy, and atmospheric data to better predict and prepare for extreme weather patterns.

A100 introduces double-precision Tensor Cores, providing the biggest milestone since the introduction of double-precision computing in GPUs for HPC. This enables researchers to reduce a 10-hour, double-precision simulation running on NVIDIA® V100 Tensor Core GPUs to just four hours on A100. HPC applications can also leverage TF32 precision in A100’s Tensor Cores to achieve up to 10X higher throughput for single-precision dense matrix multiply operations.
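The sketch below, which is illustrative rather than part of the datasheet, contrasts the two precisions mentioned above in PyTorch: an FP64 matrix multiply that can use A100's double-precision Tensor Cores, and an FP32 matrix multiply allowed to run in TF32.

```python
# Minimal sketch contrasting FP64 and TF32 matrix multiplies in PyTorch.
import torch

a64 = torch.randn(4096, 4096, dtype=torch.float64, device="cuda")
b64 = torch.randn(4096, 4096, dtype=torch.float64, device="cuda")
c64 = a64 @ b64                                    # FP64 matmul (double-precision Tensor Cores on A100)

torch.backends.cuda.matmul.allow_tf32 = True       # allow FP32 matmuls to use TF32 Tensor Cores
a32 = torch.randn(4096, 4096, device="cuda")
b32 = torch.randn(4096, 4096, device="cuda")
c32 = a32 @ b32                                    # inputs rounded to TF32, accumulated in FP32
```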

9X More HPC Performance in 4 Years (Throughput for Top HPC Apps)

Geometric mean of application speedups vs. P100 | Benchmark applications: Amber [PME-Cellulose_NVE], Chroma [szscl21_24_128], GROMACS [ADH Dodec], MILC [Apex Medium], NAMD [stmv_nve_cuda], PyTorch [BERT Large Fine Tuner], Quantum Espresso [AUSURF112-jR], Random Forest FP32 [make_blobs (160000 x 64 : 10)], TensorFlow [ResNet-50], VASP 6 [Si Huge] | GPU node with dual-socket CPUs with 4x NVIDIA P100, V100, or A100 GPUs.

High-Performance Data Analytics


Customers need to be able to analyze, visualize, and turn massive datasets into insights. But scale-out solutions often become bogged down as these datasets are scattered across multiple servers.

Accelerated servers with A100 deliver the needed compute power, along with 1.6 terabytes per second (TB/sec) of memory bandwidth and scalability with third-generation NVLink and NVSwitch, to tackle these massive workloads. Combined with NVIDIA® Mellanox InfiniBand, the Magnum IO SDK, and the RAPIDS™ suite of open-source software libraries, including the RAPIDS Accelerator for Apache Spark for GPU-accelerated data analytics, the NVIDIA data center platform is uniquely able to accelerate these huge workloads at unprecedented levels of performance and efficiency.
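As a small example of the RAPIDS workflow referenced above (the file name and column names are placeholders, not from the datasheet), a cuDF aggregation might look like this:

```python
# Minimal sketch of GPU-accelerated analytics with RAPIDS cuDF (placeholder data).
import cudf

df = cudf.read_csv("transactions.csv")             # loads the file directly into GPU memory
summary = (
    df.groupby("customer_id")["amount"]
      .agg(["sum", "mean", "count"])               # group-by aggregation runs on the GPU
      .sort_values("sum", ascending=False)
)
print(summary.head(10).to_pandas())                # copy a small result back to the CPU
```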

Enterprise-Ready Utilization

A100 with MIG maximizes the utilization of GPU-accelerated infrastructure like never before. MIG allows an A100 GPU to be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration for their applications and development projects. MIG works with Kubernetes, containers, and hypervisor-based server virtualization with NVIDIA® Virtual Compute Server (vComputeServer). MIG lets infrastructure managers offer a right-sized GPU with guaranteed quality of service (QoS) for every job, optimizing utilization and extending the reach of accelerated computing resources to every user.
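As a rough sketch of how a single job targets one MIG instance, a process can be pinned to an instance through CUDA_VISIBLE_DEVICES. The UUID below is a placeholder; real instance UUIDs are listed by nvidia-smi -L after an administrator has created the GPU instances.

```python
# Minimal sketch: pin a Python process to one MIG instance via CUDA_VISIBLE_DEVICES.
# The UUID is a placeholder; query real ones with `nvidia-smi -L` (CUDA 11+ drivers).
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

import torch                                       # import after setting the variable

print(torch.cuda.device_count())                   # this process sees exactly one GPU instance
print(torch.cuda.get_device_name(0))
```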

7X Higher Inference Throughput with Multi-Instance GPU (MIG) (BERT Large Inference)

BERT Large Inference | NVIDIA TensorRT™ (TRT) 7.1 | NVIDIA T4 Tensor Core GPU: TRT 7.1, precision = INT8, batch size = 256 | V100: TRT 7.1, precision = FP16, batch size = 256 | A100 with 1 or 7 MIG instances of 1g.5gb: batch size = 94, precision = INT8 with sparsity.

Form Factors

System Specifications (Peak Performance)

NVIDIA® A100 FOR NVIDIA HGX™

NVIDIA® A100 FOR PCIe

Featured PCIe Servers

  • Processor: 1x AMD EPYC 7002 Series processor
  • GPUs: 4
  • Supported GPUs: NVIDIA® A100-PCIe / Tesla V100-PCIe

  • Processor: 1x AMD EPYC 7002 Series processor
  • GPUs: 8
  • Supported GPUs: NVIDIA® A100-PCIe / Tesla V100-PCIe

Featured SXM4 Servers

  • Processors: 2x AMD EPYC 7002 Series processors
  • GPUs: 4
  • Supported GPUs: NVIDIA® A100-SXM4

  • Processors: 2x AMD EPYC 7002 Series processors
  • GPUs: 8
  • Supported GPUs: NVIDIA® A100-SXM4

High Performance Computing
and Storage Solutions
Intel Technology Provider Partner Of The Year
Top 500 and Green 500 Supercomputer Winner
Contact Us:
Atipa Technologies
4921 Legends Drive,
Lawrence, KS 66049
Office Phone: (785) 841-9559
Office Fax: (785) 841-1809
Customer Service E-Mail: cs@atipa.com
Sales Toll Free Number: (888) 222-7822
Sales E-Mail: sales@atipa.com

Copyright © 2001-2020 Atipa Technologies. All Rights Reserved.

Atipa Technologies, a division of Microtech Computers, Inc., is not responsible for typographical or photographic errors.
Designated trademarks and brands are the property of their respective owners.