
The NVIDIA® Tesla® V100 is the most advanced data center GPU ever built. Housing up to sixteen V100 GPUs per server, the Atipa Altezza G-Series delivers up to 124 TFLOPS of double-precision and 2 PFLOPS of deep learning performance in a single server. Atipa Altezza GPU Series servers are widely used to predict weather, discover new drugs, and model nature's most complex phenomena.

Schedule
Contract #GS-35F-0439P

Intel® oneAPI
A Unified X-Architecture Programming Model
oneAPI is a cross-industry, open, standards-based unified programming model that delivers a common developer experience across accelerator architectures, for faster application performance, greater productivity, and more innovation. The oneAPI industry initiative encourages collaboration on the oneAPI specification and on compatible oneAPI implementations across the ecosystem.
Driving a New Era of Accelerated Computing

The oneAPI Specification

The oneAPI specification extends existing developer programming models to enable a diverse set of hardware through a language, a set of library APIs, and a low-level hardware interface that supports cross-architecture programming. To promote compatibility and enable developer productivity and innovation, the oneAPI specification builds upon industry standards and provides an open, cross-platform developer stack.
Realize All of the Hardware Value

Expose and exploit the latest hardware's cutting-edge features to unleash application performance across CPUs, GPUs, FPGAs, and other accelerators.
Develop Performant Code Quickly and Correctly

Achieve fast and efficient development with a complete set of cross-architecture libraries and tools that use familiar languages and standards to make heterogeneous development easier.
Intel® oneAPI Toolkits

Toolkits contain optimized compilers, libraries, frameworks, and analysis tools purpose-built for developers who perform similar tasks. They include implementations of the oneAPI specification along with complementary tools to develop and deploy applications and solutions across Intel® CPU and XPU architectures.
- Intel oneAPI Base Toolkit - Develop performant, data-centric applications across Intel CPUs, GPUs, and FPGAs with this foundational toolset.
  - Collective Communications Library
  - Data Analytics Library
  - Deep Neural Networks Library
  - DPC++/C++ Compiler
  - DPC++ Library
  - Math Kernel Library
  - Threading Library
  - Threading Building Blocks
  - Video Processing Library
  - Advisor
  - Distribution for GDB
  - Distribution for Python
  - DPC++ Compatibility Tool
  - FPGA Add-on for oneAPI Base Toolkit
  - Integrated Performance Primitives
  - VTune Profiler
- Intel oneAPI HPC Toolkit - Build, analyze, and scale applications across shared- and distributed-memory computing systems.
  - DPC++/C++ Compiler
  - C++ Compiler Classic
  - Cluster Checker
  - Fortran Compiler (Beta)
  - Fortran Compiler Classic
  - Inspector
  - MPI Library
  - Trace Analyzer and Collector
- Intel oneAPI AI Analytics Toolkit - Accelerate end-to-end data science and machine learning pipelines using Python tools and frameworks.
  - Distribution for Python, including highly optimized scikit-learn and XGBoost libraries
  - Optimization for PyTorch
  - Optimization for TensorFlow
  - Optimization for Modin
  - Low Precision Optimization Tool
  - Model Zoo for Intel Architecture
- Intel Distribution of OpenVINO Toolkit (Powered by oneAPI) - Deploy high-performance inference applications from edge to cloud.
- Intel oneAPI Rendering Toolkit - Create high-fidelity, photorealistic experiences that push the boundaries of visualization.
- Intel oneAPI IoT Toolkit - Fast-track development of applications and solutions that run at the network's edge.
  - DPC++/C++ Compiler
  - C++ Compiler Classic
  - Inspector
  - IDE Plugins
  - IoT Connection Tools
  - Linux Kernel Build Tools
- Intel System Bring-up Toolkit - Strengthen system reliability with hardware and software insight, and optimize power and performance.



The Language

At the core of the oneAPI specification is DPC++, an open, cross-architecture language built upon the ISO C++ and Khronos SYCL standards. DPC++ extends these standards and provides explicit parallel constructs and offload interfaces to support a broad range of computing architectures and processors, including CPUs and accelerator architectures. Other languages and programming models can be supported on the oneAPI platform via the Accelerator Interface.
The Libraries

oneAPI provides libraries for compute- and data-intensive domains, including deep learning, scientific computing, video analytics, and media processing.
The Hardware Abstraction Layer

The low-level hardware interface defines a set of capabilities and services that allow a language runtime to utilize a hardware accelerator.
Advanced Ray Tracing

Advanced ray tracing defines a set of ray-tracing, high-fidelity rendering, and computation routines for a wide variety of 3D graphics uses, including photorealistic visual effects and animation rendering for film and television, scientific visualization, high-performance computing, gaming, and more.

Advanced ray tracing is designed to allow cooperative execution on a wide variety of computational devices: CPUs, GPUs, FPGAs, and other accelerators, termed “XPU” computation.

The functionality is subdivided into several domains: geometric ray-tracing computations, volumetric computation and rendering, path-guided ray tracing, image denoising, and an integrated rendering infrastructure and API that combines the individual kernel capabilities into a capable, easy-to-use rendering engine.
oneDNN Graph API

oneDNN Graph API extends oneDNN with a unified high-level graph API for multiple classes of AI hardware (CPUs, GPUs, and accelerators). With a flexible graph interface, it maximizes the opportunity to generate efficient code across a variety of Intel and non-Intel hardware, and it can be closely integrated with ecosystem frameworks and inference engines.

oneDNN Graph API accepts a deep learning computation graph as input and performs graph partitioning, grouping together nodes that are candidates for fusion. oneDNN Graph then compiles and executes each group of deep learning operations in a graph partition as a single fused operation.
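The partition-then-fuse idea can be sketched with a toy partitioner. This is a hypothetical illustration, not the real oneDNN Graph API: the op names and the `FUSIBLE` set are invented, and a real implementation partitions a full dataflow graph rather than a linear sequence of ops.

```python
# Toy sketch of graph partitioning for fusion (hypothetical; not the oneDNN Graph API).

# Assumed set of ops that may be fused into one partition (illustrative only).
FUSIBLE = {"conv", "bias_add", "relu"}

def partition(ops):
    """Group runs of consecutive fusible ops into partitions.

    Each partition would then be compiled and executed as a single
    fused operation; non-fusible ops become single-op partitions.
    """
    partitions, current = [], []
    for op in ops:
        if op in FUSIBLE:
            current.append(op)          # extend the current fusion candidate
        else:
            if current:                 # close the open fusion group, if any
                partitions.append(current)
                current = []
            partitions.append([op])     # non-fusible op stands alone
    if current:
        partitions.append(current)
    return partitions

graph = ["conv", "bias_add", "relu", "pool", "conv", "relu"]
print(partition(graph))
# [['conv', 'bias_add', 'relu'], ['pool'], ['conv', 'relu']]
```

The fused groups avoid materializing intermediate tensors between ops, which is where the data-locality benefit described below comes from.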
With the graph as input, a oneDNN Graph implementation can perform target-specific optimization and code generation over a larger scope, mapping operations to hardware resources and improving execution efficiency and data locality with a global view of the computation graph. With the rapid introduction of hardware support for dense compute, the characteristics of deep learning workloads have shifted from a few hot spots in compute-intensive operations to a broad set of operations scattered across the application.

Accelerating a few compute-intensive operations through the primitive API therefore yields diminishing returns and limits the performance potential; a graph API is critical to better exploit hardware compute capacity.