Intel® oneAPI

Unified X-Architecture Programming Model

Driving a New Era of Accelerated Computing

oneAPI is a cross-industry, open, standards-based unified programming model that delivers a common developer experience across accelerator architectures, enabling faster application performance, more productivity, and greater innovation. The oneAPI industry initiative encourages collaboration on the oneAPI specification and compatible oneAPI implementations across the ecosystem.

The oneAPI Specification

The oneAPI specification extends existing developer programming models to enable a diverse set of hardware through a language, a set of library APIs, and a low-level hardware interface to support cross-architecture programming. To promote compatibility and enable developer productivity and innovation, the oneAPI specification builds upon industry standards and provides an open, cross-platform developer stack.

Intel® oneAPI Toolkits

  • Toolkits contain optimized compilers, libraries, frameworks, and analysis tools purpose-built for developers who perform similar tasks. They include implementations of the oneAPI specification along with complementary tools to develop and deploy applications and solutions across Intel® CPU and XPU architectures.

    • Intel® oneAPI Base Toolkit - Develop performant, data-centric applications across Intel® CPUs, GPUs, and FPGAs with this foundational toolset.

      • Collective Communications Library​

      • Data Analytics Library

      • Deep Neural Networks Library

      • DPC++/C++ Compiler

      • DPC++ Library

      • Math Kernel Library

      • Threading Building Blocks

      • Video Processing Library

      • Advisor

      • Distribution for GDB

      • Distribution for Python

      • DPC++ Compatibility Tool

      • FPGA Add-on for oneAPI Base Toolkit

      • Integrated Performance Primitives

      • VTune Profiler

    • Intel® oneAPI HPC Toolkit - Build, analyze, and scale applications across shared- and distributed-memory computing systems.

      • DPC++/C++ Compiler​

      • C++ Compiler Classic

      • Cluster Check

      • Fortran Compiler (Beta)

      • Fortran Compiler Classic

      • Inspector

      • MPI Library

      • Trace Analyzer and Collector

    • Intel® oneAPI AI Analytics Toolkit - Accelerate end-to-end data science and machine learning pipelines using Python tools and frameworks.

      • Distribution for Python, including highly optimized scikit-learn and XGBoost libraries

      • Optimization for PyTorch

      • Optimization for TensorFlow

      • Optimization for Modin

      • Low Precision Optimization

      • Model Zoo for Intel® Architecture

    • Intel® Distribution of OpenVINO toolkit (Powered by oneAPI) - Deploy high-performance inference applications from edge to cloud.​​

    • Intel® oneAPI Rendering Toolkit - Create high-fidelity, photorealistic experiences that push the boundaries of visualization.

    • Intel® oneAPI IoT Toolkit - Fast-track development of applications and solutions that run at the network's edge.

      • DPC++/C++ Compiler​

      • C++ Compiler Classic

      • Inspector

      • IDE Plugins

      • IoT Connection Tools

      • Linux Kernel Build Tools

    • Intel® System Bring-up Toolkit - Strengthen system reliability with hardware and software insight, and optimize power and performance.


The Language

At the core of the oneAPI specification is DPC++, an open, cross-architecture language built upon the ISO C++ and Khronos SYCL standards. DPC++ extends these standards and provides explicit parallel constructs and offload interfaces to support a broad range of computing architectures and processors, including CPUs and accelerator architectures. Other languages and programming models can be supported on the oneAPI platform via the Accelerator Interface.

Advanced Ray Tracing

Advanced ray tracing defines a set of ray tracing and high-fidelity rendering and computation routines for use in a wide variety of 3D graphics applications, including photorealistic visual effects for film and television, animation rendering, scientific visualization, high-performance computing, gaming, and more.

Advanced ray tracing is designed to allow cooperative execution on a wide variety of computational devices: CPUs, GPUs, FPGAs, and other accelerators, termed "XPU" computation.

The functionality is subdivided into several domains: geometric ray tracing computations, volumetric computation and rendering, path-guided ray tracing, image denoising, and an integrated rendering infrastructure and API that combines the individual kernel capabilities into a highly capable, easy-to-use rendering engine.

oneDNN Graph API

oneDNN Graph API extends oneDNN with a unified high-level graph API for multiple AI hardware classes (CPU, GPU, accelerators). With a flexible graph interface, it maximizes the optimization opportunity for generating efficient code across a variety of Intel® and non-Intel® hardware, and can be closely integrated with ecosystem frameworks and inference engines.

oneDNN Graph API accepts a deep learning computation graph as input and performs graph partitioning, where nodes that are candidates for fusion are grouped together. oneDNN Graph compiles and executes a group of deep learning operations in a graph partition as a fused operation.

With the graph as input, the oneDNN Graph implementation can perform target-specific optimization and code generation over a larger scope, mapping operations to hardware resources and improving execution efficiency and data locality with a global view of the computation graph. With the rapid introduction of hardware support for dense compute, deep learning workload characteristics have changed significantly, from a few hot spots on compute-intensive operations to a broad set of operations scattered across applications.

Accelerating a few compute-intensive operations through the primitive API yields diminishing returns and limits the performance potential. A graph API is critical for better exploiting hardware compute capacity.