Intel® oneAPI
A Unified X-Architecture Programming Model

oneAPI is a cross-industry, open, standards-based unified programming model that delivers a common developer experience across accelerator architectures, for faster application performance, greater productivity, and more innovation. The oneAPI industry initiative encourages collaboration on the oneAPI specification and compatible oneAPI implementations across the ecosystem.

Driving a New Era of Accelerated Computing
The oneAPI Specification
  • The oneAPI specification extends existing developer programming models to enable a diverse set of hardware through a language, a set of library APIs, and a low-level hardware interface that supports cross-architecture programming. To promote compatibility and enable developer productivity and innovation, the oneAPI specification builds upon industry standards and provides an open, cross-platform developer stack.

Realize All of the Hardware Value
  • Expose and exploit the latest hardware's cutting-edge features to unleash application performance across CPUs, GPUs, FPGAs, and other accelerators.

Develop Performant Code Quickly and Correctly
  • Achieve fast and efficient development with a complete set of cross-architecture libraries and tools that use familiar languages and standards to make heterogeneous development easier.

Intel® oneAPI Toolkits
  • Toolkits contain optimized compilers, libraries, frameworks, and analysis tools purpose-built for developers who perform similar tasks. They include implementations of the oneAPI specification along with complementary tools to develop and deploy applications and solutions across Intel® CPU and XPU architectures.

    • Intel oneAPI Base Toolkit - Develop performant, data-centric applications across Intel CPUs, GPUs, and FPGAs with this foundational toolset.

      • Collective Communications Library

      • Data Analytics Library

      • Deep Neural Networks Library

      • DPC++/C++ Compiler

      • DPC++ Library

      • Math Kernel Library

      • Threading Library

      • Threading Building Blocks

      • Video Processing Library

      • Advisor

      • Distribution for GDB

      • Distribution for Python

      • DPC++ Compatibility Tool

      • FPGA Add-on for oneAPI Base Toolkit

      • Integrated Performance Primitives

      • VTune Profiler

    • Intel oneAPI HPC Toolkit - Build, analyze, and scale applications across shared- and distributed-memory computing systems.

      • DPC++/C++ Compiler

      • C++ Compiler Classic

      • Cluster Check

      • Fortran Compiler (Beta)

      • Fortran Compiler Classic

      • Inspector

      • MPI Library

      • Trace Analyzer and Collector

    • Intel oneAPI AI Analytics Toolkit - Accelerate end-to-end data science and machine learning pipelines using Python tools and frameworks.

      • Distribution for Python, including highly optimized scikit-learn and XGBoost libraries

      • Optimization for PyTorch

      • Optimization for TensorFlow

      • Optimization for Modin

      • Low Precision Optimization

      • Model Zoo for Intel Architecture

    • Intel Distribution of OpenVINO Toolkit (Powered by oneAPI) - Deploy high-performance inference applications from edge to cloud.

    • Intel oneAPI Rendering Toolkit - Create high-fidelity, photorealistic experiences that push the boundaries of visualization.

    • Intel oneAPI IoT Toolkit - Fast-track development of applications and solutions that run at the network's edge.

      • DPC++/C++ Compiler

      • C++ Compiler Classic

      • Inspector

      • IDE Plugins

      • IoT Connection Tools

      • Linux Kernel Build Tools

    • Intel System Bring-up Toolkit - Strengthen system reliability with hardware and software insight, and optimize power and performance.

The Language
  • At the core of the oneAPI specification is DPC++, an open, cross-architecture language built upon the ISO C++ and Khronos SYCL standards. DPC++ extends these standards and provides explicit parallel constructs and offload interfaces to support a broad range of computing architectures and processors, including CPUs and accelerator architectures. Other languages and programming models can be supported on the oneAPI platform via the Accelerator Interface.

The Libraries
  • oneAPI provides libraries for compute- and data-intensive domains, including deep learning, scientific computing, video analytics, and media processing.

The Hardware Abstraction Layers
  • The low-level hardware interface defines a set of capabilities and services that allow a language runtime to utilize a hardware accelerator.

Advanced Ray Tracing
  • Advanced ray tracing defines a set of ray tracing, high-fidelity rendering, and computation routines for a wide variety of 3D graphics uses, including photorealistic visual effects and animation rendering for film and television, scientific visualization, high-performance computing, gaming, and more.

  • Advanced ray tracing is designed to allow cooperative execution on a wide variety of computational devices: CPUs, GPUs, FPGAs, and other accelerators, termed “XPU” computation.

  • The functionality is subdivided into several domains: geometric ray tracing computations, volumetric computation and rendering, path-guided ray tracing, and image denoising, along with an integrated rendering infrastructure and API that combines these individual kernel capabilities into a capable, easy-to-use rendering engine.

oneDNN Graph API
  • oneDNN Graph API extends oneDNN with a unified high-level graph API for multiple AI hardware classes (CPU, GPU, accelerators). With a flexible graph interface, it maximizes the optimization opportunity for generating efficient code across a variety of Intel and non-Intel hardware, and can be closely integrated with ecosystem frameworks and inference engines.

  • oneDNN Graph API accepts a deep learning computation graph as input and performs graph partitioning, where nodes that are candidates for fusion are grouped together. oneDNN Graph compiles and executes a group of deep learning operations in a graph partition as a fused operation.

  • With the graph as input, a oneDNN Graph implementation can perform target-specific optimization and code generation over a larger scope, mapping operations to hardware resources and improving execution efficiency and data locality with a global view of the computation graph. With the rapid introduction of hardware support for dense compute, the deep learning workload profile has shifted significantly, from a few hot spots in compute-intensive operations to many operations scattered across the application.

  • Accelerating a few compute-intensive operations through a primitive API yields diminishing returns and limits the performance potential. A graph API is therefore critical to better exploit hardware compute capacity.