Questions?

We are here to help. Contact us by phone, email or stop in!

(888) 222-7822
sales@atipa.com
4921 Legends Drive, Lawrence, KS 66049

Copyright © 2001-2018 Atipa Technologies. All Rights Reserved.

Atipa Technologies, a division of Microtech Computers, Inc., is not responsible for typographical or photographic errors. Designated trademarks and brands are the property of their respective owners.

Intel® Omni-Path

The Next-Generation Fabric

Intel® Omni-Path Architecture (Intel® OPA), an element of Intel® Scalable System Framework, delivers the performance for tomorrow’s high performance computing (HPC) workloads and the ability to scale to tens of thousands of nodes, and eventually more, at a price competitive with today’s fabrics. The Intel® OPA 100 Series product line is an end-to-end solution of PCIe* adapters, silicon, switches, cables, and management software. As the successor to Intel® True Scale Fabric, this optimized HPC fabric is built upon a combination of enhanced IP and Intel® technology.


The Future of High Performance Fabrics

Current standards-based high performance fabrics were not originally designed for HPC, resulting in performance and scaling weaknesses that are currently impeding the path to Exascale computing. Intel® Omni-Path Architecture is being designed specifically to address these issues and scale cost-effectively from entry level HPC clusters to larger clusters with 10,000 nodes or more.

Intel OPA is designed to provide the:

  • Features and functionality at both the host and fabric levels to greatly raise levels of scaling

  • CPU and fabric integration necessary for the increased computing density, improved reliability, reduced power, and lower costs required by significantly larger HPC deployments

  • Fabric tools to readily install, verify, and manage fabrics at this level of complexity

Intel® Omni-Path Key Fabric Features and Innovations

Adaptive Routing

Adaptive Routing monitors the performance of the possible paths between fabric end-points and selects the least congested path to rebalance the packet load. While other technologies also support adaptive routing, the implementation is what matters. Intel’s implementation is based on cooperation between the Fabric Manager and the switch ASICs. The Fabric Manager, with its global view of the topology, initializes the switch ASICs with several egress options per destination and updates these options whenever the underlying fabric changes because links are added or removed. Once the switch egress options are set, the Fabric Manager monitors the fabric state, and the switch ASICs dynamically monitor and react to the congestion sensed on individual links. This approach enables Adaptive Routing to scale as fabrics grow larger and more complex.
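The division of labor described above can be illustrated with a minimal Python sketch. It is not Intel's implementation: the `Switch` class, the `program_switch` helper, and the use of a simple per-port congestion counter are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Switch:
    """Toy model of a switch (hypothetical, for illustration only)."""
    # destination -> candidate egress ports, installed by the fabric manager
    egress_options: Dict[str, List[int]] = field(default_factory=dict)
    # port -> locally sensed congestion; the metric itself is an assumption
    congestion: Dict[int, int] = field(default_factory=dict)

    def select_egress(self, destination: str) -> int:
        # The switch decides locally: least congested of the allowed ports.
        candidates = self.egress_options[destination]
        return min(candidates, key=lambda port: self.congestion.get(port, 0))


def program_switch(switch: Switch, routes: Dict[str, List[int]]) -> None:
    """Stand-in for the fabric manager installing several egress options."""
    switch.egress_options = {dst: list(ports) for dst, ports in routes.items()}


switch = Switch()
program_switch(switch, {"node42": [1, 2, 3]})

switch.congestion = {1: 7, 2: 2, 3: 5}
first_choice = switch.select_egress("node42")   # port 2 is least congested

switch.congestion[2] = 9                        # port 2 becomes congested
second_choice = switch.select_egress("node42")  # traffic shifts to port 3
```

In this sketch the manager only sets the menu of ports, while the per-packet choice needs no global coordination, which is the property the text credits for scalability.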

Dispersive Routing

One of the critical roles of fabric management is the initialization and configuration of routes through the fabric between pairs of nodes. Intel® Omni-Path Fabric supports a variety of routing methods, including defining alternate routes that disperse traffic flows for redundancy, performance, and load balancing. Instead of sending all packets from a source to a destination via a single path, Dispersive Routing distributes traffic across multiple paths. Once received, packets are reassembled in their proper order for rapid, efficient processing. By leveraging more of the fabric to deliver maximum communications performance for all jobs, Dispersive Routing promotes optimal fabric efficiency.
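As a rough illustration of the idea, the Python sketch below spreads sequence-numbered packets over several paths and restores their order on arrival. The round-robin policy and the function names `disperse` and `reassemble` are assumptions made for this example; the source does not specify how the fabric chooses a path for each packet.

```python
from itertools import cycle


def disperse(packets, paths):
    """Tag each packet with a sequence number and assign it to a path.

    Round-robin assignment is an assumption made for illustration.
    """
    lanes = {path: [] for path in paths}
    for (seq, payload), path in zip(enumerate(packets), cycle(paths)):
        lanes[path].append((seq, payload))
    return lanes


def reassemble(lanes):
    """Merge packets from all paths back into their original order."""
    arrived = [pkt for path_pkts in lanes.values() for pkt in path_pkts]
    # Sequence numbers are unique, so sorting restores the send order.
    return [payload for _, payload in sorted(arrived)]


message = ["a", "b", "c", "d", "e"]
lanes = disperse(message, ["path0", "path1"])
restored = reassemble(lanes)
```

Here `path0` carries packets 0, 2, and 4 while `path1` carries packets 1 and 3, and the receiver recovers the original sequence from the tags alone.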


Traffic Flow Optimization

Traffic Flow Optimization optimizes the quality of service beyond selecting the priority (based on virtual lane or service level) of messages to be sent on an egress port. At the Intel® Omni-Path Architecture link level, variable length packets are broken up into fixed-sized containers that are in turn packaged into fixed-sized Link Transfer Packets (LTPs) for transmitting over the link. Since packets are broken up into smaller containers, a higher priority container can request a pause and be inserted into the inter-switch link (ISL) data stream before the previous packet has finished transmitting.

The key benefit is that Traffic Flow Optimization reduces the variation in latency seen through the network by high priority traffic in the presence of lower priority traffic. It addresses a traditional weakness of similar technologies, in which a packet must be transmitted to completion once its transmission starts, even if higher priority packets become available.
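The latency effect can be seen in a small Python simulation. This is a simplified sketch, not the actual link protocol: it models one container per time slot, ignores the packaging of containers into LTPs, and uses hypothetical names (`transmit`, `finish_slot`) and made-up packet sizes.

```python
def transmit(packets, preempt):
    """Return which packet's container occupies each time slot on the link.

    packets: list of dicts with name, arrival slot, priority, container count.
    preempt: if True, re-pick the highest priority packet at every container
             boundary; if False, a packet runs to completion once started.
    """
    remaining = {p["name"]: p["containers"] for p in packets}
    timeline, current, slot = [], None, 0
    while any(remaining.values()):
        ready = [p for p in packets
                 if p["arrival"] <= slot and remaining[p["name"]] > 0]
        if not ready:
            timeline.append(None)  # idle slot, nothing has arrived yet
            slot += 1
            continue
        if preempt or current is None or remaining[current["name"]] == 0:
            current = max(ready, key=lambda p: p["priority"])
        remaining[current["name"]] -= 1
        timeline.append(current["name"])
        slot += 1
    return timeline


def finish_slot(timeline, name):
    """Number of slots elapsed when the named packet is fully sent."""
    return len(timeline) - timeline[::-1].index(name)


packets = [
    {"name": "bulk", "arrival": 0, "priority": 0, "containers": 6},
    {"name": "urgent", "arrival": 2, "priority": 1, "containers": 2},
]
with_preemption = transmit(packets, preempt=True)
without_preemption = transmit(packets, preempt=False)
```

With these illustrative numbers, the urgent packet completes after 4 slots when it can interrupt the bulk packet, versus 8 slots when the bulk packet must finish first, while the total link time is the same in both cases.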


Packet Integrity Protection

Packet Integrity Protection allows for rapid and transparent recovery from transmission errors between a sender and a receiver on an Intel® Omni-Path Architecture link. Given the very high Intel® OPA signaling rate (25.78125 Gbps per lane) and the goal of supporting large scale systems of a hundred thousand or more links, transient bit errors must be tolerated while ensuring that the performance impact is insignificant. Packet Integrity Protection enables recovery from transient errors whether they occur between a host and a switch or between switches. This eliminates the need for transport level timeouts and end-to-end retries, and it avoids the heavy latency penalty associated with alternate error recovery approaches.
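The general idea of recovering at the link rather than end to end can be sketched in Python as follows. The details here are assumptions for illustration only: the source does not describe the checksum or the retry mechanism, so this example uses `zlib.crc32` as a stand-in check and a simple resend loop, with hypothetical helper names throughout.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a CRC so the receiving end of the link can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(framed: bytes):
    """Return the payload if the CRC matches, otherwise None."""
    payload, crc = framed[:-4], framed[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == crc else None


def make_flaky_link(corrupt_first_n: int):
    """Hypothetical link that flips one bit in the first N transmissions."""
    state = {"sent": 0}

    def link(framed: bytes) -> bytes:
        state["sent"] += 1
        if state["sent"] <= corrupt_first_n:
            corrupted = bytearray(framed)
            corrupted[0] ^= 0x01  # transient single-bit error
            return bytes(corrupted)
        return framed

    return link


def send_over_link(payload: bytes, link, max_retries: int = 3):
    """Resend from the sender's retained copy until the receiver accepts it."""
    framed = frame(payload)  # the sender keeps this copy for replay
    for attempt in range(1, max_retries + 1):
        received = check(link(framed))
        if received is not None:
            return received, attempt
    raise RuntimeError("link did not recover within max_retries")


delivered, attempts = send_over_link(b"hello fabric", make_flaky_link(1))
```

The error is repaired on the second attempt on the same hop, so nothing above the link layer has to time out or retry.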


Dynamic Lane Scaling

Dynamic Lane Scaling allows an operation to continue even if one or more lanes of a 4x link fail, avoiding the need to restart or roll back to a previous checkpoint to keep the application running. The job can then run to completion before action is taken to resolve the issue.
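A short Python sketch shows the arithmetic of running on fewer lanes, using the per-lane signaling rate quoted above for a 4x link. The `Link` class is hypothetical, the figures are raw signaling rates rather than delivered payload bandwidth, and the assumption that the link stays up while at least one lane survives is made for illustration.

```python
LANE_RATE_GBPS = 25.78125  # per-lane signaling rate quoted in the text
LANES_PER_LINK = 4         # a 4x link


class Link:
    """Toy model of a 4x link that degrades instead of going down."""

    def __init__(self):
        self.active = [True] * LANES_PER_LINK

    def fail_lane(self, index: int) -> None:
        self.active[index] = False

    @property
    def is_up(self) -> bool:
        # Assumption: the link remains usable with at least one good lane.
        return any(self.active)

    @property
    def raw_rate_gbps(self) -> float:
        # Raw signaling rate across surviving lanes, before encoding overhead.
        return LANE_RATE_GBPS * sum(self.active)


link = Link()
full_rate = link.raw_rate_gbps       # 103.125 with all four lanes
link.fail_lane(2)
degraded_rate = link.raw_rate_gbps   # 77.34375 with three lanes
```

In this model the job keeps running at three quarters of the raw rate after a single lane failure, rather than stopping.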