Artificial Intelligence

NVIDIA Networking Solutions

Artificial Intelligence Needs the Most Intelligent Interconnect

InfiniBand Enables the Most Efficient Machine Learning Platforms

Machine learning is a pillar of today’s technological world, offering solutions that enable better and more accurate decision-making based on the vast amounts of data being collected. Its applications range from security, finance, and image and voice recognition to self-driving cars, healthcare, and smart cities.

InfiniBand accelerates all popular frameworks, such as TensorFlow, CNTK, Paddle, PyTorch, and Apache Spark, with RDMA, and continues to innovate to deliver the fastest and most scalable distributed training of large and powerful models.

By providing low latency, high bandwidth, high message rate, and smart offloads, InfiniBand solutions are the most deployed high-speed interconnect for large-scale machine learning, for both training and inference systems.

THE SELENE SUPERCOMPUTER

NVIDIA’s DGX SuperPOD with NVIDIA Mellanox HDR 200Gb/s InfiniBand Deployment

One of the fastest and most efficient supercomputers on the planet built in under one month

Maximizing Data Center Storage and Network IO Performance with NVIDIA Magnum IO

Magnum IO utilizes storage IO, network IO, in-network compute, and IO management to simplify and speed up data movement, access, and management for multi-GPU, multi-node systems.

Magnum IO supports NVIDIA CUDA-X™ libraries and makes the best use of a range of NVIDIA GPU and NVIDIA networking hardware topologies to achieve optimal throughput and low latency.

Setting a New Bar in MLPerf

NVIDIA training and inference solutions deliver record-setting performance in MLPerf, the leading industry benchmark for AI performance.

SHARPv2 Delivers Highest Performance for AI

An autoencoder is a neural network that learns to copy its input to its output. It has an internal (hidden) layer that describes a code used to represent the input, and it is constituted by two main parts: an encoder that maps the input into the code, and a decoder that maps the code to a reconstruction of the original input.
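The encoder/decoder split described above can be illustrated with a toy example. The following is a minimal sketch, not a production model: it assumes a purely linear autoencoder with a 2-D input and a 1-D code, trained by plain full-batch gradient descent. All function names, the data, the learning rate, and the step count are illustrative assumptions and do not come from any NVIDIA library.

```python
# Toy linear autoencoder: 2-D input -> 1-D code -> 2-D reconstruction.
# Every name and hyperparameter here is an illustrative assumption.

def encode(x, w_enc):
    """Encoder: map the input vector to a scalar code."""
    return sum(w * xi for w, xi in zip(w_enc, x))

def decode(z, w_dec):
    """Decoder: map the code back to a reconstruction of the input."""
    return [w * z for w in w_dec]

def mean_squared_error(data, w_enc, w_dec):
    """Average squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, w_enc), w_dec)
        total += sum((h - xi) ** 2 for h, xi in zip(x_hat, x))
    return total / len(data)

def train(data, w_enc, w_dec, lr=0.1, steps=500):
    """Full-batch gradient descent on the reconstruction error."""
    w_enc, w_dec = list(w_enc), list(w_dec)
    n = len(data)
    for _ in range(steps):
        g_enc = [0.0] * len(w_enc)
        g_dec = [0.0] * len(w_dec)
        for x in data:
            z = encode(x, w_enc)
            err = [h - xi for h, xi in zip(decode(z, w_dec), x)]
            # Gradient of the loss with respect to the code z,
            # needed to propagate the error back into the encoder.
            dz = sum(2 * e * w for e, w in zip(err, w_dec))
            for i, e in enumerate(err):
                g_dec[i] += 2 * e * z / n
            for j, xj in enumerate(x):
                g_enc[j] += dz * xj / n
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec

# The inputs lie on a line, so a 1-D code is enough to describe them.
data = [[0.6 * t, 0.8 * t] for t in (-1.0, -0.5, 0.5, 1.0)]
init_enc, init_dec = [0.1, 0.1], [0.1, 0.1]
loss_before = mean_squared_error(data, init_enc, init_dec)
w_enc, w_dec = train(data, init_enc, init_dec)
loss_after = mean_squared_error(data, w_enc, w_dec)
```

Comparing `loss_before` with `loss_after` shows the network learning to copy its input through the narrow code. Real autoencoders use nonlinear layers and much larger codes, and at scale the gradient sums in `train` are the kind of quantity that gets aggregated across nodes.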

Variational autoencoders (VAEs) use a variational approach to latent representation learning, which makes it possible to design complex generative models of data and fit them to large datasets. Interesting examples include generating images of fictional faces, perhaps for online customer service or for professors at an online educational institute, or even high-resolution digital artwork.

Mellanox SHARPv2 technology significantly improves the scalable performance of VAE training.

SHARPv2 Performance Advantage

Many parallel applications need the reduced result on every process rather than only on the root process. Just as MPI_Allgather complements MPI_Gather, MPI_Allreduce complements MPI_Reduce: it reduces the values and distributes the result to all processes.
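The difference between the two collectives can be sketched in a few lines. The following is a single-process simulation of the semantics only, assuming one list entry per rank. It does not call MPI, NCCL, or SHARP, and the function names `simulated_reduce` and `simulated_allreduce` are hypothetical.

```python
from functools import reduce
import operator

def simulated_reduce(values, op=operator.add, root=0):
    """Reduce semantics: only the root rank receives the combined result."""
    result = reduce(op, values)
    return [result if rank == root else None for rank in range(len(values))]

def simulated_allreduce(values, op=operator.add):
    """Allreduce semantics: every rank receives the combined result."""
    result = reduce(op, values)
    return [result for _ in values]

# One contribution per simulated rank (for example, a local gradient value).
contributions = [1.0, 2.0, 3.0, 4.0]
after_reduce = simulated_reduce(contributions)        # result held by rank 0 only
after_allreduce = simulated_allreduce(contributions)  # result held by every rank
```

In distributed training, each rank typically contributes a locally computed gradient and needs the combined value back before the next step, which is why the allreduce pattern is so common in these workloads.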

Both latency and network bandwidth matter for many AI workloads, including NLP. InfiniBand with Mellanox SHARP technology dramatically reduces latency for the larger messages these workloads typically exchange. Even RDMA over Converged Ethernet is no match for the in-network computing capabilities of SHARP.

SHARPv2 Delivers Highest Performance for AI

Google Neural Machine Translation (GNMT) is a neural machine translation (NMT) system developed by Google that uses an artificial neural network to increase fluency and accuracy in Google Translate.

GNMT improves translation quality by applying an example-based machine translation (EBMT) method in which the system "learns from millions of examples."

The tightly coupled integration of NCCL® with the new SHARP plugin from HPC-X™ demonstrates exceptional performance for GNMT.

10X Performance with NVIDIA Networking GPUDirect RDMA

Designed specifically for the needs of GPU acceleration, GPUDirect® RDMA provides direct communication between NVIDIA GPUs in remote systems. This removes the system CPUs, and the buffer copies through system memory that they would otherwise require, from the data path, resulting in 10X better performance.


Resources


Videos

NVIDIA Networking Solutions Unified Fabric Management

Maximizing Performance for Distributed Machine and Deep Learning with SHARP

Resources

Magnum IO

NVIDIA Mellanox Academy

MLPerf Page

Deep Learning Performance Page

DGX AI Leadership Page

NVIDIA Developer Resources

Solution Briefs

Data Center Performance For In-Vehicle Applications

How-to: Reference Deployment Guides

DGX SuperPOD Reference Architecture

Accelerate Your Business with Deep Learning

Product Briefs

NVIDIA Mellanox Product Documentation

NVIDIA Mellanox UFM Platform

NVIDIA Mellanox MetroX-2 Systems

QM8700 Mellanox Quantum™ HDR Edge Switch

QM8790 Mellanox Quantum™ HDR Edge Switch

CS8500 Mellanox Quantum™ HDR Modular Switch

ConnectX-6 HDR 200Gb/s VPI Adapter

LinkX® InfiniBand Cables and Transceivers

BlueField-2

BlueField

Innova SmartNIC

Whitepapers

Introducing 200G HDR InfiniBand Solutions

NVIDIA Mellanox In-Network Computing and Next Generation HDR 200G InfiniBand

DGX A100 Whitepaper

The SHIELD: Self-Healing Interconnect
