Artificial Intelligence

NVIDIA Networking Solutions

Artificial Intelligence Needs the Most Intelligent Interconnect

InfiniBand Enables the Most Efficient Machine Learning Platforms

Machine learning is a pillar of today’s technological world, offering solutions that enable better and more accurate decision-making based on the great amounts of data being collected. Machine learning encompasses a wide range of applications, ranging from security, finance, and image and voice recognition, to self-driving cars, healthcare and smart cities.

InfiniBand accelerates all popular machine learning frameworks, including TensorFlow, CNTK, Paddle, PyTorch, and Apache Spark, with RDMA, and continues to innovate, delivering the fastest and most scalable distributed execution for training large and powerful models.

By providing low latency, high bandwidth, high message rates, and smart offloads, InfiniBand is the most widely deployed high-speed interconnect for large-scale machine learning, for both training and inference systems.


NVIDIA’s DGX SuperPOD with NVIDIA Mellanox HDR 200Gb/s InfiniBand Deployment

One of the fastest and most efficient supercomputers on the planet, built in under one month

Maximizing Data Center Storage and Network IO Performance with NVIDIA Magnum IO

Magnum IO utilizes storage IO, network IO, in-network compute, and IO management to simplify and speed up data movement, access, and management for multi-GPU, multi-node systems.

Magnum IO supports NVIDIA CUDA-X™ libraries and makes the best use of a range of NVIDIA GPU and NVIDIA networking hardware topologies to achieve optimal throughput and low latency.


Setting a New Bar in MLPerf

NVIDIA training and inference solutions deliver record-setting performance in MLPerf, the leading industry benchmark for AI performance.

SHARPv2 Delivers Highest Performance for AI

An autoencoder is a neural network that learns to copy its input to its output. It has an internal (hidden) layer that describes a code used to represent the input, and it consists of two main parts: an encoder that maps the input into the code, and a decoder that maps the code to a reconstruction of the original input.
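The encoder/decoder structure described above can be sketched in a few lines of plain Python. This is an illustrative toy only: a real autoencoder learns both mappings with a neural network, whereas here the "code" is a hand-picked two-value summary of a four-value input.

```python
# Toy sketch of an autoencoder's structure (not a trained network):
# an encoder compresses the input into a smaller code, and a decoder
# reconstructs an approximation of the input from that code.

def encode(x):
    """Encoder: map a 4-value input to a 2-value code (the hidden layer)."""
    # Hypothetical compression: keep the mean of each half of the input.
    return [(x[0] + x[1]) / 2, (x[2] + x[3]) / 2]

def decode(code):
    """Decoder: map the code back to a reconstruction of the input."""
    # Expand each code value back to the two positions it summarized.
    return [code[0], code[0], code[1], code[1]]

x = [1.0, 1.0, 4.0, 4.0]
code = encode(x)       # internal representation
x_hat = decode(code)   # reconstruction of the original input
print(code, x_hat)
```

For this input the reconstruction is exact because the input happens to be perfectly summarized by the code; in general the reconstruction is only approximate, and training minimizes that reconstruction error.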

Variational autoencoders (VAEs) use a variational approach to latent-representation learning, allowing complex generative models to be designed and fit to large datasets. Interesting examples include generating images of fictional faces, perhaps for online customer service agents or professors at an online educational institution, or even high-resolution digital artwork.


Mellanox SHARPv2 technology significantly improves the performance of VAE training at scale.


SHARPv2 Performance Advantage

Many parallel applications need the reduced results to be available on all processes, not just the root process. Just as MPI_Allgather complements MPI_Gather, MPI_Allreduce reduces the values and distributes the results to all processes.
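The MPI_Allreduce semantics above can be sketched without a real MPI runtime. The following is a pure-Python simulation, not a call into an MPI library: the list `ranks` stands in for the buffers held by separate processes, and the function shows that every rank (not only the root) ends up with the element-wise reduction.

```python
# Pure-Python sketch of MPI_Allreduce semantics with a sum operation:
# every rank contributes a buffer, and every rank receives the
# element-wise reduction -- unlike MPI_Reduce, where only the root does.

def allreduce_sum(per_rank_data):
    """Simulate MPI_Allreduce(..., op=MPI_SUM) over a list of rank buffers."""
    # Reduce: element-wise sum across all ranks' buffers.
    reduced = [sum(vals) for vals in zip(*per_rank_data)]
    # Distribute: each rank gets its own copy of the reduced result.
    return [list(reduced) for _ in per_rank_data]

# Four simulated ranks, each holding a 3-element buffer.
ranks = [[1, 2, 3], [10, 20, 30], [100, 200, 300], [1000, 2000, 3000]]
results = allreduce_sum(ranks)
print(results[0])  # every rank sees [1111, 2222, 3333]
```

In a real deployment this operation is what SHARP offloads: the reduction is performed inside the network switches rather than on the hosts, so the result reaches all ranks without the data making multiple trips through server CPUs.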

Both latency and network bandwidth are important for many AI workloads, including natural language processing (NLP). InfiniBand with Mellanox SHARP technology dramatically reduces latency for the larger messages these workloads typically send. Even RDMA over Converged Ethernet (RoCE) is no match for the in-network computing capabilities of SHARP.

SHARPv2 Delivers Highest Performance for AI

Google Neural Machine Translation (GNMT) is a neural machine translation (NMT) system developed by Google that uses an artificial neural network to increase fluency and accuracy in Google Translate.

GNMT improves translation quality by applying an example-based machine translation (EBMT) method, in which the system "learns from millions of examples."

The tightly coupled integration of the new NCCL® + SHARP plugin from HPC-X™ delivers exceptional performance for GNMT.


10X Performance with NVIDIA Networking GPUDirect RDMA

Designed specifically for the needs of GPU acceleration, GPUDirect® RDMA provides direct communication between NVIDIA GPUs in remote systems. This eliminates the involvement of the system CPUs and the buffer copies of data through system memory, resulting in up to 10X better performance.

GPUDirect® RDMA


NVIDIA DGX™ A100 is the universal system for all AI workloads, offering unprecedented compute density, performance, and flexibility in the world’s first 5 petaFLOPS AI system. NVIDIA DGX A100 features the world’s most advanced accelerator, the NVIDIA A100 Tensor Core GPU, and up to ten HDR 200Gb/s InfiniBand adapters, enabling enterprises to consolidate training and inference into a unified, easy-to-deploy AI infrastructure.

Learn More

About NVIDIA Networking Solutions

Ready To Purchase?

Register an Opportunity