PNY Technologies Inc.
NVIDIA® H100

  • SKU: NVH100TCGPU-KIT
  • Description

    NVIDIA H100 PCIe

    Unprecedented Performance, Scalability, and Security for Every Data Center

    The NVIDIA® H100 Tensor Core GPU enables an order-of-magnitude leap for large-scale AI and HPC with unprecedented performance, scalability, and security for every data center, and includes the NVIDIA AI Enterprise software suite to streamline AI development and deployment. H100 accelerates exascale workloads with a dedicated Transformer Engine for trillion-parameter language models. For smaller jobs, H100 can be partitioned down to right-sized Multi-Instance GPU (MIG) partitions. With Hopper Confidential Computing, this scalable compute power can secure sensitive applications on shared data center infrastructure. The inclusion of NVIDIA AI Enterprise with H100 PCIe purchases reduces development time, simplifies deployment of AI workloads, and makes H100 the most powerful end-to-end AI and HPC data center platform.

    The NVIDIA Hopper architecture delivers unprecedented performance, scalability, and security to every data center. Hopper builds on prior generations with advances ranging from new compute core capabilities, such as the Transformer Engine, to faster networking, powering the data center with an order-of-magnitude speedup over the prior generation. NVIDIA NVLink supports ultra-high bandwidth and extremely low latency between two H100 boards, and supports memory pooling and performance scaling (application support required). Second-generation MIG securely partitions the GPU into isolated, right-sized instances to maximize quality of service (QoS) for 7x more secured tenants. The inclusion of NVIDIA AI Enterprise (exclusive to the H100 PCIe), a software suite that optimizes the development and deployment of accelerated AI workflows, maximizes performance through these new H100 architectural innovations. These technology breakthroughs fuel the H100 Tensor Core GPU, the world's most advanced GPU ever built.

     

    Performance Highlights

    FP64: 26 TFLOPS
    FP64 Tensor Core: 51 TFLOPS
    FP32: 51 TFLOPS
    TF32 Tensor Core: 756 TFLOPS | Sparsity
    BFLOAT16 Tensor Core: 1513 TFLOPS | Sparsity
    FP16 Tensor Core: 1513 TFLOPS | Sparsity
    FP8 Tensor Core: 3026 TFLOPS | Sparsity
    INT8 Tensor Core: 3026 TOPS | Sparsity
    GPU Memory: 80 GB HBM2e
    GPU Memory Bandwidth: 2.0 TB/sec
    Maximum Power Consumption: 350 W

    World's Most Advanced Chip

    • Built with 80 billion transistors using a cutting-edge TSMC 4N process custom-tailored for NVIDIA's accelerated compute needs, H100 is the world's most advanced chip ever built. It features major advances to accelerate AI, HPC, memory bandwidth, interconnect, and communication at data center scale.

    NVIDIA Hopper Architecture

    • The NVIDIA H100 Tensor Core GPU powered by the NVIDIA Hopper GPU architecture delivers the next massive leap in accelerated computing performance for NVIDIA's data center platforms. H100 securely accelerates diverse workloads from small enterprise workloads, to exascale HPC, to trillion parameter AI models. Implemented using TSMC's 4N process customized for NVIDIA with 80 billion transistors, and including numerous architectural advances, H100 is the world's most advanced chip ever built.

    Fourth-Generation Tensor Cores

    • New fourth-generation Tensor Cores are up to 6x faster chip-to-chip than A100, a gain that combines the per-SM speedup, the additional SM count, and the higher clocks of H100. On a per-SM basis, the Tensor Cores deliver 2x the MMA (Matrix Multiply-Accumulate) computational rate of the A100 SM on equivalent data types, and 4x that rate using the new FP8 data type compared to the previous generation's 16-bit floating-point options. The Sparsity feature exploits fine-grained structured sparsity in deep learning networks, doubling the performance of standard Tensor Core operations.
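
    To make the MMA operation concrete, here is a minimal sketch using CUDA's portable nvcuda::wmma API to multiply two 16x16 FP16 tiles into an FP32 accumulator. It illustrates Tensor Core programming in general (the WMMA API predates Hopper) rather than any H100-specific instruction; the kernel name and launch parameters are illustrative.

    ```cuda
    #include <mma.h>
    #include <cuda_fp16.h>

    using namespace nvcuda;

    // One warp computes D = A * B + 0 on a single 16x16x16 tile.
    // a and b are row-major FP16 tiles; the accumulator is FP32.
    __global__ void tile_mma(const half *a, const half *b, float *d)
    {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

        wmma::fill_fragment(acc_frag, 0.0f);                 // zero the accumulator
        wmma::load_matrix_sync(a_frag, a, 16);               // leading dimension 16
        wmma::load_matrix_sync(b_frag, b, 16);
        wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // one Tensor Core MMA
        wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
    }

    // Launch with a single warp: tile_mma<<<1, 32>>>(d_a, d_b, d_d);
    ```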

    Structural Sparsity

    • AI networks are big, having millions to billions of parameters. Not all of these parameters are needed for accurate predictions, and some can be converted to zeros to make the models “sparse” without compromising accuracy. Tensor Cores in H100 can provide up to 2x higher performance for sparse models. While the sparsity feature more readily benefits AI inference, it can also improve the performance of model training.
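
    The pattern behind this feature is 2:4 structured sparsity: at most two nonzero values in every group of four consecutive weights. Below is a minimal CUDA sketch of offline magnitude pruning to that pattern; in practice, the sparse Tensor Core path is typically driven through NVIDIA libraries such as cuSPARSELt, which this sketch does not use.

    ```cuda
    #include <cfloat>  // FLT_MAX

    // Prune weights to the 2:4 structured-sparsity pattern by zeroing the two
    // smallest-magnitude entries in each consecutive group of four.
    __global__ void prune_2_of_4(float *w, int n)
    {
        int g = (blockIdx.x * blockDim.x + threadIdx.x) * 4;  // group start
        if (g + 3 >= n) return;

        for (int pass = 0; pass < 2; ++pass) {  // zero the smallest survivor, twice
            int   min_idx = -1;
            float min_abs = FLT_MAX;
            for (int i = 0; i < 4; ++i) {
                float v = fabsf(w[g + i]);
                if (w[g + i] != 0.0f && v < min_abs) { min_abs = v; min_idx = g + i; }
            }
            if (min_idx >= 0) w[min_idx] = 0.0f;  // groups may already contain zeros
        }
    }
    ```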

    Transformer Engine Supercharges AI, Up to 30x Higher Performance

    • Transformer models are the backbone of language models used widely today, from BERT to GPT-3. Initially developed for natural language processing (NLP) use cases, the Transformer's versatility is increasingly being applied to computer vision, drug discovery, and more. Model sizes continue to increase exponentially, now reaching trillions of parameters, and the sheer volume of math-bound computation stretches training times into months, which is impractical for business needs. The Transformer Engine uses software and custom Hopper Tensor Core technology designed specifically to accelerate training for models built from the world's most important AI model building block, the Transformer. Hopper Tensor Cores can apply mixed 8-bit floating point (FP8) and FP16 precision formats to dramatically accelerate AI calculations for transformers.
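
    CUDA 11.8 and later expose the two FP8 formats as host/device types in cuda_fp8.h. Below is a minimal sketch of the precision trade-off between them (E4M3 carries more mantissa bits, E5M2 more dynamic range); the values shown are illustrative.

    ```cuda
    #include <cstdio>
    #include <cuda_fp8.h>  // __nv_fp8_e4m3 / __nv_fp8_e5m2 (CUDA 11.8+)

    int main()
    {
        float x = 3.14159f;

        // Round-trip a value through both 8-bit floating-point formats.
        float via_e4m3 = float(__nv_fp8_e4m3(x));  // 4 exponent, 3 mantissa bits
        float via_e5m2 = float(__nv_fp8_e5m2(x));  // 5 exponent, 2 mantissa bits

        printf("fp32: %f  e4m3: %f  e5m2: %f\n", x, via_e4m3, via_e5m2);
        return 0;
    }
    ```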

    New DPX Instructions

    • Dynamic programming is an algorithmic technique for solving a complex recursive problem by breaking it down into simpler subproblems. By storing the results of subproblems so that they need not be recomputed later, it reduces the time and complexity of solving otherwise exponential problems. Dynamic programming is commonly used across a broad range of use cases. For example, Floyd-Warshall is a route optimization algorithm that can be used to map the shortest routes for shipping and delivery fleets, and the Smith-Waterman algorithm is used for DNA sequence alignment and protein folding applications. Hopper introduces DPX instructions to accelerate dynamic programming algorithms by up to 40x compared to CPUs and up to 7x compared to NVIDIA Ampere architecture GPUs. This leads to dramatically faster times in disease diagnosis, real-time routing optimizations, and even graph analytics.
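
    As a concrete example of the dynamic-programming pattern DPX accelerates, here is a minimal, unblocked CUDA sketch of one Floyd-Warshall relaxation pass. It shows the min/add inner update itself; it does not explicitly invoke DPX instructions, and the matrix layout is illustrative.

    ```cuda
    // dist is an n x n row-major matrix of current shortest-path estimates
    // (dist[i*n + i] initialized to 0, missing edges to a large value).
    // Each launch relaxes all pairs (i, j) through intermediate vertex k.
    __global__ void floyd_warshall_step(float *dist, int n, int k)
    {
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        int i = blockIdx.y * blockDim.y + threadIdx.y;
        if (i >= n || j >= n) return;

        // DP recurrence: the shortest i->j path over vertices {0..k}
        // either avoids vertex k or passes through it.
        float through_k = dist[i * n + k] + dist[k * n + j];
        if (through_k < dist[i * n + j]) dist[i * n + j] = through_k;
    }

    // Host side: for (int k = 0; k < n; ++k)
    //     floyd_warshall_step<<<grid, block>>>(d_dist, n, k);
    ```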

    New Thread Block Cluster Feature

    • Allows programmatic control of locality at a granularity larger than a single Thread Block on a single SM. This extends the CUDA programming model by adding another level to the programming hierarchy, which now includes Threads, Thread Blocks, Thread Block Clusters, and Grids. Clusters enable multiple Thread Blocks running concurrently across multiple SMs to synchronize and collaboratively fetch and exchange data.
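
    A minimal sketch of this hierarchy as exposed through CUDA 12 cooperative groups (requires compilation for compute capability 9.0); the cluster size, buffer size, and kernel name here are illustrative.

    ```cuda
    #include <cooperative_groups.h>
    namespace cg = cooperative_groups;

    // Two Thread Blocks per Cluster; each block reads its partner's shared memory.
    __global__ void __cluster_dims__(2, 1, 1) cluster_exchange(int *out)
    {
        __shared__ int smem[128];  // launch with 128 threads per block
        cg::cluster_group cluster = cg::this_cluster();

        smem[threadIdx.x] = (int)(threadIdx.x + cluster.block_rank() * 1000);
        cluster.sync();  // every block in the cluster has filled its shared memory

        // Distributed shared memory: map the partner block's smem into our view.
        unsigned partner = (cluster.block_rank() + 1) % cluster.num_blocks();
        int *remote = cluster.map_shared_rank(smem, partner);
        out[blockIdx.x * blockDim.x + threadIdx.x] = remote[threadIdx.x];

        cluster.sync();  // keep smem alive until the partner finishes reading it
    }

    // Launch with grid.x a multiple of 2: cluster_exchange<<<grid, 128>>>(d_out);
    ```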

    Enhanced Asynchronous Execution Features

    • New asynchronous execution features include a new Tensor Memory Accelerator (TMA) unit that can transfer large blocks of data very efficiently between global memory and shared memory. TMA also supports asynchronous copies between Thread Blocks in a Cluster, and a new Asynchronous Transaction Barrier enables atomic data movement and synchronization.
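
    In CUDA source code, these capabilities surface through the asynchronous copy APIs. Below is a minimal sketch using cooperative groups' memcpy_async to stage a tile into shared memory; whether a given copy is serviced by dedicated copy hardware is an implementation detail of the compiler and architecture, and the kernel name and sizes are illustrative.

    ```cuda
    #include <cooperative_groups.h>
    #include <cooperative_groups/memcpy_async.h>
    namespace cg = cooperative_groups;

    // Stage a per-block tile from global into shared memory asynchronously, then
    // reduce it. out is assumed zero-initialized; launch with dynamic shared
    // memory of per_block floats.
    __global__ void staged_sum(const float *in, float *out, int per_block)
    {
        extern __shared__ float smem[];
        cg::thread_block block = cg::this_thread_block();

        // The bulk copy proceeds asynchronously; threads could perform
        // independent work here while the data is in flight.
        cg::memcpy_async(block, smem, in + (size_t)blockIdx.x * per_block,
                         sizeof(float) * per_block);
        cg::wait(block);  // barrier: the tile is now visible in shared memory

        float s = 0.0f;
        for (int i = threadIdx.x; i < per_block; i += blockDim.x) s += smem[i];
        atomicAdd(&out[blockIdx.x], s);
    }

    // Launch: staged_sum<<<blocks, 256, per_block * sizeof(float)>>>(in, out, per_block);
    ```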

    Second-Generation Multi-Instance GPU (MIG) Technology

    • With Multi-Instance GPU (MIG), previously introduced in Ampere, a GPU can be partitioned into several smaller, fully isolated instances with their own memory, cache, and compute cores. The Hopper architecture further enhances MIG by supporting multi-tenant, multi-user configurations in virtualized environments across up to seven secure GPU instances, isolating each instance with confidential computing at the hardware and hypervisor level. Dedicated video decoders for each MIG instance deliver secure, high-throughput intelligent video analytics (IVA) on shared infrastructure. With Hopper's concurrent MIG profiling, administrators can monitor right-sized GPU acceleration and optimize resource allocation for users. Researchers with smaller workloads can elect to use MIG to securely isolate a portion of a GPU, rather than renting a full CSP instance, while being assured that their data is secure at rest, in transit, and in use.

    New Confidential Computing Support

    • Today's confidential computing solutions are CPU-based, which is too limited for compute-intensive workloads like AI and HPC. NVIDIA Confidential Computing is a built-in security feature of the NVIDIA Hopper architecture that makes NVIDIA H100 the world's first accelerator with confidential computing capabilities. Users can protect the confidentiality and integrity of their data and applications in use while accessing the unsurpassed acceleration of H100 GPUs. It creates a hardware-based trusted execution environment (TEE) that secures and isolates the entire workload running on a single H100 GPU, multiple H100 GPUs within a node, or individual MIG instances. GPU-accelerated applications can run unchanged within the TEE and don't have to be partitioned. Users can combine the power of NVIDIA software for AI and HPC with the security of a hardware root of trust offered by NVIDIA Confidential Computing.

    HBM2e Memory Subsystem

    • H100 brings massive amounts of compute to data centers. To fully utilize that compute performance, the NVIDIA H100 PCIe features HBM2e memory with a class-leading 2 terabytes per second (TB/sec) of memory bandwidth, a 50 percent increase over the previous generation. In addition to 80 gigabytes (GB) of HBM2e memory, H100 includes 50 megabytes (MB) of L2 cache. The combination of this faster HBM memory and larger cache provides the capacity to accelerate the most computationally intensive AI models.

    Fourth-Generation NVIDIA NVLink

    • Provides a 3x bandwidth increase on all-reduce operations and a 50% general bandwidth increase over the prior-generation NVLink, with 900 GB/sec total bandwidth for multi-GPU IO, roughly 7x the bandwidth of PCIe Gen5.

    PCIe Gen5 for State of the Art CPUs and DPUs

    • The H100 is NVIDIA's first GPU to support PCIe Gen5, providing the highest speeds possible at 128GB/s (bi-directional). This fast communication enables optimal connectivity with the highest performing CPUs, as well as with NVIDIA ConnectX-7 SmartNICs and BlueField-3 DPUs, which allow up to 400Gb/s Ethernet or NDR 400Gb/s InfiniBand networking acceleration for secure HPC and AI workloads.

    Enterprise Ready: AI Software Streamlines Development and Deployment

    • Enterprise adoption of AI is now mainstream and organizations require end-to-end, AI ready infrastructure that will future proof them for this new era. NVIDIA H100 Tensor Core GPUs for mainstream servers (PCIe) come with NVIDIA AI Enterprise software, making AI accessible to nearly every organization with the highest performance in training, inference, and data-science. NVIDIA AI Enterprise together with NVIDIA H100 simplifies the building of an AI-ready platform, accelerates AI development and deployment with enterprise-grade support, and delivers the performance, security, and scalability to gather insights faster and achieve business value sooner.

    Warranty

    Free dedicated phone and email technical support
    (1-800-230-0130)

    Dedicated NVIDIA professional products Field Application Engineers

    Contact gopny@pny.com for additional information.

  • Features

    NVIDIA H100

    PERFORMANCE AND USABILITY FEATURES

    Data Center Class Reliability

    Designed for 24 x 7 data center operations and driven by power-efficient hardware and components selected for optimum performance, durability, and longevity. Every NVIDIA H100 board is designed, built and tested by NVIDIA® to the most rigorous quality and performance standards, ensuring that leading OEMs and systems integrators can meet or exceed the most demanding real-world conditions.

    NVIDIA Hopper Architecture

    The H100 PCIe Gen5 configuration provides all the capabilities of H100 SXM5 GPUs in just 350 watts of Thermal Design Power (TDP). This configuration can optionally use an NVLink bridge to connect two GPUs at 600 GB/s of bandwidth, nearly five times that of PCIe Gen5. Well suited for mainstream accelerated servers that go into standard racks and offer lower power per server, the H100 PCIe provides great performance for applications that scale to one or two GPUs at a time, including AI inference and some HPC applications. Across a basket of ten top data analytics, AI, and HPC applications, a single H100 PCIe GPU efficiently delivers 65% of the performance of the H100 SXM5 GPU while consuming 50% of the power.

    H100 SM Architecture

    Building upon the NVIDIA A100 Tensor Core GPU SM architecture, the H100 SM quadruples A100's peak per-SM floating point computational power, due to the introduction of FP8, and doubles A100's raw SM computational power on all previous Tensor Core and FP32 / FP64 data types, clock-for-clock. The new Transformer Engine, combined with Hopper's FP8 Tensor Cores, delivers up to 9x faster AI training and 30x faster AI inference speedups on large language models compared to the prior generation A100. Hopper's new DPX instructions enable up to 7x faster Smith-Waterman algorithm processing for genomics and protein sequencing. Hopper's new fourth-generation Tensor Core, Tensor Memory Accelerator, and many other new SM and general H100 architecture improvements together deliver up to 3x faster HPC and AI performance in many other cases.

    H100 Tensor Core Architecture

    Tensor Cores are specialized high-performance compute cores for matrix multiply and accumulate (MMA) math operations that provide groundbreaking performance for AI and HPC applications. Tensor Cores operating in parallel across SMs in one NVIDIA GPU deliver massive increases in throughput and efficiency compared to standard Floating-Point (FP), Integer (INT), and FMA (Fused Multiply-Accumulate) operations. Tensor Cores were first introduced in the NVIDIA Tesla V100 GPU, and further enhanced in each new NVIDIA GPU architecture generation. The new fourth-generation Tensor Core architecture in H100 delivers double the raw dense and sparse matrix math throughput per SM, clock-for-clock, compared to A100, and even more when considering the higher GPU Boost clock of H100 over A100. FP8, FP16, BF16, TF32, FP64, and INT8 MMA data types are supported.

    H100 Compute Performance Summary

    Overall, H100 provides approximately 6x the compute performance of A100 when factoring in all of its new compute technology advances. To summarize: H100's 132 SMs provide a 22% SM count increase over A100's 108 SMs; each H100 SM is 2x faster thanks to its fourth-generation Tensor Cores; within each Tensor Core, the new FP8 format and associated Transformer Engine provide another 2x improvement; and increased clock frequencies deliver roughly another 1.3x. In total, these improvements give H100 approximately 6x the peak compute throughput of A100, a major leap for the world's most compute-hungry workloads.
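
    As a worked check of that arithmetic, with the factors exactly as named above:

    ```latex
    \underbrace{\tfrac{132}{108}}_{\text{SM count}} \times
    \underbrace{2}_{\text{4th-gen Tensor Core}} \times
    \underbrace{2}_{\text{FP8 Transformer Engine}} \times
    \underbrace{1.3}_{\text{clock frequency}}
    \;\approx\; 1.22 \times 2 \times 2 \times 1.3 \;\approx\; 6.3
    ```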

    MULTI-GPU TECHNOLOGY SUPPORT

    Fourth-Generation NVLink

    Provides a 3x bandwidth increase on all-reduce operations and a 50% general bandwidth increase over the prior-generation NVLink, with 900 GB/sec total bandwidth for multi-GPU IO, roughly 7x the bandwidth of PCIe Gen5.

    SOFTWARE SUPPORT

    NVIDIA AI Enterprise is Bundled with the H100 PCIe

    Support for NVIDIA Virtual Compute Server (vCS) accelerates virtualized compute workloads such as high-performance computing (HPC), AI, data science, and big-data analytics. Additionally, NVIDIA offers every H100 PCIe purchaser an NVIDIA AI Enterprise license: an end-to-end, cloud-native suite of AI and data analytics software, optimized so every organization can excel at AI, certified to deploy anywhere from the enterprise data center to the cloud, and backed by global enterprise support so AI projects stay on track.

    Software Optimized for AI

    Deep learning frameworks such as Caffe2, MXNet, CNTK, TensorFlow, and others deliver dramatically faster training times and higher multi-node training performance. GPU-accelerated libraries such as cuDNN, cuBLAS, and TensorRT deliver higher performance for both deep learning inference and High-Performance Computing (HPC) applications.

    NVIDIA CUDA Parallel Computing Platform

    Natively execute standard programming languages like C/C++ and Fortran, and APIs such as OpenCL, OpenACC, and DirectCompute, to accelerate techniques such as ray tracing, video and image processing, and computational fluid dynamics.
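
    A minimal, self-contained CUDA C++ example of the platform in action, the classic SAXPY kernel, compilable with nvcc (array size and values are illustrative):

    ```cuda
    #include <cstdio>
    #include <cuda_runtime.h>

    // Classic SAXPY: y[i] = a * x[i] + y[i], one element per thread.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main()
    {
        const int n = 1 << 20;
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));  // unified memory for brevity
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        cudaDeviceSynchronize();

        printf("y[0] = %f (expected 4.0)\n", y[0]);
        cudaFree(x);
        cudaFree(y);
        return 0;
    }
    ```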

  • Specifications

    NVIDIA H100

    SPECIFICATIONS

    Product: NVIDIA H100 Tensor Core GPU Accelerator
    Architecture: Hopper
    Process Size: TSMC 4N
    Transistors: 80 Billion
    Die Size: 814 mm²
    FP64: 26 TFLOPS
    FP64 Tensor Core: 51 TFLOPS
    FP32: 51 TFLOPS
    TF32 Tensor Core: 756 TFLOPS | Sparsity
    BFLOAT16 Tensor Core: 1513 TFLOPS | Sparsity
    FP16 Tensor Core: 1513 TFLOPS | Sparsity
    FP8 Tensor Core: 3026 TFLOPS | Sparsity
    INT8 Tensor Core: 3026 TOPS | Sparsity
    GPU Memory: 80 GB HBM2e
    Memory Bandwidth: 2.0 TB/sec
    NVLink: 2-Way, 2-Slot, 600 GB/s Bidirectional
    Gen2 MIG (Multi-Instance GPU) Support: Yes, Up to 7 GPU Instances at 10 GB Each
    vGPU Support: NVIDIA Virtual Compute Server with MIG Support
    NVIDIA AI Enterprise Support: Bundled with NVIDIA H100 PCIe
    Maximum Power Consumption: 350 W
    Thermal Solution: Passive

    AVAILABLE ACCESSORIES

    • RTXA6000NVLINK-KIT provides an NVLink connector for the NVIDIA H100 PCIe suitable for standard PCIe slot spacing motherboards, effectively fusing two physical boards into one logical entity. Application support is required and each pair of H100 PCIe boards requires three (3) NVLink kits for correct operation.
    • RTXA6000NVLINK-3S-KIT provides an NVLink connector for the NVIDIA H100 PCIe for motherboards implementing wider PCIe slot spacing. All other features, benefits, application support, and three (3) NVLink kits per pair of H100 boards are identical to the standard slot spacing version.

    SUPPORTED OPERATING SYSTEMS

    • Windows Server 2012 R2
    • Windows Server 2016 1607, 1709
    • Windows Server 2019
    • RedHat CoreOS 4.7
    • Red Hat Enterprise Linux 8.1-8.3
    • Red Hat Enterprise Linux 7.7-7.9
    • Red Hat Linux 6.6+
    • SUSE Linux Enterprise Server 15 SP2
    • SUSE Linux Enterprise Server 12 SP3+
    • Ubuntu 14.04 LTS / 16.04 LTS / 18.04 LTS / 20.04 LTS

    WARRANTY

    • Dedicated NVIDIA professional products Field Application Engineers

    PACKAGE CONTAINS

    • NVIDIA H100 PCIe (Entitles purchaser to NVIDIA AI Enterprise, fulfilled by NVIDIA)