


  • Description


    Unprecedented Acceleration for World’s Highest-Performing Elastic Data Centers

    The NVIDIA A2 Tensor Core GPU provides entry-level inference with low power, a small footprint, and high performance for intelligent video analytics (IVA) or NVIDIA AI at the edge. Featuring a low-profile PCIe Gen4 card and a low 40–60 watt (W) configurable thermal design power (TDP) capability, the A2 brings versatile inference acceleration to any server.

    A2’s versatility, compact size, and low power meet the demands of edge deployments at scale, instantly upgrading existing entry-level CPU servers to handle inference. Servers accelerated with A2 GPUs deliver up to 20X higher inference performance versus CPUs and 1.3X more efficient IVA deployments than previous GPU generations, all at an entry-level price point.

    NVIDIA-Certified systems with the NVIDIA A2, A30, and A100 Tensor Core GPUs and NVIDIA AI—including the NVIDIA Triton Inference Server, open source inference service software—deliver breakthrough inference performance across edge, data center, and cloud. They ensure that AI-enabled applications deploy with fewer servers and less power, resulting in easier deployments and faster insights with dramatically lower costs.



    GPU Architecture NVIDIA Ampere
    CUDA Cores 1280
    Tensor Cores 40 | Gen 3
    RT Cores 10 | Gen 2
    Peak FP32 4.5 TFLOPS
    Peak TF32 Tensor Core 9 TFLOPS | 18 TFLOPS Sparsity
    Peak FP16 Tensor Core 18 TFLOPS | 36 TFLOPS Sparsity
    INT8 36 TOPS | 72 TOPS Sparsity
    INT4 72 TOPS | 144 TOPS Sparsity
    GPU Memory 16 GB GDDR6 ECC
    Memory Bandwidth 200 GB/s
    Thermal Solution Passive
    Maximum Power Consumption 40-60 W | Configurable
    System Interface PCIe Gen 4.0 x8

    Third-Generation NVIDIA Tensor Cores

    • The third-generation Tensor Cores in NVIDIA A2 support integer math down to INT4 and floating-point math up to FP32 to deliver high AI training and inference performance. A2’s NVIDIA Ampere architecture also supports TF32 and NVIDIA’s automatic mixed precision (AMP) capabilities.
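    To make TF32's precision trade-off concrete, the following NumPy sketch emulates the format's reduced mantissa by truncating float32 values. `to_tf32` is a hypothetical helper written for illustration only, not an NVIDIA API.

```python
import numpy as np

def to_tf32(a):
    # TF32 keeps FP32's 8-bit exponent but carries only 10 mantissa
    # bits. Emulate the truncation by clearing the 13 low mantissa
    # bits of each float32 value (a round-toward-zero approximation).
    bits = np.asarray(a, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFFE000)).view(np.float32)

x = np.array([np.pi], dtype=np.float32)
print(float(x[0]), float(to_tf32(x)[0]))  # same range, fewer significant digits
```

    Because the exponent field is unchanged, TF32 covers the same numeric range as FP32; only the fraction is coarser, which is why frameworks can usually enable it for matrix math without any model changes.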

    Second-Generation RT Cores

    • The NVIDIA A2 GPU includes dedicated RT Cores for ray tracing and Tensor Cores for AI to power groundbreaking results at breakthrough speed. It delivers up to 2x the throughput over the previous generation and the ability to concurrently run ray tracing with either shading or denoising capabilities.

    Structural Sparsity

    • Modern AI networks are big and getting bigger, with millions to billions of parameters. Not all of these parameters are needed for accurate predictions and inference. A2 provides up to 2x higher compute performance for sparse models compared to previous-generation GPUs. This feature readily benefits AI inference and can be used to improve the performance of model training.
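    The sparsity pattern the hardware accelerates is fine-grained 2:4 sparsity: in every group of four weights, at most two are nonzero. A minimal NumPy sketch of that pruning step is below; `prune_2_4` is illustrative only, not NVIDIA's automatic sparsity (ASP) tooling.

```python
import numpy as np

def prune_2_4(w):
    # Enforce the 2:4 pattern that Ampere sparse Tensor Cores accelerate:
    # in every contiguous group of four weights, zero the two entries
    # with the smallest magnitude. Weight count must be a multiple of 4.
    groups = w.reshape(-1, 4).copy()
    smallest = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, smallest, 0.0, axis=1)
    return groups.reshape(w.shape)

w = np.array([0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.03, 0.8], dtype=np.float32)
print(prune_2_4(w))  # exactly two zeros in every group of four
```

    In practice the pruned model is fine-tuned afterward to recover accuracy; the regular 2:4 structure is what lets the hardware skip the zeroed multiplications.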

    Hardened Root of Trust for Secure Deployments

    • Providing security in edge deployments and endpoints is critical for enterprise business operations. The NVIDIA A2 GPU delivers secure boot through trusted code authentication and hardened rollback protections against malware attacks, preventing operational losses and keeping workloads accelerated.

    Superior Hardware Transcoding Performance

    • Real-time performance is critical in intelligent video analytics (IVA) at the edge, requiring the latest hardware encode and decode capabilities. NVIDIA A2 GPUs use dedicated hardware to fully accelerate video decoding and encoding for the most popular codecs, including H.264, H.265, and VP9, as well as AV1 decode, for real-time video processing.
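    One common way to drive the dedicated decode (NVDEC) and encode (NVENC) engines is through FFmpeg's CUDA hardware acceleration. The sketch below builds such a command from Python; `nvenc_transcode_cmd` is a hypothetical helper, and it assumes an FFmpeg build with NVENC support plus an NVIDIA driver on the host.

```python
import subprocess

def nvenc_transcode_cmd(src, dst, codec="h264_nvenc"):
    # Decode on the GPU (NVDEC via the CUDA hwaccel), keep frames in
    # GPU memory, and re-encode with NVENC. Requires an FFmpeg build
    # with NVENC enabled and an NVIDIA driver installed.
    return [
        "ffmpeg", "-y",
        "-hwaccel", "cuda", "-hwaccel_output_format", "cuda",
        "-i", src,
        "-c:v", codec,
        dst,
    ]

# On a machine with an A2 installed:
# subprocess.run(nvenc_transcode_cmd("in.mp4", "out.mp4", "hevc_nvenc"), check=True)
```

    Keeping decoded frames in GPU memory (`-hwaccel_output_format cuda`) avoids round trips over PCIe between the decode and encode stages, which matters for the multi-stream pipelines typical of IVA.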


    3-Year Limited Warranty

    Dedicated Field Application Engineers for NVIDIA professional products

  • Features



    Up to 20X More Inference Performance

    AI inference is deployed to enhance consumer lives with smart, real-time experiences and to gain insights from trillions of end-point sensors and cameras. Compared to CPU-only servers, edge and entry-level servers with NVIDIA A2 Tensor Core GPUs offer up to 20X more inference performance, instantly upgrading any server to handle modern AI.

    Chart: Computer Vision | Natural Language Processing | Text-to-Speech (Tacotron2 + Waveglow)

    Comparisons of one NVIDIA A2 Tensor Core GPU versus a dual-socket Xeon Gold 6330N CPU

    System Configuration: HPE DL380 Gen10 Plus, 2S Xeon Gold 6330N @ 2.2 GHz, 512 GB DDR4 (CPU server)
    NLP: BERT-Large (sequence length: 384, SQuAD v1.1) | TensorRT 8.2, Precision: INT8, BS:1 (GPU) | OpenVINO 2021.4, Precision: INT8, BS:1 (CPU)
    Text-to-Speech: Tacotron2 + Waveglow end-to-end pipeline (input length: 128) | PyTorch 1.9, Precision: FP16, BS:1 (GPU) | PyTorch 1.9, Precision: FP32, BS:1 (CPU)
    Computer Vision: EfficientDet-D0 (COCO, 512x512) | TensorRT 8.2, Precision: INT8, BS:8 (GPU) | OpenVINO 2021.4, Precision: INT8, BS:8 (CPU)
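    Throughput comparisons like the one above are normally measured by warming up the pipeline first (lazy initialization, GPU clock ramp) and then averaging steady-state iterations. A minimal, framework-agnostic sketch of such a harness follows; `measure_qps` and the stand-in model are hypothetical, not part of the benchmark above.

```python
import time

def measure_qps(infer, batch, iters=100, warmup=10):
    # Warm up first so one-time setup costs do not skew the result,
    # then time steady-state iterations and report samples per second.
    for _ in range(warmup):
        infer(batch)
    start = time.perf_counter()
    for _ in range(iters):
        infer(batch)
    elapsed = time.perf_counter() - start
    return iters * len(batch) / elapsed

# Stand-in "model"; substitute a TensorRT or OpenVINO inference call
# to reproduce a GPU-versus-CPU comparison of your own.
print(measure_qps(lambda b: [x * 2 for x in b], list(range(8)), iters=10))
```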

    Higher IVA Performance for the Intelligent Edge

    Servers equipped with NVIDIA A2 GPUs offer up to 1.3X more performance in intelligent edge use cases, including smart cities, manufacturing, and retail. NVIDIA A2 GPUs running IVA workloads deliver more efficient deployments with up to 1.6X better price-performance and 10 percent better energy efficiency than previous GPU generations.

    IVA Performance (Normalized)

    System Configuration: Supermicro SYS-1029GQ-TRT, 2S Xeon Gold 6240 @ 2.6 GHz, 512 GB DDR4, 1x NVIDIA A2 or 1x NVIDIA T4 | Measured with DeepStream 5.1
    Networks: ShuffleNet-v2 (224x224), MobileNet-v2 (224x224)
    Pipeline represents end-to-end performance with video capture and decode, pre-processing, batching, inference, and post-processing

    Optimized for Any Server

    NVIDIA A2 is optimized for inference workloads and deployments in entry-level servers constrained by space and thermal requirements, such as 5G edge and industrial environments. A2 delivers a low-profile form factor operating in a low-power envelope, from a TDP of 60W down to 40W, making it ideal for any server.
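    The configurable TDP is set with `nvidia-smi`'s `--power-limit` option (administrator privileges required). The small helper below, a hypothetical convenience wrapper, just builds that command and enforces the A2's 40-60 W range.

```python
def power_limit_cmd(watts):
    # nvidia-smi --power-limit sets the board power cap; the A2's
    # configurable range is 40-60 W. Run the returned command with
    # administrator rights, e.g. subprocess.run(power_limit_cmd(40)).
    if not 40 <= watts <= 60:
        raise ValueError("A2 TDP is configurable between 40 and 60 W")
    return ["nvidia-smi", "--power-limit", str(watts)]

print(power_limit_cmd(40))
```

    Dropping the cap to 40 W trades some peak clocks for a lower thermal envelope, which is the relevant knob for space- and power-constrained edge servers.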

    Lower Power and Configurable TDP

  • Specifications



    Compatible with all systems that accept an NVIDIA A2

    Architecture Ampere
    Use Case Universal Deep Learning Accelerator
    GPU Clocks 1440 MHz Base | 1770 MHz Boost
    NVIDIA CUDA Cores 1280
    Tensor Cores 40 | Gen 3
    RT Cores 10 | Gen 2
    Peak FP32 4.5 TFLOPS
    Peak TF32 Tensor Core 9 TFLOPS | 18 TFLOPS Sparsity
    Peak FP16 Tensor Core 18 TFLOPS | 36 TFLOPS Sparsity
    INT8 36 TOPS | 72 TOPS Sparsity
    INT4 72 TOPS | 144 TOPS Sparsity
    GPU Memory 16 GB GDDR6 ECC
    ECC On by Default
    Memory Interface 128-bit
    Memory Clock 6251 MHz
    Memory Bandwidth 200 GB/s
    Video Encode | Decode 1x Video Encoder | 2x Video Decoder
    CODECs Supported H.264 | H.265 | VP9 | AV1 Decode
    Thermal Solution Passive
    Operating Temperature 0°C to 50°C
    Operating Humidity 5% to 85% Relative Humidity
    Maximum Power Consumption 40-60 W | Configurable
    System Interface PCIe Gen 4.0 x8
    Form Factor Low-Profile PCIe | Single Slot
    Physical Dimensions 6.61” L x 2.71” H
    Hardware Root of Trust Yes
    NEBS Ready Level 3
    NVIDIA CUDA Support 11.1 or later
    vGPU Software Revision 14.0 or later
    Virtual GPU (vGPU) Software Support NVIDIA vApps, vPC, vWS, vCS, NVIDIA AI Enterprise


    • Windows Server 2012 R2
    • Windows Server 2016 1607, 1709
    • Windows Server 2019
    • RedHat CoreOS 4.7
    • Red Hat Enterprise Linux 8.1-8.3
    • Red Hat Enterprise Linux 7.7-7.9
    • Red Hat Enterprise Linux 6.6+
    • SUSE Linux Enterprise Server 15 SP2
    • SUSE Linux Enterprise Server 12 SP3+
    • Ubuntu 14.04 LTS / 16.04 LTS / 18.04 LTS / 20.04 LTS


    • NVIDIA Virtual Apps | vApps
    • NVIDIA Virtual PC | vPC
    • NVIDIA RTX Virtual Workstation | vWS
    • NVIDIA Virtual Compute Server | vCS
    • NVIDIA AI Enterprise


    • NVIDIA A2 PCIe with attached low-profile bracket
    • Unattached full-height (ATX) bracket