PNY Technologies Inc.
NVIDIA® A40



  • Description

    NVIDIA A40

    The World’s Most Powerful Data Center GPU for Visual Computing

    Modern data centers are evolving rapidly. Advanced technologies such as real-time ray tracing, AI, compute, simulation, and VR are common across industries. The need to work remotely has accelerated faster than anyone could have anticipated, with workloads that span the entire enterprise.

    NVIDIA A40 delivers the data center-based solution designers, engineers, artists, and scientists need to meet today’s challenges. Built on the NVIDIA Ampere architecture, the A40 combines the latest generation RT Cores, Tensor Cores, and CUDA Cores with 48GB of graphics memory for unprecedented graphics, rendering, compute, and AI performance. From powerful virtual workstations accessible from anywhere to dedicated render nodes, the A40 is built to tackle the most demanding visual computing workloads from the data center.

    NVIDIA A40


    CUDA Cores: 10752
    RT Cores (Gen 2): 84
    Tensor Cores (Gen 3): 336
    GPU Memory: 48 GB GDDR6 ECC
    Memory Interface: 384-bit
    Memory Bandwidth: 696 GB/s
    NVLink: 2-Way, 2-Slot, 112.5 GB/s Bidirectional
    System Interface: PCI Express 4.0 x16
    Display Connectors: 3x DisplayPort 1.4, Off by Default
    Thermal Solution: Passive
    vGPU Support: NVIDIA Virtual PC, NVIDIA RTX Virtual Workstation (vWS), NVIDIA Virtual Compute Server (vCS, no MIG)
    Maximum Power Consumption: 300 W


    • Second-generation RT Cores provide up to 2x the throughput of the previous generation and enable concurrent ray tracing and shading, improving ray tracing performance.
    • With 48 GB of GPU memory, expanding to 96 GB with NVLink, the A40 provides the memory capacity required for the largest GPU-accelerated renders.

    Virtual Workstations

    • Combined with NVIDIA vGPU software, the A40, with 48 GB of GPU memory, can accelerate the world’s most powerful virtual workstations which can be accessed remotely from the data center.
    • The NVIDIA Ampere architecture’s CUDA cores and third-generation Tensor Cores provide increased performance compared to the previous generation for compute-intensive workloads like data science, deep learning, and machine learning with NVIDIA Virtual Compute Server (vCS) software.

    Scalable Visualization

    • Power immersive visual experiences with the NVIDIA A40 when its DisplayPort outputs are enabled. In display mode, NVIDIA professional display technologies such as Quadro Sync II and NVIDIA Mosaic provide multi-display video synchronization for high-resolution display environments such as CAVEs (Cave Automatic Virtual Environments), massive display walls, or location-based entertainment.

    Collaboration (Omniverse)

    • AEC design teams can draw on NVIDIA A40 with NVIDIA RTX Virtual Workstation (vWS) software to collaborate in real-time on massive 3D models in the NVIDIA Omniverse AEC Experience with remotely located colleagues and clients. Architects and designers can quickly create and iterate on building designs and view accurate, predictable visualizations with real-time ray tracing or in immersive virtual reality.

    AR/VR at the Edge

    • With NVIDIA A40 GPUs, researchers, developers, and scientists can provision servers to provide multiple high-performance workstations for augmented reality (AR) and virtual reality (VR) development at the edge.
    • The NVIDIA software stack includes NVIDIA RTX Virtual Workstation (vWS) software for provisioning multiple high-performance virtual workstations, NVIDIA’s extensive developer tools for developing AR and VR, and the NVIDIA CloudXR™ SDK for driving wireless AR/VR experiences.


    • Computer-aided engineering (CAE) analysts and engineers can set up, test, and iterate on simulations faster with NVIDIA RTX Virtual Workstation (vWS) software, which delivers virtual workstations with the massive compute power they need to design by day and compute by night—from anywhere they choose to work.


    • A40 delivers industry-leading performance for live broadcast production by combining advanced technologies such as AI, real-time ray tracing, and virtualization.
    • AI-enhanced workflows help broadcasters gain new creative capabilities and deep customer insights while reaching global markets on any device.
    • GPU-powered real-time ray tracing delivers photorealistic virtual sets and cinematic-quality animations.

    Incredible Performance

    • Fast, interactive performance powered by the NVIDIA Ampere-based GPU Architecture with ultra-fast on-board graphics memory technology and optimized software drivers for professional applications.
    • The NVIDIA A40 includes 84 second-generation RT Cores to accelerate photorealistic ray-traced rendering up to 2x faster than the previous generation. Hardware-accelerated Motion BVH (Bounding Volume Hierarchy) improves motion blur rendering performance by up to 7x compared to the previous generation.
    • With 336 third generation Tensor Cores to accelerate AI workloads, the A40 provides the power necessary for AI development and training workloads. Incredible training and inferencing performance, combined with enterprise-class stability and reliability, make A40-powered servers ideal for professional AI deployments. Tensor Cores also bring AI capabilities to graphics with features like DLSS, AI denoising, and enhanced editing for select applications.
    • Scalable application performance with NVIDIA NVLink technology lets you combine two NVIDIA A40 cards to double the effective GPU memory to 96 GB.
    • Support for NVIDIA Virtual GPU (vGPU) software enables A40 to be virtualized to accelerate high-end design, AI, and compute workloads. The NVIDIA RTX Virtual Workstation (vWS) edition provides access to the world’s most powerful virtual workstations to enable flexible, work-from-anywhere solutions, while the NVIDIA Virtual Compute Server (vCS) edition accelerates virtualized compute workloads such as high-performance computing, AI, and data science.

    Data Center Class Reliability

    • Designed for 24 x 7 data center operations and driven by power-efficient hardware and components selected for optimum performance, durability, and longevity.
    • Fully tested and validated by NVIDIA, leading OEMs and system integrators to meet the most demanding real-world conditions.
    • Enterprise compatibility and stability with professional applications through NVIDIA support of the latest OpenGL, DirectX, Vulkan, and CUDA standards and deep independent software vendor (ISV) developer engagements. Virtual workstations powered by NVIDIA RTX Virtual Workstation (vWS) software leverage the same pro NVIDIA RTX platform as physical workstations, benefiting from extensive testing across a broad range of industry applications and certifications from over 100 ISVs to ensure optimal performance and stability.
    • Secure and measured boot with hardware root of trust technology within the GPU provides an additional layer of security for data centers. A40 meets the latest data center standards and is NEBS Level 3 compliant.
    • The NVIDIA A40 includes a CEC 1712 security chip that enables secure and measured boot with hardware root of trust, ensuring that firmware has not been tampered with or corrupted.

    vGPU Software Support

    • NVIDIA Virtual PC (vPC)
    • NVIDIA Virtual Applications (vApps)
    • NVIDIA RTX Virtual Workstation (vWS)
    • NVIDIA Virtual Compute Server (vCS)
    • vGPU Profiles from 1 GB to 48 GB
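
    As a rough sizing aid, the profile sizes listed in the specifications can be turned into a memory-only upper bound on how many vGPU instances fit on one 48 GB card. This is a sketch under stated assumptions: every instance uses the same profile, and the helper name `max_instances_by_memory` is hypothetical. The vGPU software may enforce a lower per-GPU limit than this simple division suggests, so treat the numbers as a ceiling.

```python
FRAME_BUFFER_GB = 48  # GPU memory of one A40, from the specifications
PROFILE_SIZES_GB = [1, 2, 3, 4, 6, 8, 12, 16, 24, 48]  # profile sizes from the specifications


def max_instances_by_memory(profile_gb, total_gb=FRAME_BUFFER_GB):
    """Memory-only upper bound on same-size vGPU instances per card (assumption:
    no other limit applies; real deployments may be capped lower by the software)."""
    return total_gb // profile_gb


instances = [max_instances_by_memory(size) for size in PROFILE_SIZES_GB]
for size, count in zip(PROFILE_SIZES_GB, instances):
    print(f"{size:>2} GB profile: at most {count} instance(s) by memory")
```

    Every listed profile size divides 48 GB evenly, so no frame buffer is left stranded when a card is filled with a single profile size.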



    3-Year Limited Warranty

    Free dedicated phone and email technical support

    Dedicated NVIDIA professional products Field Application Engineers

    Contact for additional information.

  • Features

    NVIDIA A40


    NVIDIA Ampere-based Architecture

    NVIDIA A40 is the world's most powerful data center GPU for visual computing, offering high performance real-time ray tracing, AI-accelerated compute, and professional graphics rendering. Building upon the major SM enhancements from the Turing GPU, the NVIDIA Ampere architecture enhances ray tracing operations, tensor matrix operations, and concurrent executions of FP32 and INT32 operations.

    CUDA Cores

    The NVIDIA Ampere architecture’s CUDA Cores bring up to 2.5x the single-precision floating point (FP32) throughput compared to the previous generation, providing significant performance improvements for graphics workflows such as 3D model development and compute for workloads such as desktop simulation for computer-aided engineering (CAE).

    Second Generation RT Cores

    Incorporating second-generation ray tracing engines, the NVIDIA Ampere GPU architecture provides incredible ray-traced rendering performance. A single NVIDIA A40 board can render complex professional models with physically accurate shadows, reflections, and refractions to empower users with instant insight. Working in concert with applications leveraging APIs such as NVIDIA OptiX, Microsoft DXR, and Vulkan ray tracing, servers based on NVIDIA A40 will power truly interactive design workflows to provide immediate feedback for unprecedented levels of productivity. NVIDIA A40 is up to 2x faster in ray tracing compared to the previous generation. Hardware-accelerated Motion BVH (Bounding Volume Hierarchy) also speeds up the rendering of ray-traced motion blur by up to 7x, for faster results with greater visual accuracy.

    Third Generation Tensor Cores

    Purpose-built for deep learning matrix arithmetic at the heart of neural network training and inferencing functions, the NVIDIA A40 includes enhanced Tensor Cores that accelerate more datatypes (TF32 and BF16) and includes a new Fine-Grained Structured Sparsity feature that delivers up to 2x throughput for tensor matrix operations compared to the previous generation.
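
    Fine-grained structured sparsity on the Ampere architecture is generally described as a 2:4 pattern: within every group of four consecutive weights, two are kept and two are zeroed, which is what allows the Tensor Cores to skip half of the multiplications. The following is a minimal pure-Python sketch of that pattern, assuming simple magnitude-based pruning. The function name `prune_2_of_4` is hypothetical and is not part of any NVIDIA library.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four weights."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes; ties resolve to the earlier index.
        keep = sorted(range(4), key=lambda i: (-abs(group[i]), i))[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
sparse_row = prune_2_of_4(row)
print(sparse_row)
```

    In practice a pruned network is usually fine-tuned afterwards to recover accuracy; the sketch only illustrates the memory layout constraint.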

    PCIe Gen 4

    The NVIDIA A40 supports PCI Express Gen 4, which provides double the bandwidth of PCIe Gen 3, improving data-transfer speeds from CPU memory for data-intensive tasks like AI and data science.
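
    The doubling can be checked from the published per-lane signaling rates: PCIe Gen 3 runs at 8 GT/s per lane and Gen 4 at 16 GT/s, both with 128b/130b encoding. The sketch below computes the theoretical one-direction bandwidth of an x16 link. The helper name `pcie_gbytes_per_s` is hypothetical, and the result ignores packet and protocol overhead, so achieved transfer rates will be lower.

```python
def pcie_gbytes_per_s(gt_per_s, lanes=16, encoding=128 / 130):
    """Theoretical one-direction link bandwidth in GB/s, before protocol overhead."""
    return gt_per_s * lanes * encoding / 8  # divide by 8 to convert bits to bytes


gen3 = pcie_gbytes_per_s(8.0)   # PCIe Gen 3: 8 GT/s per lane
gen4 = pcie_gbytes_per_s(16.0)  # PCIe Gen 4: 16 GT/s per lane
print(f"Gen 3 x16: {gen3:.2f} GB/s, Gen 4 x16: {gen4:.2f} GB/s, ratio {gen4 / gen3:.1f}x")
```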

    Higher Speed GDDR6 Memory

    The NVIDIA A40 is built with 48 GB of GDDR6 memory, delivering up to 10% greater throughput for ray tracing, rendering, and AI workloads than the previous generation. It provides the industry’s largest graphics memory footprint to address the largest datasets and models in latency-sensitive professional applications.

    Error Correcting Code (ECC) on Graphics Memory

    Meet strict data integrity requirements for mission critical applications with uncompromised computing accuracy and reliability for workstations.

    Fifth Generation NVDEC Engine

    NVDEC is well suited for transcoding and video playback applications for real-time decoding. The following video codecs are supported for hardware-accelerated decoding: MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and AV1.

    Seventh Generation NVENC Engine

    NVENC can take on the most demanding 4K or 8K video encoding tasks to free up the graphics engine and the CPU for other operations. NVIDIA A40 provides better encoding quality than software-based x264 encoders.


    Preemption at the instruction-level provides finer grain control over compute and graphics tasks to prevent longer-running applications from either monopolizing system resources or timing out.


    Third Generation NVLink

    Connect two NVIDIA A40 cards with NVLink to double the effective memory footprint and scale application performance by enabling GPU-to-GPU data transfers at rates up to 112.5 GB/s (total bandwidth).


    DisplayPort 1.4a

    Supports up to three 5K monitors at 60Hz, or dual 8K displays at 60Hz per card. The NVIDIA A40 supports HDR color for 4K at 60Hz for 10/12b HEVC decode and up to 4K at 60Hz for 10b HEVC encode. Each DisplayPort connector can drive ultra-high resolutions of 4096 x 2160 at 120 Hz with 30-bit color. A40 is configured for virtualization by default with physical display connectors disabled. The display outputs can be enabled via management software tools.

    NVIDIA Quadro Mosaic Technology

    Transparently scale the desktop and applications across up to 12 displays from 4 GPUs while delivering full performance and image quality.

    NVIDIA Quadro Sync II

    Synchronize the display and image output of up to 24 displays from 8 GPUs (connected through two Sync II boards) in a single system, reducing the number of machines needed to create an advanced video visualization environment.

    Frame Lock Connector Latch

    Each frame lock connector is designed with a self-locking retention mechanism to secure its connection with the frame lock cable to provide robust connectivity and maximum productivity.


    Virtual GPU Software for Virtualization

    Support for NVIDIA virtual GPU (vGPU) software enables A40 to be virtualized to accelerate high-end design, AI, and compute workloads. The NVIDIA RTX Virtual Workstation (vWS) license provides access to the world’s most powerful virtual workstations to enable flexible, work-from-anywhere solutions, while the NVIDIA Virtual Compute Server (vCS) license accelerates virtualized compute workloads such as high-performance computing, AI and data science.

    Software Optimized for AI

    Deep learning frameworks such as Caffe2, MXNet, CNTK, TensorFlow, and others deliver dramatically faster training times and higher multi-node training performance. GPU-accelerated libraries such as cuDNN, cuBLAS, and TensorRT deliver higher performance for both deep learning inference and High-Performance Computing (HPC) applications.

    NVIDIA CUDA Parallel Computing Platform

    Natively execute standard programming languages like C/C++ and Fortran, and APIs such as OpenCL, OpenACC, and DirectCompute, to accelerate techniques such as ray tracing, video and image processing, and computational fluid dynamics.

    Unified Memory

    A single, seamless 49-bit virtual address space allows for the transparent migration of data between the full allocation of CPU and GPU memory.
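
    To put the 49-bit figure in perspective, the short calculation below converts it to an addressable range and compares it with the card's own memory. It is only an arithmetic sketch and assumes that "48 GB" is read as 48 GiB.

```python
ADDRESS_BITS = 49  # width of the unified virtual address space

addressable_bytes = 2 ** ADDRESS_BITS
addressable_tib = addressable_bytes / 2 ** 40  # bytes -> TiB

gpu_memory_bytes = 48 * 2 ** 30  # assumption: 48 GB interpreted as 48 GiB
ratio = addressable_bytes / gpu_memory_bytes

print(f"{ADDRESS_BITS}-bit address space: {addressable_tib:.0f} TiB")
print(f"About {ratio:,.0f} times the memory of a single A40")
```

    The address space is therefore far larger than any combination of CPU and GPU memory in a single server, which is what lets one pointer refer to data wherever it currently resides.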

    NVIDIA GPUDirect

    Supports a family of technologies to speed communication between the GPU and devices like NICs or Video I/O boards by reducing CPU overhead and minimizing copies.

    NVIDIA RTX Desktop Manager

    Gain unprecedented end-user control of the desktop experience for increased productivity in single large display or multi-display environments.

    NVIDIA RTX Experience

    RTX Experience delivers a suite of productivity tools to your desktop workstation, including 4K recording, automatic alerts for the latest Quadro driver updates, and access to gaming features. The application is available to download at

  • Specifications

    NVIDIA A40


    Compatible in all systems that accept an NVIDIA A40

    Architecture: Ampere
    Process Size: 8 nm
    Transistors: 28.3 Billion
    Die Size: 628.4 mm²
    CUDA Cores: 10752
    RT Cores (Gen 2): 84
    Tensor Cores (Gen 3): 336
    NVLink: 2-Way Low Profile, 2-Slot
    NVLink Interconnect: 112.5 GB/s Bidirectional
    GPU Memory: 48 GB GDDR6 ECC
    Memory Interface: 384-bit
    Memory Bandwidth: 696 GB/s
    Display Connectors: 3x DisplayPort 1.4, Off by Default
    Thermal Solution: Passive
    vGPU Support: NVIDIA Virtual PC, NVIDIA RTX Virtual Workstation (vWS), NVIDIA Virtual Compute Server (vCS, no MIG)
    vGPU Profiles Supported: 1 GB, 2 GB, 3 GB, 4 GB, 6 GB, 8 GB, 12 GB, 16 GB, 24 GB, 48 GB
    System Interface: PCI Express 4.0 x16
    Secure and Measured Boot Hardware Root of Trust: CEC 1712
    NEBS Ready: Level 3
    Power Connector: 8-pin CPU
    Maximum Power Consumption: 300 W
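
    The memory interface width and memory bandwidth in the table above are related by the per-pin data rate of the memory. The sketch below derives that rate from the two listed figures rather than quoting it from a datasheet, so the 14.5 Gbps value is an inference from this page. The helper name `bandwidth_gb_s` is hypothetical.

```python
BUS_WIDTH_BITS = 384   # Memory Interface, from the table
BANDWIDTH_GB_S = 696   # Memory Bandwidth, from the table

# Each pin of the interface carries one bit per transfer, so the per-pin
# data rate is total bits per second divided by the number of pins.
per_pin_gbps = BANDWIDTH_GB_S * 8 / BUS_WIDTH_BITS


def bandwidth_gb_s(bus_width_bits, pin_rate_gbps):
    """Peak memory bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_width_bits * pin_rate_gbps / 8


print(f"Implied per-pin data rate: {per_pin_gbps} Gbps")
print(f"Check: {bandwidth_gb_s(BUS_WIDTH_BITS, per_pin_gbps)} GB/s")
```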



    • RTXA6000NVLINK-KIT provides an NVLink connector for A40 suitable for standard PCIe slot spacing motherboards, effectively fusing two physical boards into one logical entity with 21504 CUDA Cores, 672 Tensor Cores, 168 RT Cores, and 96 GB of GDDR6 ECC memory, with a bandwidth of 112.5 GB/s. Application support is required.
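
    The combined figures quoted for an NVLink pair are simply twice the single-card values from the specifications, as the short check below shows. The dictionary keys and the helper name `nvlink_pair_totals` are hypothetical. Note that the totals describe aggregate resources only; as stated above, an application has to support NVLink before it can use both boards as one.

```python
SINGLE_A40 = {
    "cuda_cores": 10752,
    "tensor_cores": 336,
    "rt_cores": 84,
    "memory_gb": 48,
}


def nvlink_pair_totals(card, cards=2):
    """Aggregate per-card resources across NVLink-connected cards (2-way on the A40)."""
    return {name: value * cards for name, value in card.items()}


pair = nvlink_pair_totals(SINGLE_A40)
print(pair)
```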


    • Windows Server 2012 R2
    • Windows Server 2016 1607, 1709
    • Windows Server 2019
    • Red Hat CoreOS 4.7
    • Red Hat Enterprise Linux 8.1-8.3
    • Red Hat Enterprise Linux 7.7-7.9
    • Red Hat Enterprise Linux 6.6+
    • SUSE Linux Enterprise Server 15 SP2
    • SUSE Linux Enterprise Server 12 SP3+
    • Ubuntu 14.04 LTS/16.04 LTS/18.04 LTS/20.04 LTS


    • NVIDIA A40 Data Center
    • Auxiliary power cable