NVIDIA Ampere Architecture
NVIDIA A10 is a powerful, dense, and cost-effective GPU for visual computing, offering high-performance real-time ray tracing, AI-accelerated compute, and professional graphics rendering. Building upon the major SM enhancements from the Turing GPU, the NVIDIA Ampere architecture enhances ray tracing operations, tensor matrix operations, and concurrent execution of FP32 and INT32 operations.
CUDA Cores
The NVIDIA Ampere architecture’s CUDA cores bring up to 3x the single-precision floating point (FP32) throughput compared to the previous generation, providing significant performance improvements for graphics workflows such as 3D model development and compute for workloads such as desktop simulation for computer-aided engineering (CAE).
Second Generation RT Cores
Incorporating second generation ray tracing engines, the NVIDIA Ampere GPU architecture provides incredible ray traced rendering performance. NVIDIA A10 can render complex professional models with physically accurate shadows, reflections, and refractions to empower users with instant insight. Working in concert with applications leveraging APIs such as NVIDIA OptiX, Microsoft DXR, and Vulkan ray tracing, servers based on NVIDIA A10 will power truly interactive design workflows to provide immediate feedback for unprecedented levels of productivity. NVIDIA A10 is up to 2x faster in ray tracing compared to the previous generation. Hardware-accelerated Motion BVH (bounding volume hierarchy) also speeds up the rendering of ray-traced motion blur by up to 7x, for faster results with greater visual accuracy.
Third Generation Tensor Cores
Purpose-built for the acceleration of AI-enhanced applications, the NVIDIA A10 includes enhanced Tensor Cores that accelerate more datatypes (TF32 and BF16) and add a new Fine-Grained Structured Sparsity feature.
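As a rough illustration of what fine-grained structured sparsity means in practice, the sketch below assumes the 2:4 pattern NVIDIA describes for the Ampere architecture, in which at most two of every four consecutive weights are nonzero. The function name `prune_2_of_4` and the sample weights are hypothetical and chosen only for this example; real workflows would rely on NVIDIA's own tooling rather than hand-written pruning.

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude values in each group of four weights.

    Illustrative only: assumes a 2:4 sparsity pattern and a flat list of floats.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


# Hypothetical weights, two groups of four.
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.25, 0.01]
sparse = prune_2_of_4(weights)
print(sparse)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.25, 0.0]
```

Because the position of the zeros follows a fixed pattern, hardware can skip them predictably, which is what distinguishes structured sparsity from arbitrary unstructured pruning.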
PCIe Gen 4
The NVIDIA A10 supports PCI Express Gen 4, which provides double the bandwidth of PCIe Gen 3, improving data-transfer speeds from CPU memory for data-intensive tasks like AI and data science.
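The doubling can be checked with a short back-of-the-envelope calculation. The sketch below assumes a x16 link and uses the per-lane transfer rates (8 GT/s for Gen 3, 16 GT/s for Gen 4) and 128b/130b line encoding from the PCI Express specifications; none of these figures appear in the text above, and the function name `pcie_bandwidth_gb_s` is hypothetical. It gives theoretical peak bandwidth per direction, not measured throughput.

```python
def pcie_bandwidth_gb_s(transfer_rate_gt_s, lanes=16):
    """Theoretical one-direction PCIe bandwidth in GB/s (Gen 3 and Gen 4 only).

    Assumes 128b/130b encoding: 128 payload bits per 130 bits on the wire.
    """
    usable_bits_per_s = transfer_rate_gt_s * 1e9 * (128 / 130) * lanes
    return usable_bits_per_s / 8 / 1e9  # bits -> bytes, then bytes -> GB


gen3 = pcie_bandwidth_gb_s(8.0)   # PCIe Gen 3: 8 GT/s per lane
gen4 = pcie_bandwidth_gb_s(16.0)  # PCIe Gen 4: 16 GT/s per lane

print(f"Gen 3 x16: {gen3:.2f} GB/s")
print(f"Gen 4 x16: {gen4:.2f} GB/s")
print(f"Ratio: {gen4 / gen3:.1f}x")
```

Under these assumptions the result is roughly 15.75 GB/s for Gen 3 and 31.5 GB/s for Gen 4 in each direction; real transfers will be lower once protocol overhead is included.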
Higher Speed GDDR6 Memory
The NVIDIA A10 is built with 24 GB of GDDR6 memory, delivering higher bandwidth for ray tracing, rendering, and AI workloads than the previous generation. It provides a graphics memory footprint sufficient to address complex datasets and models in professional applications.
Fifth Generation NVDEC Engine
NVDEC is well suited for transcoding and video playback applications for real-time decoding. The following video codecs are supported for hardware-accelerated decoding: MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and AV1.
Seventh Generation NVENC Engine
NVENC can take on the most demanding 4K or 8K video encoding tasks to free up the graphics engine and the CPU for other operations. The NVIDIA A10 provides better encoding quality than software-based x264 encoders.
Preemption
Preemption at the instruction level provides finer-grained control over compute and graphics tasks to prevent longer-running applications from either monopolizing system resources or timing out.
Virtual GPU Software for Virtualization
Support for NVIDIA virtual GPU (vGPU) software enables A10 to be virtualized to accelerate everything from general office productivity applications to high-end design, AI, and compute tasks. The NVIDIA Virtual PC (vPC) and NVIDIA RTX Virtual Workstation (vWS) products (license required) provide access to the world’s most powerful GPU accelerated virtual solutions to enable flexible, work-from-anywhere experiences, while an NVIDIA Virtual Compute Server (vCS) license accelerates virtualized compute workloads such as high-performance computing, AI, and data science.
Software Optimized for AI
Deep learning frameworks such as Caffe2, MXNet, CNTK, TensorFlow, and others deliver dramatically faster training times and higher multi-node training performance. GPU accelerated libraries such as cuDNN, cuBLAS, and TensorRT deliver higher performance for both deep learning inference and high-performance computing (HPC) applications.
NVIDIA CUDA Parallel Computing Platform
Natively execute standard programming languages like C/C++ and Fortran, and APIs such as OpenCL, OpenACC, and DirectCompute, to accelerate techniques such as ray tracing, video and image processing, and computational fluid dynamics.
Unified Memory
A single, seamless 49-bit virtual address space allows for the transparent migration of data between the full allocation of CPU and GPU memory.
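To give a sense of scale, the arithmetic below works out how much memory a 49-bit address space can cover and compares it with the 24 GB of on-board memory mentioned earlier. This is a simple calculation rather than a description of how the driver manages addresses; the variable names are hypothetical, and the 24 GB figure is treated as 24 GiB purely for illustration.

```python
ADDRESS_BITS = 49  # virtual address width stated in the text

# Each additional address bit doubles the addressable range.
addressable_bytes = 2 ** ADDRESS_BITS
addressable_tib = addressable_bytes // 2 ** 40  # bytes -> TiB

# On-board memory, approximated as 24 GiB for this comparison.
gpu_memory_bytes = 24 * 2 ** 30
ratio = addressable_bytes / gpu_memory_bytes

print(f"Addressable range: {addressable_tib} TiB")  # 512 TiB
print(f"Roughly {ratio:.0f}x the on-board GPU memory")
```

The address space is therefore far larger than the physical memory on the card, which is what leaves room to map CPU and GPU memory into one range.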