
  • Description


    Next-Level Acceleration Has Arrived

    The artificial intelligence revolution surges forward, igniting opportunities for businesses to reimagine how they solve their customers’ challenges. We’re racing toward a future in which every customer interaction, every product, and every service will be touched and improved by AI. Making that future a reality requires a computing platform that can accelerate the full diversity of modern AI, enabling businesses to meet (and exceed) customer demands while cost-effectively scaling their AI-based products and services.

    The NVIDIA T4 GPU is among the world’s most powerful universal inference accelerators. Powered by NVIDIA Turing Tensor Cores, T4 provides revolutionary multi-precision inference performance to accelerate the diverse applications of modern AI. T4 is a part of the NVIDIA AI inference platform that supports all AI frameworks and provides comprehensive tooling and integrations to drastically simplify the development and deployment of advanced AI.



    GPU Architecture: NVIDIA Turing
    Turing Tensor Cores: 320
    NVIDIA CUDA Cores: 2560
    Peak FP32: 8.1 TFLOPS
    Mixed Precision | FP16/FP32: 65 TFLOPS
    INT8: 130 TOPS
    INT4: 260 TOPS
    GPU Memory: 16 GB GDDR6
    Memory Bandwidth: 300 GB/s
    Thermal Solution: Passive
    Maximum Power Consumption: 70 W
    System Interface: PCIe Gen 3.0 x16
    Compute APIs: CUDA | NVIDIA TensorRT | ONNX

    Turing Tensor Cores: The Heart of Universal Inference Acceleration

    AI is evolving rapidly. In the past few years alone, a Cambrian explosion of neural network types has seen the emergence of convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), reinforcement learning (RL), and hybrid network architectures. Accelerating these diverse models requires both high performance and programmability.

    NVIDIA T4 introduces the revolutionary Turing Tensor Core technology with multi-precision computing for AI inference. Powering breakthrough performance from FP32 to FP16 to INT8, as well as INT4 and binary precisions, T4 delivers dramatically higher performance than CPUs.

    Developers can unleash the power of Turing Tensor Cores directly through NVIDIA TensorRT software libraries and integrations with all AI frameworks. These tools let developers target the optimal precision for different AI applications, achieving dramatic performance gains without compromising the accuracy of results.
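    At its core, reduced-precision inference maps FP32 tensors onto a lower-precision integer range via a scale factor. The sketch below is a minimal pure-Python illustration of symmetric INT8 quantization; it is conceptual only and is not TensorRT code (TensorRT performs calibration-based variants of this internally).

    ```python
    # Illustrative sketch of symmetric INT8 quantization, the kind of
    # precision reduction Turing Tensor Cores accelerate. Not TensorRT code.

    def quantize_int8(values):
        """Map FP32 values onto the signed INT8 range [-127, 127]."""
        scale = max(abs(v) for v in values) / 127.0
        quantized = [max(-127, min(127, round(v / scale))) for v in values]
        return quantized, scale

    def dequantize(quantized, scale):
        """Recover approximate FP32 values from INT8 codes."""
        return [q * scale for q in quantized]

    # Hypothetical activation values for illustration
    activations = [0.02, -1.3, 0.75, 2.54, -0.6]
    q, s = quantize_int8(activations)
    restored = dequantize(q, s)
    max_err = max(abs(a - r) for a, r in zip(activations, restored))
    print(q, max_err)  # quantization error stays below half the scale step
    ```

    The key observation is that the worst-case error is bounded by half the scale step, which is why well-calibrated INT8 inference can match FP32 accuracy while running far faster on Tensor Cores.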

    State-of-the-art Inference in Real-Time

    Responsiveness is key to user engagement for services such as conversational AI, recommender systems, and visual search. As models increase in accuracy and complexity, delivering the right answer right now requires exponentially larger compute capability.

    NVIDIA T4 features multi-process service (MPS) with hardware-accelerated work distribution. MPS reduces latency for processing requests, and enables multiple independent requests to be simultaneously processed, resulting in higher throughput and efficient utilization of GPUs.
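    The throughput benefit of keeping multiple independent requests in flight can be sketched with Little's law. The numbers below are made-up illustrations, not measured T4 figures.

    ```python
    # Hypothetical illustration (Little's law) of why processing independent
    # requests concurrently raises throughput. Made-up numbers, not benchmarks.

    latency_ms = 10  # assumed per-request processing latency

    def throughput_rps(in_flight_requests, latency_ms):
        """Sustained requests/sec when `in_flight_requests` are processed
        concurrently and each takes `latency_ms` milliseconds."""
        return 1000 * in_flight_requests / latency_ms

    serial_rps = throughput_rps(1, latency_ms)  # one request at a time
    mps_rps = throughput_rps(8, latency_ms)     # eight processes sharing the GPU
    print(serial_rps, mps_rps)  # 100.0 vs 800.0 requests/sec
    ```

    In practice MPS lets those eight processes share one GPU without serializing behind each other, which is where the utilization gain comes from.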

    Twice the Video Decode Performance

    Video continues on its explosive growth trajectory, comprising over two-thirds of all Internet traffic. Accurate video interpretation through AI is driving the most relevant content recommendations, finding the impact of brand placements in sports events, and delivering perception capabilities to autonomous vehicles, among other usages.

    NVIDIA T4 delivers breakthrough performance for AI video applications, with dedicated hardware transcoding engines that bring twice the decoding performance of prior-generation GPUs. T4 can decode up to 38 full-HD video streams, making it easy to integrate scalable deep learning into the video pipeline to deliver innovative, smart video services. It features performance and efficiency modes to enable either fast encoding or the lowest bit-rate encoding without losing video quality.
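    A back-of-the-envelope calculation shows the aggregate frame rate behind the 38-stream figure above; the 30 fps per-stream rate is an assumption for illustration.

    ```python
    # Aggregate decode rate implied by the "38 full-HD streams" figure above.
    # The 30 fps per-stream frame rate is an assumption, not a spec.

    streams = 38          # concurrent full-HD streams (from the text above)
    fps_per_stream = 30   # assumed frame rate per stream
    total_fps = streams * fps_per_stream
    print(total_fps)  # 1140 frames per second available to the AI pipeline
    ```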

    Industry’s Most Comprehensive AI Inference Platform

    AI has crossed the chasm and is rapidly moving from early adoption by pioneers to broader use across industries and large-scale production deployments. Powered by the flexible NVIDIA CUDA development environment and a mature ecosystem with over 1M developers, NVIDIA AI Platform has been evolving for over a decade to offer comprehensive tooling and integrations to simplify the development and deployment of AI.

    NVIDIA TensorRT enables optimization of trained models to run inference efficiently on GPUs. NVIDIA Triton Inference Server and Kubernetes on NVIDIA GPUs streamline the deployment and scaling of AI-powered applications on GPU-accelerated inference infrastructure. Libraries like cuDNN, cuSPARSE, CUTLASS, and DeepStream accelerate key neural network functions and use cases such as video transcoding. And workflow integrations with all AI frameworks, freely available as NVIDIA GPU Cloud containers, enable developers to transparently harness the innovations in GPU computing for end-to-end AI workflows, from training neural networks to running inference in production applications.


    3-Year Limited Warranty

    Dedicated NVIDIA Quadro Field Application Engineers

  • Features


    Key Benefits

    Data Scientists

    GPU Inference enables you to bring state-of-the-art AI to your products and services by removing the computing bottleneck to innovation.

    Every AI framework is supported on the NVIDIA inference platform, which drastically simplifies optimization and deployment of your AI models from training to inference.

    IT Managers and Data Center Directors

    AI will increasingly be used in products and services, with AI inference constituting an increasingly large portion of data center workloads.

    NGC (NVIDIA GPU Cloud) simplifies deployment by providing a comprehensive catalog of performance-engineered containers for both training and inference.

    With multi-precision support, T4 GPUs allow standardization on a single architecture for all AI inference workloads.

    T4 GPUs provide the most efficient platform for both real-time inference as well as large batch inference.

    NVIDIA GPUs are designed for the scalability, uptime, and serviceability needs of data centers.

    GPU inference saves money by providing a dramatic boost in throughput and power efficiency.

    Lower TCO and Broad Industry and Vendor Support

    GPU inference dramatically improves total cost of ownership (TCO) by delivering the same throughput with fewer, more powerful servers that require a fraction of power and floor space.
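    The TCO claim can be made concrete with simple arithmetic. All figures below are illustrative assumptions, not measured data or published benchmarks.

    ```python
    # Hypothetical TCO comparison: servers needed to hit a target inference
    # throughput. Every number here is an illustrative assumption.

    import math

    target_inferences_per_sec = 100_000

    def servers_needed(per_server_throughput):
        """Servers required to sustain the target aggregate throughput."""
        return math.ceil(target_inferences_per_sec / per_server_throughput)

    cpu_only = servers_needed(500)    # assumed CPU-only server throughput
    gpu = servers_needed(5_000)       # assumed T4-equipped server throughput

    cpu_power_kw = cpu_only * 0.5     # assumed power draw per CPU server
    gpu_power_kw = gpu * 0.6          # assumed power draw per GPU server
    print(cpu_only, gpu, cpu_power_kw, gpu_power_kw)
    ```

    Under these assumptions the GPU fleet is a tenth the size and draws a fraction of the power, which is the mechanism behind the floor-space and power savings described above.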

    GPU inference servers are widely available through PNY’s ecosystem of leading OEMs and ODMs with enterprise class support.

  • Specifications



    Compatible with all systems that accept an NVIDIA T4

    GPU Architecture: NVIDIA Turing
    Use Case: Universal Deep Learning Accelerator
    NVIDIA GPU: TU104-895
    GPU Clocks: 585 MHz Base | 1590 MHz Maximum Boost
    Turing Tensor Cores: 320
    NVIDIA CUDA Cores: 2560
    Peak FP32: 8.1 TFLOPS
    Mixed Precision | FP16/FP32: 65 TFLOPS
    INT8: 130 TOPS
    INT4: 260 TOPS
    GPU Memory: 16 GB GDDR6
    ECC: Yes | Enabled by default
    Memory Interface: 256-bit
    Maximum Memory Clock: 5001 MHz
    Memory Bandwidth: 300 GB/s
    CODECs Supported: H.264 | H.265
    720p Encoding Streams: 22 simultaneously in HQ mode
    1080p Encoding Streams: 10 simultaneously
    Ultra HD | 2160p Streams: 2-3 simultaneously
    Thermal Solution: Passive
    Operating Temperature: 0 to 50 °C
    Operating Humidity: 5% to 90% relative humidity
    Maximum Power Consumption: 70 W
    System Interface: PCIe Gen 3.0 x16 | 32 GB/s (PCIe Gen 3.0 x8 also supported)
    Form Factor: Low-Profile PCIe | Single Slot
    Physical Dimensions: 6.61” L x 2.71” H
    Compute APIs: CUDA | NVIDIA TensorRT | OpenCL | ONNX
    Graphics APIs: DirectX 12 | OpenGL 4.6 | Vulkan 1.2



    Supported Operating Systems

    • Windows Server 2012 R2
    • Windows Server 2016 1607, 1709
    • Windows Server 2019
    • Red Hat CoreOS 4.7
    • Red Hat Enterprise Linux 8.1-8.3
    • Red Hat Enterprise Linux 7.7-7.9
    • Red Hat Linux 6.6+
    • SUSE Linux Enterprise Server 15 SP2
    • SUSE Linux Enterprise Server 12 SP3+
    • Ubuntu 14.04 LTS / 16.04 LTS / 18.04 LTS / 20.04 LTS


    • NVIDIA T4 with attached low-profile bracket
