Turing Tensor Cores: The Heart of Universal Inference Acceleration
AI is evolving rapidly. In the past few years alone, a Cambrian explosion of neural network types and techniques has produced convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), reinforcement learning (RL), and hybrid network architectures. Accelerating these diverse models requires both high performance and programmability.
NVIDIA T4 introduces the revolutionary Turing Tensor Core technology with multi-precision computing for AI inference. With support for FP32, FP16, and INT8, as well as INT4 and binary precisions, T4 delivers dramatically higher inference performance than CPUs.
Developers can unleash the power of Turing Tensor Cores directly through NVIDIA TensorRT, software libraries and integrations with all AI frameworks. These tools let developers target optimal precision for different AI applications, achieving dramatic performance gains without compromising accuracy of results.
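To make the idea of targeting a lower precision concrete, here is a minimal sketch of symmetric INT8 quantization in plain Python. It is not TensorRT's API or its calibration method. The helper names (quantize_int8, dequantize) and the sample weights are hypothetical and chosen only for illustration. The sketch shows how FP32 values can be mapped to 8-bit integers with a single scale factor, and how small the round-trip error is for this example.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale.

    Hypothetical helper for illustration; real toolchains choose the
    scale with calibration data rather than the raw maximum.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


# Made-up sample weights, not taken from any real model.
weights = [0.42, -1.27, 0.05, 0.9, -0.31]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Round-to-nearest keeps each value within half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each value is stored in one byte instead of four, and the worst-case rounding error is bounded by half the scale. Whether that error is acceptable depends on the model, which is why precision is chosen per application.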
State-of-the-art Inference in Real-Time
Responsiveness is key to user engagement for services such as conversational AI, recommender systems, and visual search. As models increase in accuracy and complexity, delivering the right answer right now requires far greater compute capability.
NVIDIA T4 features Multi-Process Service (MPS) with hardware-accelerated work distribution. MPS reduces request-processing latency and lets multiple independent requests be processed simultaneously, resulting in higher throughput and more efficient GPU utilization.
Twice the Video Decode Performance
Video continues on its explosive growth trajectory, comprising over two-thirds of all Internet traffic. Accurate video interpretation through AI is driving the most relevant content recommendations, measuring the impact of brand placements in sports events, and delivering perception capabilities to autonomous vehicles, among other uses.
NVIDIA T4 delivers breakthrough performance for AI video applications, with dedicated hardware transcoding engines that bring twice the decoding performance of prior-generation GPUs. T4 can decode up to 38 full-HD video streams, making it easy to integrate scalable deep learning into the video pipeline to deliver innovative, smart video services. It features performance and efficiency modes to enable either fast encoding or the lowest bit-rate encoding without losing video quality.
Industry’s Most Comprehensive AI Inference Platform
AI has crossed the chasm and is rapidly moving from early adoption by pioneers to broader use across industries and large-scale production deployments. Powered by the flexible NVIDIA CUDA development environment and a mature ecosystem with over 1 million developers, the NVIDIA AI Platform has been evolving for over a decade to offer comprehensive tooling and integrations that simplify the development and deployment of AI.
NVIDIA TensorRT enables optimization of trained models to efficiently run inference on GPUs. NVIDIA ATTIS and Kubernetes on NVIDIA GPUs streamline the deployment and scaling of AI-powered applications on GPU-accelerated infrastructure for inference. Libraries like cuDNN, cuSPARSE, CUTLASS, and DeepStream accelerate key neural network functions and use cases, like video transcoding. Workflow integrations with all AI frameworks, freely available from NVIDIA GPU Cloud containers, enable developers to transparently harness the innovations in GPU computing for end-to-end AI workflows, from training neural networks to running inference in production applications.