NVIDIA Boosts AI Inferencing With Pascal Based Tesla P40 and P4
NVIDIA has announced their latest Pascal based Tesla P40 and Tesla P4 GPU accelerators. The new cards are designed to accelerate AI / neural network inferencing, delivering up to a 45x boost over CPUs and around a 4x increase over previous-generation GPUs. The GPU accelerators are backed up with powerful software tools that deliver a massive increase in overall efficiency.
NVIDIA Tesla P40 and Tesla P4 Announced - Accelerating AI / Deep Neural Network Inferencing
NVIDIA has created a platform for deep learning with their latest Tesla cards. The platform is segmented into training and inferencing GPUs. For AI training, NVIDIA offers the Tesla P100 solution with the fastest compute performance available to date, in both FP16 and FP64. Combined with the DIGITS training system and deep learning frameworks, it delivers higher efficiency and performance. On the other hand, we have the inferencing cards, a line powered by the Tesla P40 and Tesla P4 accelerators.
The Tesla P4 and P40 are specifically designed for inferencing, which uses trained deep neural networks to recognize speech, images or text in response to queries from users and devices. Based on the Pascal architecture, these GPUs feature specialized inference instructions based on 8-bit (INT8) operations, delivering 45x faster response than CPUs and a 4x improvement over GPU solutions launched less than a year ago. via NVIDIA
Replacing the Tesla M40 and Tesla M4, the Pascal based accelerators come with DeepStream SDK and TensorRT support. The two inferencing cards are based on the GP102 and GP104 GPUs, both of which are available on NVIDIA's consumer platforms in the form of GeForce and Quadro. Let's take a look at the specifications for these cards:
NVIDIA Tesla P40 "Pascal GP102" Specifications:
The Tesla P40 is the faster part of the two, featuring a full fledged GP102 GPU core. The card consists of 3840 CUDA cores and 24 GB of GDDR5 memory. Clock speeds are maintained at 1303 MHz base and 1531 MHz boost. The memory is clocked at 7.2 GHz effective, which delivers 346 GB/s of bandwidth along a 384-bit interface. The chip packs 12 TFLOPs of FP32 and 47 DLTOPs of INT8 compute performance in a 250W TDP package. Like the Tesla M40 before it, the P40 also comes in a passive form factor.
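Those headline figures follow directly from the core count and boost clock: peak FP32 throughput is cores times 2 FLOPs per fused multiply-add times clock, and Pascal's 4-wide INT8 dot-product path runs at four times that rate. A quick back-of-the-envelope check in Python (the helper name is ours, not NVIDIA's):

```python
def peak_tflops(cuda_cores, boost_ghz, flops_per_core=2):
    """Peak throughput: cores x 2 FLOPs per fused multiply-add x clock (GHz)."""
    return cuda_cores * flops_per_core * boost_ghz / 1000.0

p40_fp32 = peak_tflops(3840, 1.531)  # ~11.8 TFLOPs, marketed as 12 TFLOPs
p40_int8 = p40_fp32 * 4              # INT8 dot products run 4 per FMA slot on Pascal
print(f"P40: {p40_fp32:.1f} TFLOPs FP32, {p40_int8:.0f} TOPs INT8")
```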
NVIDIA Tesla P4 "Pascal GP104" Specifications:
The Tesla P4, on the other hand, features the GP104 core. It has the full 2560 CUDA cores attached, but they run at much lower clock speeds of 810 MHz base and 1063 MHz boost. This has to do with the low-profile form factor the card is offered in, as it is designed for blade servers. The P4 also comes in a 50-75W package, much lower than the GTX 1080's 180W TDP. The GTX 1080 features the same core count but higher clock speeds, reaching up to 2 GHz. The P4 is clocked at roughly half that rate, hence the higher power efficiency.
The rest of the specifications include 8 GB of video RAM. The memory is clocked at 6 GHz, which offers 192 GB/s of bandwidth along a 256-bit bus. Compute performance for this card is rated at 5.5 TFLOPs (FP32) and 22 DLTOPs (INT8). No price has been announced for the Tesla P40 or Tesla P4, but they are expected to hit the market through OEM channels in late Q4 (October-November) 2016.
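The bandwidth figures check out the same way: effective memory data rate per pin times the bus width in bytes. A minimal sketch (function name is our own):

```python
def bandwidth_gb_s(effective_gbps, bus_width_bits):
    """Peak memory bandwidth: per-pin data rate x bus width in bytes."""
    return effective_gbps * bus_width_bits / 8

p4_bw  = bandwidth_gb_s(6.0, 256)   # 192 GB/s for the Tesla P4
p40_bw = bandwidth_gb_s(7.2, 384)   # ~345.6 GB/s, rounded to 346 in the specs
```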
NVIDIA Tesla P40 and Tesla P4 Specifications:
Product Name | Tesla M4 | Tesla M40 | Tesla P4 | Tesla P40 |
---|---|---|---|---|
GPU Architecture | Maxwell GM206 | Maxwell GM200 | Pascal GP104 | Pascal GP102 |
GPU Process | 28nm | 28nm | 16nm FinFET | 16nm FinFET |
CUDA Cores | 1280 CUDA | 3072 CUDA | 2560 CUDA | 3840 CUDA |
Clock Speed | 1072 MHz | 1114 MHz | 1063 MHz | 1531 MHz |
FP32 Compute | 2.20 TFLOPs | 7.00 TFLOPs | 5.50 TFLOPs | 12.0 TFLOPs |
INT8 Compute | N/A | N/A | 22 DLTOPs | 47 DLTOPs |
VRAM | 4 GB GDDR5 | 24 GB GDDR5 | 8 GB GDDR5 | 24 GB GDDR5 |
Memory Clock | 5.5 GHz | 6.0 GHz | 6.0 GHz | 7.2 GHz |
Memory Bus | 128-bit | 384-bit | 256-bit | 384-bit |
Memory Bandwidth | 88.0 GB/s | 288.0 GB/s | 192.0 GB/s | 346.0 GB/s |
TDP | ~75W | 250W | ~75W | 250W |
Launch | 2015 | 2015 | 2016 | 2016 |
Software Tools for Faster Inferencing
Complementing the Tesla P4 and P40 are two software innovations to advance AI inferencing: NVIDIA TensorRT and the NVIDIA DeepStream SDK.
TensorRT is a library created for optimizing deep learning models for production deployment, delivering instant responsiveness for the most complex networks. It maximizes the throughput and efficiency of deep learning applications by taking trained neural nets, defined with 32-bit or 16-bit operations, and optimizing them for reduced-precision INT8 operations.
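The core idea behind that reduced-precision step can be illustrated with a toy symmetric quantizer. This is our own simplified sketch, not TensorRT's actual API (TensorRT additionally uses calibration data to choose scales that minimize accuracy loss):

```python
def quantize_int8(weights):
    """Map FP32 values into [-127, 127] using a single symmetric scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate FP32 values from the INT8 codes."""
    return [q * scale for q in quantized]

weights = [0.82, -1.27, 0.05, 0.40]
q, scale = quantize_int8(weights)   # small integers plus one scale factor
restored = dequantize(q, scale)     # close to the original FP32 weights
```

Storing and multiplying 8-bit integers instead of 32-bit floats is what lets Pascal's INT8 path deliver its 4x throughput advantage, at the cost of a small, bounded rounding error per weight.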
NVIDIA DeepStream SDK taps into the power of a Pascal server to simultaneously decode and analyze up to 93 HD video streams in real time, compared with seven streams on dual CPUs. This addresses one of the grand challenges of AI: understanding video content at scale for applications such as self-driving cars, interactive robots, filtering and ad placement. Integrating deep learning into video applications allows companies to offer smart, innovative video services that were previously impossible to deliver.
NVIDIA Offers 10W, Palm-Sized Energy-Efficient AI Computer for Self-Driving Cars
NVIDIA also announced a new Drive PX 2 board for self-driving cars. While the original design uses two Parker SoCs, the new model is a single-chip design. With a TDP of just 10W and a much smaller board footprint, the AI supercomputer makes the product more affordable.
"Baidu and NVIDIA are leveraging our AI skills together to create a cloud-to-car system for self-driving," said Liu Jun, vice president of Baidu. "The new, small form-factor DRIVE PX 2 will be used in Baidu's HD map-based self-driving solution for car manufacturers." via NVIDIA
The new single-processor DRIVE PX 2 will be available to production partners in the fourth quarter of 2016. DriveWorks software and the DRIVE PX 2 configuration with two SoCs and two discrete GPUs are available today for developers working on autonomous vehicles.
NVIDIA Drive PX 2 Single Chip Board:
Source: https://wccftech.com/nvidia-pascal-tesla-p40-p4-drive-px2/