NVIDIA Unveils Beastly Tesla P100 15 Billion Transistor, 16nm FinFET GPU With HBM2 And 21 TFLOPs Performance
NVIDIA just pulled the wraps off its fastest GPU and compute engine yet: the Tesla P100. This new GPU powerhouse is based on NVIDIA’s next generation Pascal architecture, which means that it’s built on a 16nm FinFET process. But even with thoroughly modern FinFET process tech, the Tesla P100’s die size measures 600mm², due in part to the 15.3 billion transistors that comprise the new compute engine, nearly twice that of Maxwell, for reference.
The specifications of the Tesla P100 are second to none, as double-precision performance is rated at 5.3 teraflops (TFLOPs), while single-precision performance comes in at 10.6 TFLOPs. And as you can probably surmise from those figures, half-precision performance is listed at a heady 21.2 TFLOPs.
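The pattern in those three figures is a doubling each time the operand width is halved. As a rough sketch of that arithmetic (the function name and the simple doubling rule are illustrative assumptions drawn from the quoted numbers, not an NVIDIA formula):

```python
# Peak double-precision throughput quoted in the article, in TFLOPs.
FP64_TFLOPS = 5.3


def scaled_throughput(fp64_tflops, bits):
    """Estimate peak throughput for a given operand width.

    Assumption: peak throughput doubles each time the operand width
    halves from 64 bits, which matches the three figures quoted above.
    """
    return fp64_tflops * (64 // bits)


for bits in (64, 32, 16):
    print(f"FP{bits}: {scaled_throughput(FP64_TFLOPS, bits):.1f} TFLOPs")
```

Run as written, this reproduces the 5.3, 10.6 and 21.2 TFLOPs ratings for double, single and half precision.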

NVIDIA CEO Jen-Hsun Huang Introduces The Tesla P100 At GTC 2016
The Tesla P100’s impressive spec sheet doesn’t stop there, however. You’ll also find 160GB/s of bi-directional interconnect bandwidth courtesy of NVLink (up to eight Tesla P100 GPUs can be interconnected with NVLink). Also along for the ride is 16GB of Chip-on-Wafer-on-Substrate (CoWoS) HBM2, offering 720GB/s of memory bandwidth. NVIDIA CEO Jen-Hsun Huang reports it takes some 4,000 wires to connect the processing engine to its HBM2 memory.
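To put those bandwidth figures in perspective, here is a back-of-the-envelope sketch of how long it would take to move the card's full 16GB at each quoted rate. It assumes peak rates are sustained and that the 160GB/s bi-directional NVLink figure splits evenly into 80GB/s per direction. Both are simplifying assumptions, and the helper name is illustrative.

```python
# Figures quoted in the article.
HBM2_CAPACITY_GB = 16
HBM2_BANDWIDTH_GB_S = 720
NVLINK_BIDIR_GB_S = 160


def transfer_ms(gigabytes, gb_per_s):
    """Ideal transfer time in milliseconds at a sustained peak rate."""
    return gigabytes / gb_per_s * 1000


# One full pass over local HBM2 memory.
hbm2_ms = transfer_ms(HBM2_CAPACITY_GB, HBM2_BANDWIDTH_GB_S)

# Sending the same 16GB to a peer GPU, assuming half of the
# bi-directional NVLink bandwidth is available in one direction.
nvlink_ms = transfer_ms(HBM2_CAPACITY_GB, NVLINK_BIDIR_GB_S / 2)

print(f"HBM2 sweep:      {hbm2_ms:.1f} ms")
print(f"NVLink transfer: {nvlink_ms:.1f} ms")
```

Under these assumptions a full sweep of local memory takes roughly 22 ms, while pushing the same data to another GPU over NVLink takes about 200 ms, so local HBM2 is about nine times faster than the interconnect.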
All of this firepower combines to give the Tesla P100 12 times the neural network training performance of its Maxwell-based predecessor. NVIDIA also threw out a few other staggering performance stats, including the fact that just one server node full of Tesla P100s can run Amber molecular dynamics code faster than 48 dual-socket CPU server nodes (built on 12-core Intel E5-2680 v3 CPUs). In addition, the COSMO weather forecasting software might run just fine on 27 dual-socket CPU servers (16-core Intel E5-2698 v3), but a mere eight Tesla P100s can run the application faster.
As you can tell by the specs and the real-world performance examples that NVIDIA gave, this is a heavy-lifting product meant for deep learning, heavy-duty rendering and real-time rendering workloads as well as complex computational modeling.
"Our greatest scientific and technical challenges -- finding cures for cancer, understanding climate change, building intelligent machines -- require a near-infinite amount of computing performance," said NVIDIA CEO Jen-Hsun Huang. "[Pascal] represents a massive leap forward in computing performance and efficiency, and will help some of the smartest minds drive tomorrow's advances."
The NVIDIA Tesla P100 GPU will be available in the DGX-1 deep learning system beginning in June, and the chip itself is shipping in volume now to NVIDIA's first partners, notably Dell, HP, IBM and Cray.




