NVIDIA Tesla P100 Module, DGX-1 Supercomputer And QuantaPlex Deep Learning Server Spy Shots Expose Pascal

Yesterday, during his keynote address at GTC 2016, NVIDIA CEO Jen-Hsun Huang made a number of interesting announcements and disclosures. We saw Apple co-founder Steve “Woz” Wozniak take a virtual tour of Mars and witnessed the official unveiling of NVIDIA’s Tesla P100, which is based on the company’s bleeding-edge GP100 GPU. GP100 leverages NVIDIA’s forward-looking Pascal GPU architecture and features 16GB of HBM2 memory, all built using TSMC’s 16nm FinFET manufacturing process. The GP100 is massive, with a roughly 600mm² die — about the size of current-gen high-end Maxwell GPUs — but because it's built on a 16nm process, it packs far more logic into that area: more than 15 billion transistors in all.

So what does this new beast of a GPU look like? Here you go...

NVIDIA Pascal Tesla P100
The NVIDIA GP100 GPU Powering The Tesla P100

We’ve got the technical details and images of the Tesla P100 unveiling in our initial coverage, but have since been able to get up-close-and-personal with one of the powerful servers NVIDIA has coming down the pipeline – the NVIDIA DGX-1 -- that packs eight Tesla P100 GPUs inside. One of NVIDIA’s partners, QuantaPlex, was also on hand showing off one of its servers, and while the machine was disassembled, we were able to snap some high-res shots of the Tesla P100 boards inside.

QuantaPlex Deep Learning Server
The QuantaPlex T21W-3U Server With Eight Tesla P100 Boards

Above is the QuantaPlex T21W-3U server. It’s configurable with up to eight Tesla P100 accelerators and uses Intel’s just-released Xeon E5-2600 v4 series processors.

The specifications of the NVIDIA Tesla P100 are also quite impressive. The GPU's double-precision compute performance comes in at 5.3 teraflops (TFLOPS), while single-precision performance lands at 2x that rate, or 10.6 TFLOPS. Half-precision performance doubles the rate again, to 21.2 TFLOPS. Memory bandwidth is an impressive 720GB/s.
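Those throughput figures follow directly from the chip's unit counts and clocks. As a back-of-the-envelope check — using NVIDIA's published GP100 figures (3584 FP32 CUDA cores, a ~1480MHz boost clock, and a 4096-bit HBM2 interface), which are not spelled out in this article — the quoted numbers can be reproduced like so:

```python
# Sanity-check the Tesla P100's quoted peak throughput and bandwidth.
# Core counts and clocks below are NVIDIA's published GP100 specs
# (assumed here, not stated in the article above).

BOOST_CLOCK_HZ = 1.48e9       # ~1480 MHz boost clock
FP32_CORES = 3584             # FP32 CUDA cores on GP100
FP64_UNITS = FP32_CORES // 2  # GP100 runs FP64 at half the FP32 rate

def peak_tflops(units: int) -> float:
    """Peak TFLOPS = units x 2 ops/cycle (fused multiply-add) x clock."""
    return units * 2 * BOOST_CLOCK_HZ / 1e12

fp32 = peak_tflops(FP32_CORES)  # ~10.6 TFLOPS single precision
fp64 = peak_tflops(FP64_UNITS)  # ~5.3 TFLOPS double precision
fp16 = 2 * fp32                 # FP16 packs two values per FP32 lane: ~21.2

# Memory bandwidth: 4096-bit HBM2 bus at ~1406 MT/s effective
bandwidth_gbs = (4096 / 8) * 1.406e9 / 1e9  # ~720 GB/s

print(f"FP64 {fp64:.1f} / FP32 {fp32:.1f} / FP16 {fp16:.1f} TFLOPS, "
      f"{bandwidth_gbs:.0f} GB/s")
```

The 2:1 ratios between half, single, and double precision fall straight out of the hardware: FP64 units are half as numerous as FP32 cores, while FP16 math packs two operands into each FP32 lane.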
NVIDIA calls the DGX-1 a “supercomputer in a box” and “the world’s first purpose-built system for deep learning,” thanks to its quick deployment and its ability to accelerate training for deep learning applications by up to 75x over a dual-socket Xeon E5-2697 v3 based server – at least according to NVIDIA’s estimates.

The DGX-1 On Display Also Had Eight Tesla P100s Inside

According to NVIDIA, the Tesla P100 and the DGX-1 deep learning server will be available for purchase sometime in June of this year. The Tesla P100 accelerator itself is already shipping in volume to NVIDIA's tier-1 partners like Dell, HP, IBM and Cray.