NVIDIA Pascal GPU Architecture Preview: Inside The GP100


NVIDIA Pascal GPU Architecture (Cont.)

There’s more to the Tesla P100 and its GP100 GPU than what’s going on inside the chip. The GP100 will be paired with 16GB of HBM2 memory, integrated on the same package using TSMC’s Chip on Wafer on Substrate (CoWoS) technology. It also supports NVLink and uses a new board and connector design.

Chip on Wafer on Substrate (CoWoS) HBM2 On GP100

HBM2 is fundamentally similar to HBM, which is used on AMD’s Fury line of graphics cards. For a more detailed explanation of HBM, please point your browser here. With HBM2, though, data rates are doubled and higher capacities are supported, along with other underlying tweaks and enhancements that improve signaling and reduce error rates. NVIDIA points out that ECC is "free" with HBM2 on the GP100, meaning it can be enabled without the capacity or bandwidth penalty it carries on GDDR5-based cards.
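To put "data rates are doubled" in concrete terms, here is a minimal sketch of the peak per-stack bandwidth arithmetic. It assumes values not stated in this article: a 1024-bit interface per stack, 1 Gb/s per pin for first-generation HBM, and 2 Gb/s per pin for HBM2. These are spec peak rates, not necessarily the clocks GP100 actually ships with.

```python
# Peak per-stack bandwidth = interface width (bits) x per-pin data rate (Gb/s) / 8 bits per byte.
# Assumed values: 1024-bit interface per stack; 1 Gb/s per pin (HBM), 2 Gb/s per pin (HBM2).
BUS_WIDTH_BITS = 1024

def stack_bandwidth_gbs(pin_rate_gbps: float, bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak bandwidth of one memory stack, in GB/s."""
    return bus_width_bits * pin_rate_gbps / 8

hbm1 = stack_bandwidth_gbs(1.0)
hbm2 = stack_bandwidth_gbs(2.0)
print(f"HBM1: {hbm1:.0f} GB/s per stack")
print(f"HBM2: {hbm2:.0f} GB/s per stack ({hbm2 / hbm1:.0f}x)")
```

Because the interface width stays the same, doubling the per-pin rate doubles per-stack bandwidth (128 GB/s to 256 GB/s under these assumptions); total bandwidth then scales with the number of stacks on the package.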


NVLink is something NVIDIA has been talking about for some time; in fact, we first mentioned it back during our GTC’14 coverage. Up to eight GP100-based Tesla P100 boards can be interconnected with NVLink, and when they are arranged as two fully connected quads linked at the corners (as shown in the diagram), each GPU gets 160 GB/s of bidirectional interconnect bandwidth. If only four GP100s are linked in a single quad, however, NVLink offers 120 GB/s per GPU (bidirectional) for peer traffic and 40 GB/s per GPU (bidirectional) to the CPU.
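The per-GPU figures above follow from simple link counting. This sketch assumes, from NVIDIA’s published P100 specifications rather than from this article, that each GP100 exposes four NVLink links, each carrying 20 GB/s in each direction.

```python
# Assumption (not stated in the article): four NVLink links per GP100,
# each carrying 20 GB/s in each direction (40 GB/s bidirectional).
LINKS_PER_GPU = 4
GBS_PER_DIRECTION = 20

def link_bidir_gbs(n_links: int) -> int:
    """Bidirectional bandwidth of n NVLink links, in GB/s."""
    return n_links * GBS_PER_DIRECTION * 2

# Eight-GPU configuration: all four links go to peer GPUs.
eight_gpu_peer = link_bidir_gbs(LINKS_PER_GPU)

# Single quad: three links reach the other three GPUs, one link goes to the CPU.
quad_peer = link_bidir_gbs(LINKS_PER_GPU - 1)
quad_cpu = link_bidir_gbs(1)

print(eight_gpu_peer, quad_peer, quad_cpu)  # 160 120 40
```

The quad case simply reassigns one of the four links from a peer GPU to the CPU, which is why the peer and CPU numbers add back up to the 160 GB/s total.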

NVLink is a serial interconnect that uses differential signaling with embedded clocks, an approach that has been around for years. It also allows for unified memory architectures and cache coherency. NVLink is similar to PCI Express in its command set and programming model, but it offers significantly more bandwidth and uses that bandwidth more efficiently: NVIDIA is claiming up to 94% bandwidth efficiency with NVLink.
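One way to read the 94% efficiency claim is as the fraction of raw link bandwidth left for payload after protocol overhead. The sketch below applies it to the raw figures quoted earlier; since 94% is an "up to" number, the results are best-case estimates, and the simple multiplication is an illustrative model rather than NVIDIA’s methodology.

```python
# Effective (payload) bandwidth = raw link bandwidth x protocol efficiency.
# 0.94 is NVIDIA's "up to" claim, so these are best-case estimates.
NVLINK_EFFICIENCY = 0.94

def effective_gbs(raw_gbs: float, efficiency: float = NVLINK_EFFICIENCY) -> float:
    """Usable bandwidth after protocol overhead, in GB/s."""
    return raw_gbs * efficiency

for raw in (160, 120, 40):
    print(f"{raw} GB/s raw -> {effective_gbs(raw):.1f} GB/s usable")
```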


NVIDIA’s Pascal also features a unified memory architecture. The GPU has a page migration engine with support for virtual memory demand paging. It uses 49-bit virtual addresses, which are large enough to cover the 48-bit virtual address space of the CPU in addition to all GPU memory. The GPU also supports page faulting, and NVIDIA says it can handle “thousands of simultaneous page faults”. Finally, the GPU supports page sizes of up to 2MB, which improves TLB (Translation Look-Aside Buffer) coverage of GPU memory.
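The address-width and page-size claims come down to powers of two. The sketch below shows that one extra address bit doubles the virtual address space, and that TLB reach (entries times page size) grows directly with page size. The TLB entry count is a hypothetical value chosen only for illustration, and the 4 KiB and 64 KiB page sizes are illustrative comparison points alongside the 2MB maximum cited above.

```python
# Size of an n-bit virtual address space, in bytes.
def address_space_bytes(bits: int) -> int:
    return 2 ** bits

# TLB reach: how much memory a TLB can map without a miss.
def tlb_reach_bytes(entries: int, page_size: int) -> int:
    return entries * page_size

KIB, MIB, TIB = 2**10, 2**20, 2**40

cpu_va = address_space_bytes(48)
gpu_va = address_space_bytes(49)
print(f"48-bit: {cpu_va // TIB} TiB, 49-bit: {gpu_va // TIB} TiB")  # 256 TiB, 512 TiB
# One extra bit doubles the space, leaving room beyond the full 48-bit CPU range.
assert gpu_va == 2 * cpu_va

# Hypothetical TLB size, chosen only for illustration; not a GP100 figure.
TLB_ENTRIES = 1024
for page_size in (4 * KIB, 64 * KIB, 2 * MIB):
    reach = tlb_reach_bytes(TLB_ENTRIES, page_size)
    print(f"{page_size // KIB:>5} KiB pages -> {reach // MIB:>5} MiB of reach")
```

With the same number of TLB entries, moving from 4 KiB to 2MB pages multiplies the memory covered by 512, which is why larger pages mean fewer TLB misses when a kernel sweeps through gigabytes of GPU memory.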

While Kepler and Maxwell also supported unified memory, their unified memory space was limited to the size of the GPU’s physical memory. With Pascal, that limitation is gone: the GP100 can allocate memory beyond the GPU’s physical memory capacity, up to the total amount of available system memory.
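The combination of demand paging and oversubscription can be illustrated with a toy model. In this sketch, allocations are bounded only by system memory; a GPU access to a non-resident page counts as a page fault and migrates the page in, evicting the least recently used page when GPU memory is full. The class name and the LRU eviction policy are hypothetical choices for illustration, not NVIDIA’s documented behavior.

```python
from collections import OrderedDict

class ToyManagedMemory:
    """Toy model of demand paging with GPU memory oversubscription (hypothetical)."""

    def __init__(self, gpu_capacity_pages: int, system_capacity_pages: int):
        self.gpu_capacity = gpu_capacity_pages
        self.system_capacity = system_capacity_pages
        self.allocated = 0
        self.resident = OrderedDict()  # pages on the GPU, least recently used first
        self.faults = 0
        self.evictions = 0

    def allocate(self, n_pages: int) -> range:
        """Reserve pages; bounded by system memory, not GPU memory."""
        if self.allocated + n_pages > self.system_capacity:
            raise MemoryError("allocation exceeds available system memory")
        start = self.allocated
        self.allocated += n_pages
        return range(start, start + n_pages)

    def touch(self, page: int) -> None:
        """Simulate a GPU access to one page."""
        if not 0 <= page < self.allocated:
            raise IndexError("page was never allocated")
        if page in self.resident:
            self.resident.move_to_end(page)  # hit: mark as most recently used
            return
        self.faults += 1  # miss: page fault, migrate the page on demand
        if len(self.resident) >= self.gpu_capacity:
            self.resident.popitem(last=False)  # evict LRU page back to system memory
            self.evictions += 1
        self.resident[page] = None

# Usage: a 10-page buffer on a GPU that holds only 4 pages (oversubscribed).
mem = ToyManagedMemory(gpu_capacity_pages=4, system_capacity_pages=16)
buf = mem.allocate(10)
for p in buf:
    mem.touch(p)
print(mem.faults, mem.evictions, len(mem.resident))
```

In the Kepler/Maxwell model described above, the `allocate` check would compare against GPU capacity instead of system capacity, so the 10-page buffer in this example could not be created at all.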

We should learn more about Pascal and additional GPUs that use the architecture in the near future. For now, though, it appears that NVIDIA has a powerful base GPU architecture on its hands for the HPC market. Whether and when the GP100 trickles down into a consumer product remains to be seen, but if history is any indicator, it will arrive in some form and fill the space currently occupied by the Titan X.

Stay tuned to HotHardware in the months ahead as we learn more about the GeForce side of Pascal. 
