Intel's Knights Landing Xeon Phi Will Target 3 TFLOPS, Offer 16GB of RAM
RealWorldTech has published an in-depth analysis of the upcoming architecture, blending what we know of the new design with some intelligent speculation about its overall structure and capabilities. Knights Landing will be based on Intel's Silvermont CPU architecture, which currently powers the company's Bay Trail mobile products. Unlike Silvermont, however, Knights Landing will add support for 512-bit AVX (AVX-512) operations and a new mesh interconnect for linking the 72 cores expected on each chip.
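To give a sense of what 512-bit vector support means in practice, here is a minimal sketch using Intel's documented AVX-512 Foundation intrinsics. The function and array names are purely illustrative, and real Knights Landing code would more often rely on the compiler's auto-vectorizer than on hand-written intrinsics:

```c
#include <immintrin.h>  /* AVX-512 intrinsics */
#include <stddef.h>

/* Illustrative only: scale-and-add two float arrays 16 elements at a time
 * using 512-bit registers. Assumes n is a multiple of 16 and that the
 * AVX-512F instruction set is available (compile with -mavx512f). */
void saxpy_avx512(size_t n, float a, const float *x, float *y)
{
    __m512 va = _mm512_set1_ps(a);             /* broadcast the scalar a */
    for (size_t i = 0; i < n; i += 16) {
        __m512 vx = _mm512_loadu_ps(x + i);    /* load 16 floats from x */
        __m512 vy = _mm512_loadu_ps(y + i);    /* load 16 floats from y */
        vy = _mm512_fmadd_ps(va, vx, vy);      /* y = a*x + y via fused multiply-add */
        _mm512_storeu_ps(y + i, vy);           /* store 16 results back to y */
    }
}
```

Each iteration processes 16 single-precision values, twice the width of AVX2, which is where much of the per-core throughput improvement is expected to come from.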
Those 72 cores will be connected to up to 16GB of on-package eDRAM and six DDR4 memory controllers capable of addressing up to 384GB of DDR4. Exactly how much DDR4 will be offered on the card is still unclear, as is the precise relationship between the 8-16GB of eDRAM and the larger DDR4 pool. There is further speculation that Intel may connect Knights Landing to its upcoming Skylake-EX platform using QPI rather than PCI-Express 3.0, largely over latency concerns. Not only is PCIe 3.0 comparatively limited in bandwidth, but its round-trip latency is roughly one microsecond, compared with about 40ns for QPI.
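To put those latency figures in perspective, here is a back-of-the-envelope calculation using the numbers quoted above. The trip count is an arbitrary illustration, but it shows why fine-grained host-accelerator communication favors the lower-latency link:

```c
#include <stdio.h>

/* Rough illustration: total synchronization overhead for a workload that
 * performs many small host<->accelerator round trips. Latency figures are
 * the ones quoted in the article; the trip count is arbitrary. */
int main(void)
{
    const double pcie_rt_s = 1e-6;    /* ~1 microsecond round trip over PCIe 3.0 */
    const double qpi_rt_s  = 40e-9;   /* ~40 ns round trip over QPI */
    const long   trips     = 1000000; /* one million fine-grained exchanges */

    printf("PCIe 3.0 latency overhead: %.3f s\n", trips * pcie_rt_s); /* ~1.0 s  */
    printf("QPI latency overhead:      %.3f s\n", trips * qpi_rt_s);  /* ~0.04 s */
    return 0;
}
```

In other words, for a million small exchanges the PCIe link spends roughly a full second on latency alone, versus about 40 milliseconds over QPI, a 25x difference before bandwidth even enters the picture.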
The expectation is that Knights Landing will fundamentally change the interconnect structure of the chip, with an overhauled cache design and an emphasis on minimizing the cache thrashing that can occur when a tiled architecture this large tries to share data. The entire design should still fit within a 300W TDP while offering a substantial improvement in performance per watt: from 4-6 GFLOPS per watt on the older Knights Corner to 14-16 GFLOPS per watt for Knights Landing, roughly a threefold gain in efficiency.
By the time Knights Landing arrives in 2015, Nvidia should have brought at least one more GPU generation to the HPC market, and possibly two. That means the competition between Intel and Nvidia in this space should be intense, particularly given how lucrative it is. One reason Kanter thinks Intel is likely to favor QPI over PCIe 3.0 wherever it can in the long term is that Nvidia doesn't hold a QPI license, and therefore has no quick and easy way of tying its HPC acceleration hardware so tightly into the rest of the system. That kind of custom hardware implementation makes no sense in consumer products, but server parts carry enough margin to justify the expense.