Nvidia Launches Another Attack On Modern CPU Design
Dally's basic claim is that modern CPUs are held back by legacy design. That's not particularly controversial, but he doesn't stop there. Referring to modern CPUs, Dally says:
They have branch predictors that predict a branch every cycle whether the program branches or not -- that burns gobs of power. They reorder instructions to hide memory latency. That burns a lot of power. They carry along a [set of] legacy instructions that requires lots of interpretation. That burns a lot of power. They do speculative execution and execute code that they may not need and throw it away. All these things burn a lot of power.
The gobby power-burning elements Dally mentions are all part of what distinguishes a modern out-of-order execution (OoOE) processor from a classic in-order design. When Intel designed the P6 architecture that debuted with the Pentium Pro, it went with OoOE precisely because it offered a major performance leap over what the in-order, superscalar Pentium could deliver. Branch prediction, instruction reordering, and speculative execution are all vital elements of modern chip design. While they all consume power, they're scarcely the anchors Dally implies.
Nvidia's Fermi is an awesome number cruncher but that doesn't make it a good idea in every situation
Next up, we hear that the sort of HPC applications where products like Tesla make sense foreshadow future consumer usage models: "HPC is, in many ways, an early adopter, because they run into problems sooner because they operate at a larger scale. But this applies completely to consumer applications as well as to server applications... I think over time, people will convert applications to parallel, and those parallel segments will be well-suited for GPUs."
As in May, Dally's arguments seem to assume we live in the magic world of Parallelysium, where programmers and compilers can effortlessly translate serialized dependencies into latency-free independent calculations. Reality is not so kind. One of the major problems with Intel's Itanium is that it has historically been extremely difficult for compilers to extract enough parallelism to harness the CPU's capabilities. In highly tuned workloads, Itanium is a monster. Outside such workloads, the chip begins to stumble badly. In GPU-friendly, bandwidth-limited workloads, Tesla is a monster, but that doesn't mean the entire IT industry should march towards the GPU compute model.
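To make the distinction between serialized dependencies and independent calculations concrete, here is a minimal Python sketch. The function names (`scale`, `independent_map`, `dependent_chain`) and the arithmetic are hypothetical, chosen only for illustration; nothing here comes from Dally or Nvidia. The first routine does work where every element stands alone, which is the shape that maps well to many cores or a GPU. The second is a loop in which each step needs the previous step's result, so as written it has to run in order.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(x):
    # Each result depends only on its own input.
    return 2.0 * x + 1.0


def independent_map(values, workers=4):
    # No element depends on any other, so the work can be handed to
    # any number of workers and the answer does not change.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scale, values))


def dependent_chain(values, seed=0.0):
    # Loop-carried dependency: iteration i needs the result of
    # iteration i - 1, so this naive loop must run one step at a time.
    out = []
    prev = seed
    for x in values:
        prev = 0.5 * prev + x
        out.append(prev)
    return out


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(independent_map(data))
    print(dependent_chain(data))
```

Some dependent loops can be restructured into parallel form, but that usually takes deliberate algorithmic work by the programmer. It is not something a compiler reliably does unaided.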
Intel has already issued a whitepaper comparing CPU and GPU performance using optimized code; we expect the two companies will step up their rhetoric as Intel's MIC Knights Corner moves closer to launch. Even if the computer industry eventually shifts to the GPU-centric processing model Dally discusses, we're betting the transition won't even start to impact consumers for over a decade. It's been nearly eight years since AMD launched the first x86-64 Opteron processors, and there are still a huge number of people running 32-bit operating systems. We'll definitely see more programs taking advantage of the GPU in the years ahead, but there will be no skipping merrily from one model to another.