At its Fusion Development Summit this week, AMD discussed the concepts and capabilities it's targeting for future generations of AMD graphics cards. The company isn't sharing any specific architectural features, but even the general information it handed out is interesting.
Eric Demers, AMD's graphics CTO, began by talking about the history of ATI's graphics cards and the evolution of the company's GPU design. When ATI originally designed its DX9 hardware, it built its vertex shaders around a VLIW5 (Very Long Instruction Word) implementation. Back in those halcyon days, Nvidia's programmable G80 was scarcely a twinkle in David Kirk's eye, and pixel and vertex shaders were two different animals. ATI's VLIW5 approach allowed the vertex shader to handle four simple operations, with a dedicated fifth unit that could handle more complex tasks.
AMD stuck with a VLIW5 approach until it launched Cayman last year. With Cayman, AMD adopted VLIW4. Instead of four simple units with an optional fifth for complex work, VLIW4 has four simple units and uses three of them together to perform a complex operation. This increased design efficiency and allowed AMD to increase the number of SIMD blocks per die. Future discrete GPUs (and Fusion products) will be built around what AMD is calling a compute unit (CU). AMD's video explores the difference between the new CU architecture and the VLIW5/VLIW4 designs that came before it in some detail; we've kept our own discussion at a fairly high level.
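To make the VLIW5/VLIW4 trade-off concrete, here is a rough toy model of our own, not anything AMD presented. It assumes every operation is independent, that the fifth VLIW5 unit only runs complex operations, and that a VLIW4 complex operation ties up exactly three lanes for one issue slot. The function names and the sample workloads are hypothetical.

```python
from math import ceil


def vliw5_bundles(simple_ops: int, complex_ops: int) -> int:
    """Bundles needed with 4 simple lanes plus 1 complex-only lane per bundle."""
    # Simple and complex ops fill separate lanes, so the slower group decides.
    return max(ceil(simple_ops / 4), complex_ops)


def vliw4_bundles(simple_ops: int, complex_ops: int) -> int:
    """Bundles needed with 4 lanes per bundle, where a complex op occupies 3."""
    # Only one complex op fits per bundle; it leaves one lane for a simple op.
    leftover_simple = max(0, simple_ops - complex_ops)
    return complex_ops + ceil(leftover_simple / 4)


def lane_utilization(busy_lanes: int, bundles: int, lanes_per_bundle: int) -> float:
    """Fraction of available lanes that did useful work."""
    if bundles == 0:
        return 0.0
    return busy_lanes / (bundles * lanes_per_bundle)


if __name__ == "__main__":
    # (simple ops, complex ops) pairs chosen purely for illustration.
    for simple, cplx in [(8, 0), (8, 1), (4, 4)]:
        b5 = vliw5_bundles(simple, cplx)
        b4 = vliw4_bundles(simple, cplx)
        u5 = lane_utilization(simple + cplx, b5, 5)
        u4 = lane_utilization(simple + 3 * cplx, b4, 4)
        print(f"simple={simple} complex={cplx}: "
              f"VLIW5 {b5} bundles ({u5:.0%} busy), "
              f"VLIW4 {b4} bundles ({u4:.0%} busy)")
```

In this simplified model, a workload made up only of simple operations leaves the fifth VLIW5 lane idle, while the four-lane design keeps every lane busy. That is one way to read the efficiency argument above. Real shader compilers also have to respect data dependencies, which this sketch ignores.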
AMD's long-term Fusion roadmap
Demers made it clear that while gaming and pure graphics performance remain important to AMD, many of the improvements the company is planning to introduce are aimed at boosting GPU compute performance. Nvidia took the same general approach when it built Fermi. Granted, the GF100 had more than its share of growing pains, but it proved that it's possible to improve gaming and GPU compute capabilities at the same time without sacrificing one for the other.
AMD's long-term goal is to integrate the GPU and CPU into a single cohesive unit. Future Fusion parts (and discrete GPUs) will add support for C, C++, and other high-level languages. Eventually, the CPU and GPU will share x86 virtual memory, meaning the GPU will understand 64-bit x86 pointers. The GPU will also have its own address translation caches, and cache coherence between the two parts will be maintained. All of these features will also be accessible when a discrete GPU is paired with a traditional CPU.
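As a loose illustration of what a shared virtual address space with per-device translation caches means, here is a toy simulation of our own. It is not AMD's design. It assumes 4 KB pages and a single flat page table, and the `Translator` class, the page numbers, and the pointer value are all made up for the example. It does not model cache coherence at all.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class Translator:
    """A device-side address translator with its own private translation cache."""

    def __init__(self, name: str, page_table: dict):
        self.name = name
        self.page_table = page_table  # shared: virtual page number -> physical frame
        self.tlb = {}                 # private to this device
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        """Turn a virtual address into a physical one, caching the page mapping."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.tlb:
            # Cache miss: consult the shared page table once, then remember it.
            self.misses += 1
            self.tlb[vpn] = self.page_table[vpn]
        return self.tlb[vpn] * PAGE_SIZE + offset


# One page table shared by both devices (hypothetical mapping).
shared_page_table = {0x7F0012345: 0x1A2B}

cpu = Translator("cpu", shared_page_table)
gpu = Translator("gpu", shared_page_table)

# The same 64-bit pointer value is handed to both devices unchanged.
pointer = 0x7F0012345 * PAGE_SIZE + 0x2A8

if __name__ == "__main__":
    print(hex(cpu.translate(pointer)), hex(gpu.translate(pointer)))
    print("GPU misses after a second lookup:", (gpu.translate(pointer), gpu.misses)[1])
```

The point of the sketch is that neither side has to copy data or rewrite the pointer: both resolve the same virtual address to the same physical location, and each keeps its own cache of recent translations.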
The new compute unit structure.
It'll be a year or more until we see a Fusion chip or discrete GPU that leverages these new technologies; AMD has stated that the first generation of Bulldozer chips that integrate GPUs will *not* use the tech we've been discussing. It's also noteworthy that many of the technologies AMD has adopted here first showed up in Fermi. Then again, AMD's own presentation indirectly addresses this point. Up until now, AMD has focused on first matching, then challenging Nvidia's game performance. If gaming were the only important issue, the company's VLIW4 and VLIW5 designs would've been sufficient. AMD is moving to adopt certain technologies precisely because it wants to compete in the GPGPU / HPC markets.