AMD Kaveri APU Explained
AMD is introducing the concept of what it calls “Compute Cores” with Kaveri. The reasoning is that Kaveri is the company’s first truly heterogeneous processor: its CPU and GPU cores can each process data in their own context and virtual memory space, independently of one another, when software is properly written to take advantage of the capability. In AMD’s view, that makes them deserving of a new moniker.
With the idea of Compute Cores in mind, AMD describes Kaveri as offering up to 12 compute cores: 4 CPU cores and 8 GPU cores. As we’ve mentioned, the four CPU cores use AMD’s Steamroller microarchitecture, which is the latest iteration of Bulldozer. The eight GCN-based GPU cores are essentially the same as those used in the recently released “Hawaii” GPU (Radeon R9 290 and R9 290X), but with the addition of support for coherent, shared unified memory.
The Steamroller CPU cores employed in Kaveri were designed to feed the cores faster, improve single-core / IPC performance over previous generations, and offer better performance per watt. AMD set out to achieve these goals by improving scheduling efficiency and branch prediction, and by increasing the size of queues all around. According to AMD, with Steamroller, mispredicted branches have been reduced by about 20%, scheduling efficiency has improved by 5-10%, and i-Cache misses have been reduced by up to 30%. Power efficiency improvements come by way of optimizations in “every part of the design,” according to AMD, and from a programmable on-die micro-controller that monitors virtually every part of the chip and gates unused blocks as necessary.
The GCN GPU cores in Kaveri are each configured as four SIMD-16 units (so each core has 64 stream processors), with up to 8 cores total, for a max of 512 shaders. Each GPU core has a branch and message unit, a scheduler, a scalar unit (with 4K registers), four texture filtering units, 16 texture load/fetch units, 4 x 64K vector registers, a 64K local data share, and 16K of L1 cache. While the peak 720MHz GPU clock may be lower than that of previous AMD APUs, Kaveri’s wider GPU and more advanced architecture should more than make up for its frequency disadvantage when provided with adequate memory bandwidth.
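As a quick sanity check on those figures, the shader count and a theoretical peak throughput can be worked out from the per-core layout. This is a back-of-the-envelope sketch rather than an AMD-published formula: it assumes each stream processor can retire one fused multiply-add (counted as two floating-point operations) per clock, and the function names are our own.

```python
# Figures from the text: each GCN core has four SIMD-16 units.
SIMDS_PER_CORE = 4
LANES_PER_SIMD = 16

# Assumption: one fused multiply-add per lane per clock, counted as 2 FLOPs.
FLOPS_PER_LANE_PER_CLOCK = 2


def stream_processors(gpu_cores: int) -> int:
    """Total stream processors (shaders) for a given number of GPU cores."""
    return gpu_cores * SIMDS_PER_CORE * LANES_PER_SIMD


def peak_gflops(gpu_cores: int, clock_mhz: float) -> float:
    """Theoretical single-precision peak in GFLOPS under the FMA assumption."""
    flops_per_clock = stream_processors(gpu_cores) * FLOPS_PER_LANE_PER_CLOCK
    return flops_per_clock * clock_mhz / 1000.0


if __name__ == "__main__":
    print("shaders, 8 cores:", stream_processors(8))
    print("peak GFLOPS at 720MHz:", round(peak_gflops(8, 720), 2))
```

For a full 8-core part at 720MHz this works out to 512 shaders and roughly 737 GFLOPS of theoretical single-precision throughput. Real workloads will land well below that, particularly when memory bandwidth is the limiting factor.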
We should also point out that Kaveri features a number of accelerators on-die as well: AMD's VCE (Video Coding Engine), UVD (Unified Video Decoder), and support for TrueAudio. VCE 2 in Kaveri is similar to VCE 1 in Richland, but supports more video formats as well as 60GHz Wireless Display and a new display encode mode. The updated Unified Video Decoder adds improved error resiliency versus the previous generation. And then there's TrueAudio support.
With each new generation of APU, AMD has moved closer and closer to implementing all of the features of its HSA (Heterogeneous System Architecture). The final piece of the puzzle in Kaveri is the APU’s ability to give both the CPU and GPU cores coherent access to virtual memory. AMD also added system-level atomics to allow for synchronizing workloads across the different cores.
The specific Kaveri-based APU we’ll be testing here today is the A8-7600. This quad CPU-core chip is outfitted with only 6 active GPU cores (384 stream processors) and has a default CPU frequency of 3.3GHz and a max Turbo Core frequency of 3.8GHz when configured for a 65W TDP. When configured for a 45W TDP, the CPU cores are clocked at 3.1 / 3.3GHz. The GPU is clocked at 720MHz.
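To put the two TDP modes side by side, here is a small sketch that tabulates the A8-7600 figures quoted above and computes how much clock speed is given up at 45W. It assumes the "3.1 / 3.3GHz" pair means base / max Turbo Core, and that each GPU core carries 64 stream processors as described earlier. The `TdpConfig` class and `percent_drop` helper are illustrative names of our own.

```python
from dataclasses import dataclass

STREAM_PROCESSORS_PER_GPU_CORE = 64  # per the GCN core layout in the text
ACTIVE_GPU_CORES = 6                 # A8-7600


@dataclass(frozen=True)
class TdpConfig:
    tdp_watts: int
    base_ghz: float
    turbo_ghz: float  # assumption: second figure is the max Turbo Core clock


# CPU clocks for the A8-7600 as quoted in the text, keyed by TDP in watts.
A8_7600 = {
    65: TdpConfig(65, 3.3, 3.8),
    45: TdpConfig(45, 3.1, 3.3),
}


def percent_drop(high: float, low: float) -> float:
    """Relative reduction from high to low, in percent."""
    return (high - low) / high * 100.0


if __name__ == "__main__":
    hi, lo = A8_7600[65], A8_7600[45]
    print("stream processors:", ACTIVE_GPU_CORES * STREAM_PROCESSORS_PER_GPU_CORE)
    print(f"TDP reduction:   {percent_drop(hi.tdp_watts, lo.tdp_watts):.1f}%")
    print(f"base clock drop:  {percent_drop(hi.base_ghz, lo.base_ghz):.1f}%")
    print(f"turbo clock drop: {percent_drop(hi.turbo_ghz, lo.turbo_ghz):.1f}%")
```

Under those assumptions, dropping the power budget by about 31% costs roughly 6% of base clock and 13% of peak turbo clock, while the GPU clock stays at 720MHz in both modes.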
Also note that all of AMD’s new Kaveri-based APUs require a new socket, FM2+. The APUs are compatible with existing chipsets, but socket FM2+ has a couple of additional pins. Previous-gen socket FM2 APUs will work in newer FM2+ motherboards, but Kaveri-based FM2+ APUs will not work in older FM2 motherboards. That’s something to keep in mind if you were considering an upgrade of an existing AMD-based system.
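The compatibility rule is one-directional, which is easy to get backwards, so here it is as a tiny lookup. This is only an illustration of the rule as stated above: the dictionary keys and the `fits` function are hypothetical names, and "previous-gen" stands in for any socket FM2 APU.

```python
# Socket each APU family is built for, per the text.
APU_SOCKET = {
    "kaveri": "FM2+",
    "previous-gen": "FM2",  # hypothetical label for any socket FM2 APU
}

# APU sockets that each motherboard socket will accept.
BOARD_ACCEPTS = {
    "FM2+": {"FM2+", "FM2"},  # newer boards take both generations
    "FM2": {"FM2"},           # older boards lack the additional pins
}


def fits(apu: str, board_socket: str) -> bool:
    """Return True if the given APU family works in the given board socket."""
    return APU_SOCKET[apu] in BOARD_ACCEPTS[board_socket]


if __name__ == "__main__":
    for apu in APU_SOCKET:
        for board in BOARD_ACCEPTS:
            print(f"{apu} APU in {board} board: {fits(apu, board)}")
```

The only failing combination is a Kaveri APU in an FM2 board, which is exactly the upgrade path to watch out for.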