Temash and Kabini Architectures
Kabini and Temash were designed from the ground up to improve overall performance and power efficiency over previous-generation products. The APUs will be offered in dual- and quad-core varieties and are manufactured using TSMC’s 28nm process node. Unfortunately, as of this writing, AMD hasn’t disclosed exact transistor counts and die sizes, but we should have that information soon.
AMD’s previous-gen Brazos-based products featured the company’s Bobcat CPU core design. The Jaguar cores in Temash and Kabini improve on Bobcat with better IPC, the ability to run at higher frequencies at a given voltage, and improved power efficiency through finer-grained clock and power gating and unit redesigns. AMD set out to preserve Bobcat’s throughput while saving power, but ultimately ended up with a much higher-performing part (relatively speaking) as well.
Jaguar adds support for SSE4.1, SSE4.2, AES, CLMUL, MOVBE, AVX, XSAVE/XSAVEOPT, F16C, and BMI1, and has a 40-bit physical address space. The cores feature improved instruction cache prefetching, too. AMD tells us they grew the instruction buffer and added about 30% more die area over what they had with Bobcat. They’ve also added a divider to the integer unit (a minor modification of the unit in Llano) and an extra pipeline stage (a decode stage), which allowed them to boost frequencies. A pipeline stage was added to the FPU as well, again for better frequency response at lower voltages.
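To see which of these extensions a given machine actually exposes, a minimal sketch along the following lines may help. It assumes a Linux system, where `/proc/cpuinfo` reports features on a `flags` line; the mapping to flag names and the helper functions are our own illustration, not anything AMD supplies.

```python
# Assumed mapping from the extension names in the article to the
# flag names Linux prints in /proc/cpuinfo.
JAGUAR_ISA_FLAGS = {
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AES": "aes",
    "CLMUL": "pclmulqdq",
    "MOVBE": "movbe",
    "AVX": "avx",
    "XSAVE": "xsave",
    "XSAVEOPT": "xsaveopt",
    "F16C": "f16c",
    "BMI1": "bmi1",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(cpuinfo_text):
    """List the extensions from the article that the CPU does not report."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [name for name, flag in JAGUAR_ISA_FLAGS.items() if flag not in flags]


def physical_address_space_bytes(address_bits):
    """Bytes addressable with the given number of physical address bits."""
    return 2 ** address_bits
```

Passing the contents of `/proc/cpuinfo` to `missing_extensions` should return an empty list on a Jaguar-based system, and `physical_address_space_bytes(40)` works out to 1 TiB of addressable physical memory.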
The new Jaguar cores in Kabini and Temash are also outfitted with enhanced out-of-order resources, including a redesigned scheduler and buffers that are 30% to 70% larger than Bobcat’s. The FPU was totally redesigned and widens from 64 bits to 128 bits. The Load Store Unit and Data Cache also have redesigned queues, with a matrix-style picker and a store data FIFO.
With Jaguar, the 16-way set associative L2 is also shared among all the cores. With Bobcat, 512KB was allocated to each core. With Jaguar, though, all 2MB is shared, and the cache dynamically reallocates capacity to the threads that need it. A large part of the IPC improvement over Bobcat comes from this L2 redesign: if only one or two cores are lit up, they have access to much more cache than a Bobcat core would. AMD claims up to a 22% IPC improvement in single-threaded workloads, clock for clock, or 15% if you restrict a Jaguar core to the same size cache as Bobcat.
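The cache figures above lend themselves to a quick back-of-the-envelope check. The sketch below is ours, not AMD’s, and rests on two simplifying assumptions: that the shared L2 is split evenly among the active cores (the real hardware reallocates dynamically), and that the two IPC claims compose multiplicatively.

```python
BOBCAT_L2_PER_CORE_KB = 512   # per-core L2 in Bobcat, per the article
JAGUAR_SHARED_L2_KB = 2048    # shared L2 in a quad-core Jaguar part


def jaguar_l2_per_active_core_kb(active_cores):
    """L2 available per active core, assuming an even split (a simplification)."""
    if not 1 <= active_cores <= 4:
        raise ValueError("active_cores must be between 1 and 4")
    return JAGUAR_SHARED_L2_KB / active_cores


def cache_share_of_ipc_gain(total_gain, same_cache_gain):
    """Gain left over for the larger cache, assuming the gains multiply."""
    return (1 + total_gain) / (1 + same_cache_gain) - 1


# One active core can see up to four times the L2 a Bobcat core had.
single_core_ratio = jaguar_l2_per_active_core_kb(1) / BOBCAT_L2_PER_CORE_KB

# AMD's 22% (full cache) and 15% (Bobcat-sized cache) claims.
extra_from_cache = cache_share_of_ipc_gain(0.22, 0.15)
```

Under those assumptions, roughly six percentage points of the claimed single-threaded gain would come from the extra cache capacity alone, with the rest coming from the core changes.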
Moving on from the cores, Kabini and Temash feature an integrated 64-bit wide memory controller / Northbridge, with official support for frequencies up to DDR3-1600. The memory controller also supports 1.25V, 1.35V, and 1.5V DIMMs. Also present is a Fusion Controller Link, or FCL, which is how the IO subsystems interface with the on-die Northbridge and which allows the CPU to access the GPU frame buffer (and vice versa). The FCL is 128 bits wide in each direction, while the graphics memory bus is 256 bits wide in each direction.
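For a sense of scale, the theoretical peak bandwidth of that memory interface follows directly from the bus width and transfer rate. The helper below is a simple illustrative sketch (the function name is our own); it gives a ceiling, not a measured figure.

```python
def peak_bandwidth_gb_per_s(bus_width_bits, mega_transfers_per_s):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# Single 64-bit channel at DDR3-1600 (1600 mega-transfers per second).
ddr3_1600_peak = peak_bandwidth_gb_per_s(64, 1600)
```

That works out to 12.8 GB/s for a single 64-bit channel of DDR3-1600. The same arithmetic can't be applied to the FCL or the graphics memory bus without knowing their clocks, which AMD hasn't given us here.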
As we’ve mentioned, Kabini and Temash feature Radeon HD 8000-series DX11.1 graphics cores based on AMD’s Graphics Core Next architecture. There are 128 Radeon Cores on board, which offer up to a 75% performance improvement over the previous-gen graphics core used in Hondo.
AMD also offers a new technology they’re calling “Turbo Dock,” which can boost performance by up to 40% by enhancing cooling performance and supplying more power when a tablet or convertible device is docked, but we haven’t seen it in action just yet.