AMD is claiming that it has designed
Zen with performance, throughput, and efficiency in mind. We already knew about the claimed 40% IPC uplift, but we now have additional details on how AMD achieves that feat.
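To put that 40% figure in context, here's a minimal sketch of the usual first-order model, where single-thread performance scales with IPC multiplied by clock frequency. The clock value below is purely a placeholder of our own choosing, since final Zen frequencies haven't been announced.

```python
# Rough first-order model: single-thread performance ~ IPC x clock frequency.
# The 40% uplift is AMD's own claim; the clock speed is a placeholder, since
# final Zen frequencies are not yet known.
prev_gen_ipc = 1.0                  # normalized previous-generation baseline
zen_ipc = prev_gen_ipc * 1.40       # AMD's claimed 40% IPC uplift

clock_ghz = 3.5                     # hypothetical, held equal for both parts
prev_perf = prev_gen_ipc * clock_ghz
zen_perf = zen_ipc * clock_ghz
print(f"Relative single-thread performance at equal clocks: {zen_perf / prev_perf:.2f}x")
```

In other words, if clocks hold steady, the IPC claim alone would translate directly into roughly 1.4x single-thread throughput.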
Zen will have a newly designed, higher-performance branch predictor and a micro-op cache for more efficient issuing of operations. The instruction scheduler window has been increased by 75%, and issue width and execution resources have been increased by 50%. The end result of these changes is higher single-threaded performance through better instruction-level parallelism.
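To illustrate why branch prediction matters so much for instruction-level parallelism, here's a toy sketch of a textbook two-bit saturating-counter predictor (not AMD's actual design, which hasn't been detailed). Every misprediction forces the core to throw away speculative work, so fewer mispredictions means the wider execution resources actually stay busy.

```python
# Toy model of a 2-bit saturating-counter branch predictor (a textbook scheme,
# not AMD's design) to show how a predictor learns a loop branch.

def predict_and_train(outcomes, initial_state=0):
    """Run a single 2-bit counter over a stream of branch outcomes.

    States 0-1 predict 'not taken', states 2-3 predict 'taken'.
    Returns the number of mispredictions.
    """
    state = initial_state
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Saturating update: nudge the counter toward the observed outcome.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions

# A loop branch that is taken 9 times, then falls through, repeated 100 times:
loop_pattern = ([True] * 9 + [False]) * 100
print(predict_and_train(loop_pattern), "mispredictions out of", len(loop_pattern))
```

Once the counter warms up, it mispredicts only the not-taken loop exit on each pass, which is why even simple predictors make tight loops nearly free of pipeline flushes.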
The pre-fetcher in Zen is also vastly improved. There is now 8MB of shared L3 cache on board, a unified L2 cache for both instructions and data, and separate, low-latency L1 instruction and data caches. In addition to the new cache structure and enhanced pre-fetcher, the new architecture offers up to 5x the cache bandwidth to the cores versus previous-gen processors.
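As a rough illustration of why prefetching and cache bandwidth matter, the sketch below gathers the same data sequentially and in random order; on most machines the sequential pass is far faster because the hardware prefetcher can stay ahead of the core and cache lines are fully reused. This assumes NumPy is installed, and the array size is an arbitrary value we chose to exceed an 8MB L3.

```python
# Illustration of prefetcher-friendly vs. prefetcher-hostile access patterns:
# gather the same data sequentially and in random order and time both.
import time
import numpy as np

data = np.arange(8_000_000, dtype=np.int64)   # ~64 MB, larger than an 8MB L3
seq_idx = np.arange(data.size)                # sequential access order
rnd_idx = np.random.permutation(data.size)    # random access order

for name, idx in (("sequential", seq_idx), ("random", rnd_idx)):
    start = time.perf_counter()
    total = data[idx].sum()                   # same work, different access order
    print(f"{name:10s} gather+sum: {time.perf_counter() - start:.3f} s (sum={total})")
```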
The individual L1 caches (per core) are 64KB (instruction) and 32KB (data), and the unified, 8-way associative L2 cache is 512KB per core. The shared L3 cache is 16-way associative, and 32 bytes can be transferred between the cache elements per cycle.
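For a back-of-the-envelope sense of what 32 bytes per cycle means, the quick calculation below plugs in a purely hypothetical 3GHz clock, since AMD hasn't announced Zen frequencies.

```python
# Back-of-the-envelope bandwidth for a 32-byte-per-cycle cache interface.
# The clock speed is a placeholder; AMD has not announced Zen clocks.
bytes_per_cycle = 32
clock_hz = 3.0e9          # hypothetical 3.0 GHz core clock (assumption)

bandwidth_gb_s = bytes_per_cycle * clock_hz / 1e9
print(f"{bandwidth_gb_s:.0f} GB/s per 32-byte/cycle interface at {clock_hz / 1e9:.1f} GHz")
```

At that placeholder clock, each 32-byte-per-cycle link would move on the order of 96GB/s.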
Zen also features simultaneous multithreading, or SMT, similar to Intel's Hyper-Threading. To put it simply, SMT works by leveraging unused processor resources to execute two threads on a single physical core. There are often "gaps in utilization," as AMD puts it, in a high-performance processor core; SMT exploits those gaps to help execute additional threads.
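A quick way to see SMT in action on an existing system (Intel or otherwise) is to compare logical processors against physical cores; the short sketch below does that using the third-party psutil package, which is our own choice of tool rather than anything AMD provides.

```python
# Quick check for SMT on the machine running this script: with SMT enabled,
# the OS exposes more logical processors than physical cores.
# Requires the third-party psutil package (pip install psutil).
import os
import psutil

logical = os.cpu_count()                     # hardware threads the OS schedules on
physical = psutil.cpu_count(logical=False)   # physical cores

print(f"{physical} physical cores, {logical} logical processors")
if logical and physical and logical > physical:
    print(f"SMT appears enabled: {logical // physical} threads per core")
else:
    print("SMT appears disabled or unsupported")
```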
In addition to the architectural enhancements to Zen, the core has been designed with 14nm
FinFET transistors in mind to maximize transistor density and efficiency. We don't have actual clock speeds or power consumption details to share just yet, but AMD is claiming decreased power and increased performance relative to processor frequency, which you'd expect when jumping to a new, more advanced process. For those wondering, AMD is again enlisting GlobalFoundries to manufacture Zen-based processors, leveraging the same 14nm process node as the Polaris GPU architecture.
The low-power design methodologies AMD used when architecting Zen include extensive clock gating with multi-level regions throughout the chip. The L1 is a write-back cache, which reduces the number of write operations out to memory. And the large micro-op cache, stack engine, and support for move elimination all work to improve efficiency.
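The write-back point is easy to see with a toy model: under a write-through policy every store is forwarded to the next level of the hierarchy, while a write-back cache only writes a dirty line out when it's evicted. The sketch below is a deliberately simplified, single-line model for illustration, not a representation of Zen's actual cache controller.

```python
# Toy model contrasting write-back and write-through policies for a single
# cache line, illustrating why a write-back L1 sends far fewer writes to the
# next level. Teaching sketch only; not a model of Zen's cache controller.

def downstream_writes(policy, writes_to_same_line):
    """Count writes that reach the next cache level for one cache line."""
    if policy == "write-through":
        # Every store is forwarded immediately.
        return writes_to_same_line
    if policy == "write-back":
        # Stores only dirty the line; it is written out once, on eviction.
        return 1 if writes_to_same_line > 0 else 0
    raise ValueError(f"unknown policy: {policy}")

for policy in ("write-through", "write-back"):
    print(policy, "->", downstream_writes(policy, writes_to_same_line=1000), "downstream writes")
```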
All told, AMD’s Zen-based processors should offer vastly improved performance over the previous generation, not only in terms of IPC but in overall throughput as well. Couple those performance improvements with the power optimizations, and AMD is claiming big gains in energy efficiency per clock cycle.
And, of course, AMD isn’t stopping there. A great deal is learned throughout the development cycle of a processor core as it progresses from a simulation to an actual, physical piece of silicon. There are always elements that can be tweaked or optimized to further improve performance, efficiency, or both. As you would expect, AMD is already working on the next iteration of Zen. If history is any indicator, expect to hear a lot about “Zen+” in the months ahead.
AMD Zen-based processor engineering samples are already out at channel partners and are being qualified as we speak. The final number of SKUs, along with their clock speeds and specifications, has not been determined just yet, though various implementations of multi-core Zen CPUs will come to market over time. AMD will disclose those details closer to the actual launch. What we can surmise at this point, however, is that if all goes to plan, AMD's CPU and APU product lines should be in a much stronger position versus Intel in the coming months.