AMD designed the Zen microarchitecture at the heart of Ryzen with performance, throughput, and efficiency in mind. AMD initially reported a 40% IPC (instructions per clock) improvement target for Zen, but that number turned out to be conservative, which is refreshing in this age of specsmanship. At its tech day event, AMD claimed an IPC improvement of approximately 52%, and as we'll show you later, the gain is even higher with some workloads.
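IPC figures are easier to interpret with the usual first-order rule that single-threaded performance scales with IPC multiplied by clock frequency. The short Python sketch below applies that rule to the 40% and 52% numbers. It is a simplification that ignores memory stalls, boost behavior, and workload mix, and the 10% lower clock in the last case is a hypothetical value chosen only for illustration.

```python
# First-order model: performance ~ IPC x clock frequency.
# Illustrative only; ignores memory stalls, boost behavior, and workload mix.

def relative_performance(ipc_gain: float, clock_ratio: float = 1.0) -> float:
    """Return new/old performance given a fractional IPC gain and a new/old clock ratio."""
    return (1.0 + ipc_gain) * clock_ratio


# Same clock: AMD's original 40% target vs. its ~52% tech day claim
print(f"40% target: {relative_performance(0.40):.2f}x")
print(f"52% claim:  {relative_performance(0.52):.2f}x")

# Hypothetical case: a 52% IPC gain on a core clocked 10% lower
print(f"52% IPC at 0.9x clock: {relative_performance(0.52, 0.90):.2f}x")
```

Under this model, a 52% IPC gain still leaves roughly 1.37x the performance even if the newer core runs 10% slower, which is why IPC gains matter so much when clock speeds are similar across generations.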
One of the ways Ryzen processors, based on Zen, deliver improved performance is through newly designed, higher-performance branch prediction algorithms and a micro-op cache, which allows operations to be issued more efficiently. The instruction scheduler window has also been enlarged by 75%, and issue width and execution resources across the processor cores have been increased by 50% over the previous-generation Excavator architecture. The result of these changes is vastly improved single-threaded performance, thanks to greater instruction-level parallelism.
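Zen's actual predictor is far more elaborate than anything that fits here, but the basic idea of learning a branch's behavior from its past outcomes can be sketched with the classic 2-bit saturating-counter scheme. The Python below is a textbook illustration under that assumption, not AMD's design; the class name, table size, and loop trace are all hypothetical.

```python
# Textbook 2-bit saturating-counter branch predictor (illustration only;
# this is NOT AMD's Zen predictor, which is considerably more sophisticated).

class TwoBitPredictor:
    def __init__(self, table_bits: int = 10):
        self.mask = (1 << table_bits) - 1
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        # Every entry starts in the weakly-taken state (2).
        self.table = [2] * (1 << table_bits)

    def predict(self, pc: int) -> bool:
        return self.table[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def accuracy(trace):
    """trace: list of (pc, taken) pairs; returns the fraction predicted correctly."""
    bp = TwoBitPredictor()
    correct = 0
    for pc, taken in trace:
        if bp.predict(pc) == taken:
            correct += 1
        bp.update(pc, taken)
    return correct / len(trace)


# Hypothetical loop branch: taken 9 times, then not taken once, repeated 10 times
loop_trace = [(0x400, i % 10 != 9) for i in range(100)]
print(f"accuracy: {accuracy(loop_trace):.2f}")
```

On this loop branch the counter mispredicts only the exit of each pass, so the sketch scores 0.90. The two "weak" states are what keep a single not-taken outcome from flipping the next prediction; modern predictors add branch history and far larger tables to catch patterns this scheme cannot.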
The pre-fetcher in Zen-based AMD Ryzen processors is also vastly improved. There is 16MB of shared L3 cache on board now, a unified L2 cache for both instructions and data, and separate, low-latency L1 instruction and data caches. In addition to the new cache structure and enhanced pre-fetcher, the new architecture also offers up to 5x the cache bandwidth to the cores versus previous-gen AMD processors.
Each core's L1 caches are 64KB (instruction) and 32KB (data), and the unified, 8-way associative L2 cache is 512KB. The shared L3 cache is 16-way associative, and 32 bytes per clock cycle can be transferred between the cache levels.
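Those sizes and associativities determine how a set-associative cache is organized: the number of sets is capacity divided by (ways × line size), and each address splits into a tag, a set index, and a byte offset. The Python sketch below works this through for the 512KB, 8-way L2. It assumes 64-byte cache lines, which the article does not state, and the example address and 3.6GHz clock used for the bandwidth arithmetic are hypothetical.

```python
# Set-associative cache geometry: sets = capacity / (ways * line_size).
# Assumes 64-byte cache lines (not stated in the article).

LINE_SIZE = 64  # bytes, assumed


def num_sets(capacity_bytes: int, ways: int, line_size: int = LINE_SIZE) -> int:
    return capacity_bytes // (ways * line_size)


def split_address(addr: int, capacity_bytes: int, ways: int, line_size: int = LINE_SIZE):
    """Split an address into (tag, set index, byte offset); sizes must be powers of two."""
    sets = num_sets(capacity_bytes, ways, line_size)
    offset_bits = line_size.bit_length() - 1  # log2(line_size)
    index_bits = sets.bit_length() - 1        # log2(sets)
    offset = addr & (line_size - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


def bandwidth_gb_s(bytes_per_cycle: int, clock_ghz: float) -> float:
    """Peak transfer rate in GB/s (1 GB = 1e9 bytes) for a given per-cycle width."""
    return bytes_per_cycle * clock_ghz


L2_BYTES = 512 * 1024
print(num_sets(L2_BYTES, 8))                    # 1024 sets
print(split_address(0x12345678, L2_BYTES, 8))   # hypothetical address
print(bandwidth_gb_s(32, 3.6))                  # 32 B/cycle at a hypothetical 3.6 GHz
```

With 64-byte lines, the L2 has 1024 sets, so 6 address bits select the byte within a line, the next 10 select the set, and the rest form the tag. At 32 bytes per cycle, a hypothetical 3.6GHz clock works out to about 115GB/s of peak transfer between levels.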
Zen also features simultaneous multi-threading, or SMT, similar to Intel’s HyperThreading. To put it simply, SMT leverages otherwise-unused processor resources to execute two threads on a single physical core. There are often “gaps in utilization”, as AMD puts it, in a high-performance processor core, and SMT exploits those gaps to execute instructions from a second thread. AMD's SMT implementation proved to be quite effective and scaled well, as you'll see on some of the following benchmark pages.
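The idea of SMT filling "gaps in utilization" can be shown with a deliberately simplified model: each thread is a list of cycles marked True when it has an instruction ready and False when it is stalled, and a cycle counts as busy if any resident thread could issue. The traces below are hypothetical, and the model ignores that two ready threads compete for the same issue slots, so it shows an upper bound on slot utilization rather than real SMT throughput.

```python
# Toy model of SMT filling issue "gaps". Each thread is a list of cycles:
# True = an instruction is ready to issue, False = stalled (e.g. on memory).
# Hypothetical traces; real cores issue several ops per cycle from shared
# queues, and competing threads delay each other, which this model ignores.

def utilization_single(thread):
    """Fraction of cycles a core running one thread has something to issue."""
    return sum(thread) / len(thread)


def utilization_smt(t0, t1):
    """Fraction of cycles a 2-way SMT core has something to issue from either thread."""
    busy = sum(1 for a, b in zip(t0, t1) if a or b)
    return busy / len(t0)


# Thread A stalls every third cycle; thread B stalls in a different phase.
a = [i % 3 != 2 for i in range(30)]
b = [i % 3 != 0 for i in range(30)]
print(f"single thread: {utilization_single(a):.2f}")
print(f"SMT (2 threads): {utilization_smt(a, b):.2f}")
```

With these traces the single-threaded core has work on about two-thirds of cycles, while the SMT core finds ready work every cycle. Real-world gains are typically smaller, because the two threads also share caches, queues, and execution units.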
The low-power design methodologies AMD used for the Zen architecture include extensive clock gating across multi-level regions throughout the chip. The L1 data cache is now a write-back cache, which reduces the number of write operations sent further down the memory hierarchy. The large micro-op cache, stack engine, and support for move elimination all work in tandem to further improve efficiency. Combine all of these architectural enhancements with the GlobalFoundries 14nm process used to manufacture the chips, and you end up with processors that are far more efficient than anything AMD has produced to date.
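Why a write-back L1 cuts write traffic can be shown by counting writes that leave the cache. A write-through cache forwards every store, while a write-back cache forwards data only when it evicts a dirty line (plus a final flush). The Python sketch below assumes a hypothetical, tiny direct-mapped cache that handles only stores with write-allocate; it is not a model of Zen's actual L1, which is set-associative.

```python
# Downstream write traffic: write-through vs. write-back.
# Hypothetical direct-mapped cache: NUM_LINES lines of LINE_SIZE bytes,
# stores only, write-allocate, remaining dirty lines flushed at the end.

NUM_LINES = 4
LINE_SIZE = 64


def write_through_writes(stores):
    # Every store is forwarded to the next level immediately.
    return len(stores)


def write_back_writes(stores):
    resident = {}  # set index -> tag; with stores only, every resident line is dirty
    writes = 0
    for addr in stores:
        block = addr // LINE_SIZE
        index, tag = block % NUM_LINES, block // NUM_LINES
        if index in resident and resident[index] != tag:
            writes += 1  # evicting a dirty line costs one write to the next level
        resident[index] = tag
    return writes + len(resident)  # flush the remaining dirty lines


# 100 stores alternating between two lines that map to different sets
hot = [0x1000 if i % 2 == 0 else 0x1040 for i in range(100)]
# 100 stores alternating between two lines that map to the SAME set
conflict = [0x1000 if i % 2 == 0 else 0x1100 for i in range(100)]

print(write_through_writes(hot), write_back_writes(hot))
print(write_through_writes(conflict), write_back_writes(conflict))
```

When stores keep hitting the same lines, the write-back cache coalesces 100 stores into 2 writes; when two addresses keep evicting each other, it does no better than write-through, which is one reason associativity and replacement policy matter as well.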
All told, AMD’s Zen-based processors offer vastly improved performance over the previous generation, not only in terms of IPC but in overall throughput as well. Combined with the enhanced efficiency of the processor cores, these gains yield chips that are both faster and more power-efficient.
And AMD isn’t done improving the architecture. Engineers learn a great deal throughout a processor core's development cycle, as it progresses from simulation to actual, physical silicon, and there are always elements that can be tweaked or optimized to further improve performance, efficiency, or both. As you would expect, AMD is already working on the next iteration of Zen. If history is any indicator, we expect to hear a lot about "Zen 2" and future iterations in the months ahead.