AMD EPYC 7002 Series Zen 2 Architecture Doubles Data Center Performance And Density

AMD EPYC 7002 Processors: Additional Architectural Details

The total amount of cache in Zen 2 has also been increased significantly. The L3 cache has been doubled to 32MB per CPU die (16MB per quad-core CCX), which equates to a massive 256MB of L3 in a fully-enabled 64-core EPYC 7002 processor. As you go down the stack and reduce cores, that total amount of cache is obviously reduced. The larger L3 cache lowers effective memory latency by reducing the number of calls out to main system memory and keeping more data as close to the CPU cores as possible.
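As a quick sanity check on those cache figures, here is a minimal Python sketch of the arithmetic. The per-CCX numbers come from the paragraph above; the count of eight CPU dies is inferred from 64 cores at eight cores per die, and the helper names are purely illustrative. It assumes every die is fully enabled, which will not hold for every part lower in the stack.

```python
# Figures from the article: 16MB of L3 per quad-core CCX, two CCXs per CPU die.
L3_PER_CCX_MB = 16
CCX_PER_DIE = 2
CORES_PER_CCX = 4


def l3_total_mb(cpu_dies: int) -> int:
    """Total L3 in MB, assuming every CCX on every die is fully enabled."""
    return cpu_dies * CCX_PER_DIE * L3_PER_CCX_MB


def core_count(cpu_dies: int) -> int:
    """Total cores, again assuming fully enabled dies."""
    return cpu_dies * CCX_PER_DIE * CORES_PER_CCX


if __name__ == "__main__":
    # Eight dies is an inference: 64 cores / (2 CCXs * 4 cores) per die.
    print(core_count(8), "cores,", l3_total_mb(8), "MB L3")  # 64 cores, 256 MB L3
```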


The new multi-die approach with a separate IO die also reduces the complexity of the NUMA configurations with the EPYC 7002 series. There are now only two NUMA distances and two NUMA domains with EPYC 7002 series processors, versus three distances and eight domains with first-gen EPYC. The latency out to NUMA distances 1 and 2 is somewhat higher than with the first-gen EPYC 7001 series, because the memory was directly attached to the CPU dies in those processors, but the overall effective latency to memory is reduced because of the elimination of those additional NUMA domains. In first-gen parts, calls to memory sometimes had to make multiple hops through CPU dies before hitting main memory; EPYC 7002 reduces some of that complexity.


EPYC 7002 series processors also offer support for faster memory speeds to further increase bandwidth and reduce latency. Whereas the previous generation topped out at 2666MHz, EPYC 7002 can utilize 3200MHz memory. In a 16-channel memory configuration (two sockets with eight channels each), that works out to about 410GB/s of peak bandwidth, while the originals topped out at about 340GB/s.
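The peak bandwidth figures follow from simple arithmetic: channels, times bytes per transfer, times transfers per second. The sketch below assumes the standard 64-bit (8-byte) DDR4 data bus per channel and treats the quoted memory speeds as mega-transfers per second; the function name is illustrative. Note that the roughly 340GB/s first-gen figure corresponds to a 2666 MT/s transfer rate.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    bus_bytes=8 assumes the standard 64-bit DDR4 data bus per channel.
    """
    return channels * bus_bytes * mega_transfers_per_s / 1000


if __name__ == "__main__":
    # 16 channels = two sockets with eight channels each.
    print(f"{peak_bandwidth_gbs(16, 3200):.1f} GB/s")  # 409.6 GB/s
    print(f"{peak_bandwidth_gbs(16, 2666):.1f} GB/s")  # 341.2 GB/s
```

These are theoretical ceilings; sustained bandwidth in real workloads will be lower.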


AMD’s Infinity Fabric links all of the compute dies and the IO die together, and also links the sockets in an EPYC 7002 system. The Infinity Fabric configuration has been enhanced over the previous generation. EPYC 7001 series processors could transfer 16 bytes per fabric clock tick, but each CCD in an EPYC 7002 can now read 32 bytes or write 16 bytes per fabric clock. The links between sockets provide up to 18GT/s over the Infinity Fabric, with up to four socket-to-socket links. Those links are efficiently utilized to transfer commands, data, and CRC info over the same x16 link.
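To put the bytes-per-clock numbers in perspective, the following sketch converts them into per-CCD bandwidth for a given fabric clock. The article does not state the fabric clock, so the 1000MHz value used here is a placeholder chosen only to make the arithmetic easy to follow, not an AMD specification, and the function name is hypothetical.

```python
def fabric_bandwidth_gbs(fabric_clock_mhz: float, bytes_per_clock: int) -> float:
    """Bandwidth in GB/s for a link moving bytes_per_clock bytes each fabric clock."""
    return fabric_clock_mhz * bytes_per_clock / 1000


if __name__ == "__main__":
    # Placeholder clock: NOT an AMD figure, just a round number for illustration.
    example_clock_mhz = 1000.0
    print(fabric_bandwidth_gbs(example_clock_mhz, 32))  # read path per CCD: 32.0
    print(fabric_bandwidth_gbs(example_clock_mhz, 16))  # write path per CCD: 16.0
```

Whatever the actual clock, the read path per CCD is twice as wide as the 16 bytes per clock quoted for EPYC 7001, while the write path stays at 16 bytes per clock.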


All of the PCI Express lanes available with EPYC 7002 processors are PCIe Gen 4 ready, which effectively doubles the available bandwidth per lane versus first-gen EPYC processors. There are eight x16 links available per CPU, and all links support bifurcation, with a maximum of 8 devices supported per x16 link. Because some of the links are configurable and may be used either between sockets or as IO, the EPYC 7002 series platform supports up to 162 lanes of PCIe connectivity in a two-socket configuration. There are 128 lanes available in a single-socket setup.
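The per-lane doubling can be checked with a short calculation. This sketch uses the published PCIe signaling rates (8GT/s for Gen 3, which first-gen EPYC used, and 16GT/s for Gen 4) and the 128b/130b line encoding shared by both generations; the function and constant names are illustrative, and the results are theoretical per-direction maxima before protocol overhead.

```python
# Per-lane signaling rates in GT/s; both generations use 128b/130b encoding.
PCIE_RATE_GT_S = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    bits_per_second_g = PCIE_RATE_GT_S[gen] * ENCODING_EFFICIENCY * lanes
    return bits_per_second_g / 8  # bits -> bytes


if __name__ == "__main__":
    print(f"Gen 3 x16: {link_bandwidth_gbs(3, 16):.2f} GB/s")  # 15.75 GB/s
    print(f"Gen 4 x16: {link_bandwidth_gbs(4, 16):.2f} GB/s")  # 31.51 GB/s
    # 128 lanes in a single-socket system, all running at Gen 4 speed:
    print(f"Gen 4 x128: {link_bandwidth_gbs(4, 128):.1f} GB/s")
```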


Security has been a major concern as of late, and AMD was quick to point out that some of its architecture decisions have protected its processors from being affected by a number of recent speculative execution exploits. AMD processors also feature a separate ‘AMD Secure Processor’ embedded in the SoC, which is basically a 32-bit microcontroller based on ARM’s Cortex-A5. The AMD Secure Processor has its own secure ROM and RAM area and operates autonomously, handling security processing requests, though it does have access to system memory and other on-chip resources.


The Secure Processor allows for Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV). With SEV, each guest can have its own encryption key, which allows guests to be isolated from each other and also from the hypervisor. Support for 509 unique keys is available.
