3D V-Cache Helps Deliver Up To A 12 Percent Performance Lift For AMD EPYC Milan-X
by
Zak Killian
—
Saturday, January 22, 2022, 02:52 PM EDT
When AMD announced the Ryzen 7 5800X3D at CES 2022, a lot of people were surprised that the company appears to only be launching a single Ryzen SKU with its stacked 3D V-Cache technology. That's partially because the company had already demonstrated 3D V-Cache using a Ryzen 9 5950X, and partially because we know AMD can ship CPUs with both multiple Core Complex Dies (CCDs) and 3D V-Cache. After all, that's what EPYC Milan-X is.
Announced back in November, EPYC Milan-X, or more formally, "AMD 3rd-Gen EPYC Processors with 3D V-Cache", slaps a 64-megabyte vertically-stacked L3 cache die on each of Milan's eight CCDs. Added to the 32 MB of L3 already on each CCD, that gives the CPU 768 MB of L3 cache and, counting L1 and L2, a whopping 804 MB of cache per socket. AMD made some big performance claims when announcing these CPUs, but skeptics wondered about the clock and latency implications of all that die-stacking.
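For anyone who wants to check those cache totals, here is a quick back-of-the-envelope sketch in Python. The per-core L1 and L2 sizes and the 32 MB of native L3 per CCD are standard Zen 3 figures rather than numbers taken from AMD's announcement, and the function name is our own, so treat this as an illustration under those assumptions.

```python
def milan_x_cache_mb(ccds=8, base_l3_mb=32, stacked_l3_mb=64,
                     cores=64, l2_kb_per_core=512, l1_kb_per_core=64):
    """Tally the cache on a 64-core Milan-X socket, in megabytes.

    Assumed Zen 3 figures (not from the article): 32 MB native L3 per CCD,
    512 KB L2 per core, and 32 KB L1I + 32 KB L1D = 64 KB L1 per core.
    """
    l3 = ccds * (base_l3_mb + stacked_l3_mb)   # native plus stacked L3
    l2 = cores * l2_kb_per_core / 1024         # KB -> MB
    l1 = cores * l1_kb_per_core / 1024         # KB -> MB
    return {"l3": l3, "l2": l2, "l1": l1, "total": l3 + l2 + l1}


if __name__ == "__main__":
    for level, size_mb in milan_x_cache_mb().items():
        print(f"{level}: {size_mb:g} MB")
```

Under those assumptions, the L3 comes to 768 MB and the three levels together come to 804 MB, matching the figures AMD quotes.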
Well, tech blog Chips and Cheese got its hands on a virtual instance running on a Milan-X server, and ran some benchmarks. The results may not be as impressive as we might have hoped, but they do paint a promising picture for the future of 3D V-Cache.
Much like with the Ryzen 7 5800X3D, stacking on extra cache does seem to necessitate a small drop in CPU core clock. Article author Cheese notes that due to the virtualized nature of the testing, they can't retrieve the actual clock rate, but based on L1 cache bandwidth, Cheese speculates that Milan-X clocks about 5% slower than the non-stacked version.
That clock rate deficiency shows up in the OpenSSL benchmark, where the V-Cache-equipped EPYC 7V73X loses to the EPYC 7763 by about 2%. As Cheese notes, OpenSSL doesn't stress the caches at all, so this result is completely expected. In other benchmarks, like in Gem5 Compile and 7-Zip Compression, the Milan-X part comes out ahead by 5-7%.
Keeping that clock rate deficiency in mind, Milan-X's advantage over Milan works out to a maximum of around 12.5% at the same clock rate (in the Gem5 compile benchmark, pictured above). That's a far cry from the massive 50% that AMD was claiming, but of course, Cheese didn't test any of the cache-bound workloads where AMD was claiming such a benefit.
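The same-clock figure is just the measured speedup divided by the estimated clock ratio. A minimal Python sketch of that arithmetic follows, assuming performance scales linearly with clock speed (a simplification) and using the roughly 7% measured lead and 5% clock deficit quoted above. The function name is ours, not something from the Chips and Cheese article.

```python
def iso_clock_uplift(measured_speedup, clock_ratio):
    """Estimate the uplift at equal clocks.

    measured_speedup: Milan-X score / Milan score (1.07 means 7% faster).
    clock_ratio:      Milan-X clock / Milan clock (0.95 means 5% slower).
    Assumes performance scales linearly with clock speed.
    """
    return measured_speedup / clock_ratio - 1


if __name__ == "__main__":
    uplift = iso_clock_uplift(measured_speedup=1.07, clock_ratio=0.95)
    print(f"Estimated same-clock uplift: {uplift:.1%}")
```

With those inputs the estimate lands between 12% and 13%, in line with the figure above. Since the 5% clock deficit is itself inferred from L1 bandwidth rather than measured directly, the result is only as good as that guess.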
There's a lot more information in the article, including an in-depth comparison of cache latency and bandwidth between both Milan and Milan-X as well as Intel's Ice Lake- and Cascade Lake-based Xeons. Much of it flies over our heads, but if you'd like to get into the gritty details, head over to Chips and Cheese to read their analysis.