AMD 4th Gen EPYC 9004 Series Launched: Genoa Tested In A Data Center Benchmark Gauntlet

AMD Genoa EPYC 9004 Series Server Workload Performance Testing

amd epyc genoa titanite
The Titanite reference platform is a fairly typical 19” 2U rack server. It is not the focus of our coverage, so we will not cover all of its details, but it is more than sufficient to gauge the performance of a dual-socket configuration. This system is kitted out with one and a half terabytes of DDR5 across 24 DIMMs and a 500GB NVMe boot drive.

The stars of the show are a pair each of the EPYC 9654, 9554, and 9374F processors. The EPYC 9654 is the 96-core heavyweight, while the EPYC 9554 is likely a better representation of what more customers will gravitate towards as a balance of cost and performance. The 9374F is oriented toward HPC workloads with its frequency optimizations.

We would like to thank Wendell of Level1Techs for providing us access to a third generation EPYC server for comparison. This server uses dual 64-core EPYC 7773X processors. This is a Milan-X family processor, which means it includes 3D V-Cache that can buoy certain memory- and cache-sensitive workloads.

For our gauntlet, we utilized the Phoronix Test Suite which is open-source and significantly simplifies Linux benchmarking. All of our numbers should be reproducible in your own environment, particularly for those evaluating a platform upgrade.
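For readers who want to attempt a similar run, here is a minimal sketch that only assembles Phoronix Test Suite command lines; it does not execute them. The profile names in `ASSUMED_PROFILES` are our assumption of which profiles correspond to the tests in this article, not a list confirmed by the review, so check them against `phoronix-test-suite list-available-tests` first.

```python
import shlex

# Assumed profile names (not confirmed by the article). Verify them with
# `phoronix-test-suite list-available-tests` before running anything.
ASSUMED_PROFILES = [
    "pts/coremark",
    "pts/compress-7zip",
    "pts/build-linux-kernel",
    "pts/blender",
    "pts/embree",
    "pts/x264",
    "pts/pgbench",
    "pts/apache",
    "pts/tensorflow",
]


def build_commands(profiles):
    """Return one `phoronix-test-suite benchmark <profile>` line per profile."""
    return [
        shlex.join(["phoronix-test-suite", "benchmark", profile])
        for profile in profiles
    ]


# Print the command lines so they can be reviewed or pasted into a shell.
for command in build_commands(ASSUMED_PROFILES):
    print(command)
```

Printing the commands rather than launching them keeps the sketch safe to run on a machine without the suite installed; each benchmark will still prompt for its own options when run interactively.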

Coremark Multithreaded General Performance Testing

Without any further ado, we begin our testing with Coremark. Coremark is a fast, no-nonsense multi-threaded CPU test intended for quick comparisons.

epyc genoa coremark cpu

Core for core, we see a significant 1.6x uplift for the EPYC 9554 over its 64-core counterpart, the EPYC 7773X. This test does not stress cache in any particular way, so it is a very straight core-for-core comparison between generations. The frequency optimized EPYC 9374F scales well, delivering about 56% of the EPYC 9554’s performance with half the cores. On the other end, the 96-core processor does outpace the 64-core EPYC 9554, but only by about 1.3x despite having 1.5x the cores.
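To put those ratios on a common footing, the sketch below divides each relative score by the relative core count. It uses only the rounded figures quoted in the text (1.3x, 1.5x, 56%), not the raw chart values, and the function name `scaling_efficiency` is ours.

```python
def scaling_efficiency(relative_performance, relative_cores):
    """Relative performance divided by relative core count.

    1.0 means performance scaled exactly in proportion to core count.
    """
    return relative_performance / relative_cores


# EPYC 9654 vs EPYC 9554: about 1.3x the score from 1.5x the cores.
eff_9654 = scaling_efficiency(1.3, 96 / 64)

# EPYC 9374F vs EPYC 9554: about 56% of the score from half the cores.
per_core_9374f = scaling_efficiency(0.56, 32 / 64)

print(f"EPYC 9654 scaling efficiency vs 9554: {eff_9654:.2f}")
print(f"EPYC 9374F per-core output vs 9554:   {per_core_9374f:.2f}")
```

By this rough measure the 96-core part converts about 87% of its extra cores into extra Coremark throughput, while the 32-core part delivers roughly 12% more per core than the EPYC 9554.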

7-Zip Compression And Decompression Performance

Next, we looked at 7-Zip compression and decompression. The compression workload is influenced by memory and cache performance as well as out of order processing. Decompression is much more integer-driven, but also stresses the branch prediction pipeline.

epyc genoa 7 zip compression

epyc genoa 7 zip decompression

Despite the positive influence of cache size and speed on the compression workload, the Milan-X system performs relatively poorly. It is vastly outstripped by the EPYC 9554, and nearly overtaken by the EPYC 9374F with half its core count. In decompression, the Milan-X platform fares better, likely because the 3D V-Cache can help overcome branch prediction misses. Genoa still puts up a stronger showing overall, though we only see about 15-20% performance scaling for the EPYC 9654 over the 9554.

Linux Kernel Compilation Performance

Software compiling is a common task, and building the Linux kernel itself has long been used as a performance benchmark. We tested with defconfig and allmodconfig with results reported in seconds.

epyc genoa build linux defconfig

epyc genoa build linux allmodconfig

With defconfig, the Genoa systems were generally faster than Milan-X, but per-core scaling is a mixed bag. The 64-core EPYC 9554 completed the build fastest, but only about 9% faster than the EPYC 9374F with half its core count. Milan-X failed to build with allmodconfig and we did not have sufficient time to troubleshoot it. Among the Genoa processors, the same overall trend is in place, but there is a 34% lead for the EPYC 9554 over the EPYC 9374F this time.

Blender 3D Rendering On Zen 4 AMD EPYC

Blender is a staple 3D rendering benchmark. We queued up the tried and true BMW scene and gauged the time to render in seconds.

epyc genoa blender bmw

Blender does not show any surprises. Impressively, the EPYC 9374F system completes its render just 1.5 seconds behind the Milan-X 7773X server, which is in turn almost 4.5 seconds slower than the EPYC 9554.

Embree 3D Rendering And Path Tracing Performance On AMD Genoa

Embree is a 3D pathtracing renderer which can leverage instruction sets like AVX2 and AVX-512. The ISPC variant is compiled using the Intel Implicit SPMD Program Compiler, which can see additional speedup when AVX acceleration is available.

epyc genoa embree pathtracer

epyc genoa embree pathtracer ipsc

In the default compiled render, the EPYC 7773X falls in line between the 32-core and 64-core Genoa processors. However, with ISPC the EPYC 7773X sees a performance regression while all the Genoa processors gain a substantial uplift. This sees the Milan-X server fall to the bottom of the ranking despite a 2x core advantage over the EPYC 9374F.

x264 Video Encoding Performance

Another common use of these servers is to serve as a render farm for video. We used the multithreaded x264 encoder with both 1080p and 4K test footage.

epyc genoa x264 1080p

epyc genoa x264 4k

Performance deltas are nearly non-existent between the Genoa systems at 1080p, but Milan-X trails by a fair margin. At 4K, there is more variance between the Genoa trio, though it is not clear why the EPYC 9554 lags here. Regardless, even the EPYC 9554 delivers 1.3x higher frame rates than the EPYC 7773X with the EPYC 9654 reaching 1.5x.

PostgreSQL Database Performance

PostgreSQL is very popular and provides us with a look at database performance. pgbench reports database transactions per second and the corresponding average latency for both read-only and read-write workloads.

epyc genoa postgresql read tps

epyc genoa postgresql read latency

epyc genoa postgresql readwrite tps

epyc genoa postgresql readwrite latency

The EPYC 9554 is very well balanced for this workload and delivers the highest transaction throughput with the lowest latencies in both scenarios. Similarly, the 32-core EPYC 9374F continues to nip at the heels of the 64-core EPYC 7773X Milan-X system, though it is impacted more once write operations are introduced.

Apache Web Server Workload Testing

The Apache HTTP server benchmark simulates a number of concurrent clients submitting HTTP requests over a given timeframe and reports how many requests per second the server is able to handle.

epyc genoa apache http

This is the first test that tips in favor of the 32-core EPYC 9374F processor. Its higher core clocks are better able to deal with the bursty nature of each individual request. The EPYC 7773X only boosts up to 3.5GHz with a base clock of 2.2GHz and its performance trails significantly as a result.

TensorFlow AI Processing Workloads On AMD Genoa

Finally, we arrive at AI workloads. TensorFlow offers a few different models to analyze with, so we tested VGG-16, AlexNet, GoogLeNet, and ResNet-50.

epyc genoa tf vgg 16

epyc genoa tf alexnet

epyc genoa tf googlenet

epyc genoa tf resnet50

Interestingly, the EPYC 9374F again yielded the strongest performance, at least across AlexNet, GoogLeNet, and ResNet-50. Performance actually decreased as cores were added to the system in these cases. VGG-16 was best served by the EPYC 9554 on a raw performance basis, but it is here that Milan-X chalked up another DNF. Even when the EPYC 7773X was able to complete the other models, its performance lagged significantly behind Genoa.

The addition of hardware accelerated AVX-512 is a massive boon for AI workloads on Genoa. Compared to the EPYC 7773X, the EPYC 9374F is 6.75x faster in AlexNet, and 4.68x faster in both GoogLeNet and ResNet-50. This is all without any other special optimizations mixed in.
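As a rough way to summarize those three ratios in one number, the sketch below takes their geometric mean, which is a common choice for averaging speedups. The article itself does not report an aggregate figure, so treat the result as our own illustration built from the rounded ratios quoted above.

```python
import math

# EPYC 9374F speedup over the EPYC 7773X, as quoted in the text.
SPEEDUPS = {
    "AlexNet": 6.75,
    "GoogLeNet": 4.68,
    "ResNet-50": 4.68,
}


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


summary = geometric_mean(SPEEDUPS.values())
print(f"Geometric mean speedup across {len(SPEEDUPS)} models: {summary:.2f}x")
```

The geometric mean is used instead of the arithmetic mean so that the single large AlexNet ratio does not dominate the summary.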

AMD Genoa EPYC Final Thoughts And Conclusion

As it stands, Intel’s Xeon Max processors are a mostly unknown entity outside of the limited accelerator benchmarking the company has allowed on Sapphire Rapids. Until we know more on that front, we turn our analysis to the generational advances Genoa has made.

amd epyc genoa no lid

Our brief testing of Genoa shows sizeable improvements over Milan-X in most workloads, despite the 3D V-Cache advantage Milan-X holds. Core for core, the EPYC 9554 outperformed the previous-gen EPYC 7773X without exception.

Arguably the most impressive showing came from the frequency optimized EPYC 9374F. This 32-core CPU was persistently at the heels of the EPYC 7773X throughout testing and left it in the dust once AI workloads came into the picture.

amd epyc genoa lga pads

AMD's new 96-core EPYC 9654 perhaps did not put up the kind of numbers some might expect from such a massive many-core chip, but that should not necessarily be the takeaway. The fact of the matter is that very few individual workloads can scale across all 384 threads. Instead, the EPYC 9654 will find its home in servers and data centers where it can tackle many concurrent workloads from multiple tenants, especially in hyperscaler environments.
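The 384-thread figure follows from two sockets, 96 cores per socket, and two SMT threads per core. To illustrate why a single workload rarely uses all of them, the sketch below applies Amdahl's law, which is our addition and not part of the testing above. The parallel fractions are hypothetical, and treating every hardware thread as a full worker is a simplification.

```python
def hardware_threads(sockets, cores_per_socket, threads_per_core=2):
    """Total hardware threads in a multi-socket SMT system."""
    return sockets * cores_per_socket * threads_per_core


def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of a job can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


threads = hardware_threads(sockets=2, cores_per_socket=96)
print(f"Dual EPYC 9654 hardware threads: {threads}")

# Hypothetical parallel fractions, chosen only for illustration.
for fraction in (0.90, 0.99, 0.999):
    bound = amdahl_speedup(fraction, threads)
    print(f"{fraction:.1%} parallel -> at most {bound:.1f}x speedup")
```

Even a job that is 99% parallel is capped well below the thread count under this model, which is consistent with the modest gains the EPYC 9654 showed over the EPYC 9554 in single-workload tests.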

We have to hand it to AMD for its generational gains with its new Zen 4 Genoa EPYC server chip architecture. Now it's time for Intel to give us the full monty, and allow us to pit Sapphire Rapids against Genoa in a head-to-head match-up. Either way, Genoa is a winner for customers who need its exceptional core density, but even its lighter core-count offerings provide significant advantages over third generation processors, as proven by the EPYC 9374F versus EPYC 7773X dogfight in our testing.
