Since we published this article, Intel has gone back and updated its blog post on Medium to clarify a few points raised by Kennedy. The most significant one is the addition of benchmark data from an updated version of GROMACS (2019.4), as that version is better optimized to leverage processors based on AMD's Zen 2 architecture. In doing so, Intel states it "found no material difference" to the data it originally obtained.
"Intel is committed to always provide fair, transparent, and accurate performance results and would not intentionally mislead. We received feedback on our original blog and appreciate the community’s passion about performance and the accuracy of benchmarks. Taking the community’s feedback, we have updated this blog with data for the most recent GROMACS 2019.4 version and found no material difference to earlier data posted on 2019.3 version," Intel said in a statement.
Intel also communicated with Kennedy to clarify a few points of criticism he raised in his article. Notably, Intel said a section noting the number of threads used on AMD's hardware contained a typo—instead of running tests with one thread per core, "the tests were actually done with two threads per core," Kennedy says he was told by Intel.
Kennedy's updated blog post provides more details on Intel's clarification.
Intel is catching heat for posting a performance comparison pitting a pair of its Xeon 9282 server processors based on Cascade Lake-AP against rival AMD's EPYC 7742 Rome chips. That in and of itself is a fair thing to do, but some observers have raised their eyebrows (and pitchforks) over the testing parameters used, which appear to unfairly favor Intel's silicon.
Before we dive into that, let's compare the specs. The Xeon 9282 wields 56 physical cores and 112 threads of brute computing muscle, with a 2.6GHz base clock and 3.8GHz boost clock. It also sports 77MB of L3 cache and has a 400W TDP.
AMD's EPYC 7742 boasts burlier specifications, as it totes 64 physical cores and 128 threads of computing power, with a 2.25GHz base clock and 3.4GHz max boost clock. It also features 256MB of L3 cache and support for PCIe 4.0, and has a lower 225W TDP.
On paper, the EPYC 7742 should thrash Intel's chip in workloads that are properly optimized to tap into multiple cores and threads. Indeed, Intel addresses this in a post on Medium, essentially saying more is not always better.
"Just like adding more people to a meeting does not always lead to greater productivity, 'more cores' will not always guarantee 'more performance'. Performance is a factor of many things, not just a single vector. More processor cores add compute, but overall system or workload performance depends on other factors," Intel says.
Intel goes on to state that the performance of each core, software optimizations leveraging specific instructions, memory bandwidth to keep all those cores properly fed, and cluster-level scaling all have an impact on performance that extends beyond the raw core and thread counts. And all of that is true. However, here is where things go awry.
Is Intel's Xeon Platinum 9282 Versus AMD's EPYC 7742 Benchmark Run Flawed?
Intel's accompanying analysis notes that several of the applications benchmarked take advantage of Intel's AVX-512 extensions, which increase the vector width, allowing applications to perform more floating point operations per clock cycle. GROMACS is one of them.
That also happens to be the one that raised eyebrows the highest. Patrick Kennedy from Serve The Home took note of the benchmark comparison and blasted Intel's decision to use an older version of GROMACS, as outlined in the footnotes.
"Intel used GROMACS 2019.3. To be fair, they used the same version which makes it a valid test. GROMACS 2019.3 was released on June 14, 2019, just after the 2nd gen Intel Xeon Scalable series. On October 2, 2019 the GROMACS team released GROMACS 2019.4. Keep in mind that it is over a month before Intel published its article. In GROMACS 2019.4, there was a small, but very important fix for the comparison Intel was trying to show aptly called: Added AMD Zen 2 detection," Kennedy notes.
The newer version properly leverages Zen 2 processors and uses 256-bit wide AVX2 SIMD instructions by default. In addition, the release notes state "the non-bonded kernel parameters have been tuned for Zen 2," which "has a significant impact on performance."
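To put the vector widths mentioned above in perspective, here is a minimal arithmetic sketch of how many floating point values fit in a single SIMD register at each width. The function name `simd_lanes` is made up for illustration and is not part of GROMACS or any Intel or AMD tooling. It only counts values per register; real-world throughput also depends on clock speeds, the number of vector units, and memory bandwidth.

```python
# Illustrative arithmetic only: counts how many values fit in one SIMD register.
# simd_lanes is a hypothetical helper, not part of GROMACS or any vendor toolkit.

def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Return how many elements of element_bits fit in a register of register_bits."""
    if register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of element width")
    return register_bits // element_bits


# Register widths in bits for the two instruction sets discussed in the article.
WIDTHS = {"AVX2": 256, "AVX-512": 512}

for name, bits in WIDTHS.items():
    fp32 = simd_lanes(bits, 32)  # single-precision floats per register
    fp64 = simd_lanes(bits, 64)  # double-precision floats per register
    print(f"{name}: {fp32} single-precision or {fp64} double-precision values per register")
```

In other words, an AVX-512 register holds twice as many values as an AVX2 register, which is the per-instruction advantage Intel's analysis leans on.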
In short, older versions of GROMACS did not properly support AMD's EPYC Rome series based on Zen 2. This makes for a lopsided comparison at best. At worst, it's a cherry-picked benchmark version intentionally meant to mislead the public: it's optimized for Intel's chips, but not for AMD's CPUs, even though there is a newer version that is optimized for both. Bear in mind Intel recently called out AMD for making benchmark comparisons that left out its NAMD optimizations, to show its EPYC processors in a better light.
Kennedy goes on to criticize other criteria Intel used.
"On both CPUs we see that there are two threads per CPU which means 56 cores/ 112 threads on the Platinum 9282 and 64 cores/ 128 threads on two AMD EPYC 7742 CPUs. Then things change. Turbo was enabled on the EPYC 7742, but not on the Xeon Platinum 9282. In GROMACS, transitions in and out of AVX-512 code can lead to differences in boost clocks which can impact performance," Kennedy notes.
He also called into question Intel's use of 16GB DIMMs for the Xeon test system and 32GB DIMMs for the AMD system, as well as the motherboards used and other details. It's an interesting read. The bigger question, though, is whether Intel intentionally skewed the testing in its favor. To be fair, GROMACS is a single benchmark out of several that were run. Kennedy is not willing to give Intel a pass, though.
"One can only conclude that Intel’s 'Performance at Intel' blog is not a reputable attempt to present factual information. It is simply a way for Intel to publish misinformation to the market in the hope that people do not do the diligence to see what is backing the claims," Kennedy said.