Intel Xeon Scalable Debuts: Dual Xeon Platinum 8176 With 112 Threads Tested

We began our testing of a 2P Intel Xeon Platinum 8176-based server with SiSoftware's SANDRA 2017, the System ANalyzer, Diagnostic and Reporting Assistant. We ran five of the built-in sub-system tests that make up part of the SANDRA suite (CPU Arithmetic, Multimedia, Memory Bandwidth, Cache and Memory, and Multi-Core Efficiency). In addition to the sub-system tests, we also ran some of the multi-threaded benchmarks available within SANDRA, including the Financial Analysis, Cryptography, and Image Processing tests.

All of the scores reported below were taken with the processors running at their default frequencies (2.1GHz base) with 384GB of DDR4-2666 RAM.

Preliminary Testing with SiSoft SANDRA 2017
Synthetic Benchmarks
Processor Arithmetic
Processor Multimedia
Memory Bandwidth
Cache & Memory

In the Processor Arithmetic and Multimedia benchmarks, the dual Xeon Platinum 8176 processors performed exceptionally well and outpaced everything else in SANDRA's database. In the Memory Bandwidth and Cache and Memory tests, the Xeon Platinum 8176s also performed very well, with aggregate memory bandwidth falling in the 151GB/s range. For reference, a previous-generation E5-2697 v4-based server offered up 107GB/s to 111GB/s. The Cache and Memory results also show the Xeon Platinums with a slight advantage virtually across the board.
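To put the memory bandwidth figures in perspective, the generational uplift can be worked out from the numbers quoted above. The short Python sketch below is purely illustrative: the function name is our own, and the inputs are the rounded figures from the text (151GB/s for the Xeon Platinum 8176 system, 107GB/s to 111GB/s for the E5-2697 v4 system), not raw benchmark output.

```python
def uplift_percent(new_gbps: float, old_gbps: float) -> float:
    """Return the percentage increase of new_gbps over old_gbps."""
    return (new_gbps / old_gbps - 1.0) * 100.0


# Rounded figures quoted in the text, in GB/s.
platinum_8176 = 151.0
e5_2697_v4_range = (107.0, 111.0)

# The smallest uplift is measured against the best previous-gen result,
# the largest against the worst previous-gen result.
low = uplift_percent(platinum_8176, e5_2697_v4_range[1])
high = uplift_percent(platinum_8176, e5_2697_v4_range[0])

print(f"Uplift: {low:.0f}% to {high:.0f}%")  # Uplift: 36% to 41%
```

Going by these rounded figures, the new platform delivers roughly 36% to 41% more aggregate memory bandwidth than the previous-generation server.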

Image Processing
Financial Analysis
Cryptography
Memory Transactions

In the multi-threaded Cryptography, Image Processing, and Financial Analysis benchmarks, the 2P Xeon Platinum 8176-based server outpaced everything in SANDRA's database by a wide margin. The memory transaction rate on Intel's new platform, however, trailed the previous generation.

Multi-Core Efficiency (Best Case)
Multi-Core Efficiency (Worst Case)

These multi-core efficiency tests may be more interesting to many of you, since this particular benchmark had some 2P AMD EPYC 7601 results available. We ran both the best-case and worst-case versions of the test, which show the bandwidth between the lowest-latency and highest-latency pairs of cores, respectively. As you can see, in the best case, the EPYC processors offer more total bandwidth, but the individual results are mixed. The Intel platform offers more bandwidth with the smaller data sets, while EPYC offers more with the larger data sets (save for one result). In the worst-case test, the results essentially flip, with EPYC offering much more bandwidth in the smaller data sets before falling off a cliff.

AIDA64
Memory Read, Write, Copy Benchmarks

We also have a handful of CPU and memory benchmarks from AIDA64. Below are results with the dual-CPU Xeon Platinum 8176-based system in six multi-threaded CPU-related benchmarks and in the memory read, write, copy, and latency benchmarks.

CPU AES Encryption
FP64 Ray Tracing
CPU Hash
Image Analysis
CPU Queen
CPU Zlib

AIDA64 did not properly detect the memory speed on the server (it was running at DDR4-2666) and the program threw a few warnings that it was not optimized for the platform, but that did not stop the 2P Xeon Platinum 8176-based setup from blowing past everything else in AIDA's database in every test.

Memory Copy
Memory Latency
Memory Read
Memory Write
The Xeon Platinum 8176-based server also took top honors in all of the AIDA memory benchmarks. The platform's additional memory channels, combined with its support for higher-speed memory, produce the highest bandwidth and lowest latency results.
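The effect of extra channels and faster memory can be estimated with a quick back-of-the-envelope calculation. The Python sketch below is a rough illustration, not a measurement: it assumes six DDR4-2666 channels per socket for the Xeon Platinum 8176 and four DDR4-2400 channels per socket for a previous-generation E5-2697 v4. Those channel counts and the DDR4-2400 speed come from Intel's published specifications rather than from the testing above, and the helper function is our own.

```python
def peak_bandwidth_gbps(channels_per_socket: int,
                        mega_transfers_per_sec: int,
                        sockets: int = 2,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s for a multi-socket system.

    Each DDR4 channel is 64 bits (8 bytes) wide, so the peak is
    channels * transfers/sec * 8 bytes, summed over all sockets.
    """
    total_mbps = (channels_per_socket * mega_transfers_per_sec
                  * bytes_per_transfer * sockets)
    return total_mbps / 1000.0


# Assumed configurations (from Intel's published specs, not the article).
platinum_8176_2p = peak_bandwidth_gbps(6, 2666)  # six channels, DDR4-2666
e5_2697_v4_2p = peak_bandwidth_gbps(4, 2400)     # four channels, DDR4-2400

print(f"2P Xeon Platinum 8176: {platinum_8176_2p:.1f} GB/s")
print(f"2P Xeon E5-2697 v4:    {e5_2697_v4_2p:.1f} GB/s")
print(f"Ratio: {platinum_8176_2p / e5_2697_v4_2p:.2f}x")
```

Under these assumptions, the theoretical ceiling rises from about 153.6GB/s to about 255.9GB/s, or roughly 1.67x. Measured bandwidth always lands below the theoretical peak, so this only indicates how much headroom the additional channels and faster memory provide.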
