Qualcomm Centriq 2400 Server Chip Takes On Intel Xeon In Cloudflare Benchmark Gauntlet

Yesterday, Qualcomm announced that its Centriq 2400 Series processors are now shipping to customers, nearly a year after it began sampling the chips. With 40-, 46- and 48-core versions available, Qualcomm is positioning the Centriq 2400 as a worthy competitor to Intel's Xeon processors in the data center server market.

Up until this point, Qualcomm has contended that the 48-core Centriq 2460 offers a 4x improvement in performance per dollar versus the Intel Xeon Platinum 8180. While those figures are much appreciated, we wanted to see some real-world benchmark data so that we could draw our own conclusions. Thankfully, the folks at Cloudflare were able to put a 46-core Centriq 2452 up against comparable Xeon processors from Intel. In the Intel corner, we have the Grantley platform (Broadwell) using two 10-core Xeon processors with Hyper-Threading enabled (40 threads) and Purley (Skylake) using two 12-core Xeon processors with Hyper-Threading (48 threads).

As an aside, HotHardware uses Cloudflare for both its Content Delivery Network (CDN) caching and SSL/HTTPS connection acceleration. So naturally, any Cloudflare-generated performance measurements are pertinent to us. Also consider that Cloudflare is an excellent reference point for Centriq performance since the processor is specifically geared towards high-density cloud data center rack servers.

For its current upgrade cycle, Cloudflare has already purchased a "significant number" of Xeon Silver 4116 processors to use in a dual-socket configuration. However, the company is very open to embracing alternatives, which is the reason behind the Qualcomm benchmark investigation you can see below.

CPUs specs

Before we present the results, we should note that Cloudflare tested the 46-core Centriq 2452, not the top-end, 48-core Centriq 2460. Secondly, given the differences in architecture (ARM for Centriq, x86 for the Xeon processors), there are a lot of things to consider with regard to how these processors will eventually be configured for a given application. For example, you could squeeze 48 Centriq cores into a single-socket blade server, while it would take two Xeon sockets to get that many cores. However, you could also drop in a 4-socket Intel setup given enough board, chassis and rack real estate (at the expense of greater power consumption and heat). There are many ways to look at it, but a single-socket 48-core ARM chip looks fairly compelling for high-density, power-sensitive applications.

pub key all core 2

For those not in the know, OpenSSL is a widely used cryptography library, and its public key operations (RSA and ECDSA signatures, for example) are what secure HTTPS connections. In the multi-core benchmarks, the Falkor cores in the Centriq 2452 really flex their muscle. With the exception of a slight victory for Skylake in RSA2048 signing, the Qualcomm chip takes the gold in all the other benchmarks.
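To give a feel for what a signing benchmark like this actually measures, here is a minimal sketch that times RSA-2048 and ECDSA signatures on a single core. It is not Cloudflare's test harness: we are assuming Go's standard crypto packages rather than OpenSSL, the P-256 curve for ECDSA, and a tiny iteration count, and the `signer` type and function names are our own illustrative inventions.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"errors"
	"fmt"
	"time"
)

// signer bundles a signing operation with its matching verification.
// The type and its field names are hypothetical, for illustration only.
type signer struct {
	name   string
	sign   func() ([]byte, error)
	verify func(sig []byte) bool
}

// newSigners generates fresh ECDSA P-256 and RSA-2048 keys and returns one
// signer for each, both operating on the SHA-256 digest of msg.
func newSigners(msg []byte) ([]signer, error) {
	digest := sha256.Sum256(msg)

	ecKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	rsaKey, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return nil, err
	}

	return []signer{
		{
			name: "ECDSA P-256",
			sign: func() ([]byte, error) {
				return ecdsa.SignASN1(rand.Reader, ecKey, digest[:])
			},
			verify: func(sig []byte) bool {
				return ecdsa.VerifyASN1(&ecKey.PublicKey, digest[:], sig)
			},
		},
		{
			name: "RSA-2048",
			sign: func() ([]byte, error) {
				return rsa.SignPKCS1v15(rand.Reader, rsaKey, crypto.SHA256, digest[:])
			},
			verify: func(sig []byte) bool {
				return rsa.VerifyPKCS1v15(&rsaKey.PublicKey, crypto.SHA256, digest[:], sig) == nil
			},
		},
	}, nil
}

// signsPerSecond signs n times on the calling goroutine and returns the rate.
// It verifies the last signature so a broken signer cannot report a rate.
func signsPerSecond(s signer, n int) (float64, error) {
	var sig []byte
	start := time.Now()
	for i := 0; i < n; i++ {
		var err error
		if sig, err = s.sign(); err != nil {
			return 0, err
		}
	}
	elapsed := time.Since(start)
	if !s.verify(sig) {
		return 0, errors.New(s.name + ": signature did not verify")
	}
	return float64(n) / elapsed.Seconds(), nil
}

func main() {
	signers, err := newSigners([]byte("hello, TLS handshake"))
	if err != nil {
		panic(err)
	}
	for _, s := range signers {
		rate, err := signsPerSecond(s, 200)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-12s %.0f signatures/sec (one core)\n", s.name, rate)
	}
}
```

The numbers this prints depend entirely on the machine and on how well the crypto code is optimized for its architecture, which is exactly the kind of difference the charts here are capturing. An all-core figure would need one such loop per core running in parallel.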

"At the SoC level, Falkor wins big time," writes Cloudflare's Vlad Krasnov. "It is only marginally slower than Skylake at an RSA2048 signature, and only because RSA2048 does not have an optimized implementation for ARM. The ECDSA performance is ridiculously fast. A single Centriq chip can satisfy the ECDSA needs of almost any company in the world."

gzip all core
brot all core

When it comes to on-the-fly compression of dynamic and static web content, whether with the industry-standard gzip or Google's more recent Brotli algorithm, Centriq really shows its muscle, dominating across the board with gzip and winning nearly every test with Brotli.
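For readers who have not seen web content compression up close, the sketch below compresses a repetitive HTML-like payload at several gzip levels and reports the resulting sizes. We are assuming Go's standard `compress/gzip` package as a stand-in, which is not necessarily the implementation Cloudflare benchmarked, and the sample payload and helper names are made up for illustration. Brotli is left out because it is not part of Go's standard library.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// gzipBytes compresses data at the given level, from gzip.BestSpeed (1)
// to gzip.BestCompression (9). Higher levels cost more CPU time.
func gzipBytes(data []byte, level int) ([]byte, error) {
	var buf bytes.Buffer
	zw, err := gzip.NewWriterLevel(&buf, level)
	if err != nil {
		return nil, err
	}
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes any pending data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes reverses gzipBytes so the round trip can be checked.
func gunzipBytes(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// Hypothetical payload: a block of repetitive markup.
	page := []byte(strings.Repeat("<li class=\"item\">hello, world</li>\n", 500))

	for _, level := range []int{gzip.BestSpeed, gzip.DefaultCompression, gzip.BestCompression} {
		out, err := gzipBytes(page, level)
		if err != nil {
			panic(err)
		}
		back, err := gunzipBytes(out)
		if err != nil || !bytes.Equal(back, page) {
			panic("round trip failed")
		}
		fmt.Printf("level %2d: %d -> %d bytes\n", level, len(page), len(out))
	}
}
```

A server compressing responses on the fly pays this CPU cost on every request, which is why throughput across all cores matters so much for a CDN.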

Where Centriq and its Falkor cores really fell short, however, is with regular expression (regexp) matching in Go, which uses text patterns for search-and-replace operations. The Regexp.Match tests proved to be a particularly tough challenge, with Skylake showing a 4x advantage.

go regexp easy all core
go regexp comp all core
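As a rough illustration of the two Go building blocks involved here, the sketch below runs a simple anchored pattern through the standard `regexp` package and separately scans a buffer with `bytes.IndexByte`, the single-byte search routine that Cloudflare's profiling (quoted next) singles out. The pattern, the haystack and the helper names are our own assumptions, and we are not claiming this reproduces the exact benchmark Cloudflare ran.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// hasMatch compiles pattern and reports whether it matches anywhere in data.
func hasMatch(pattern string, data []byte) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.Match(data), nil
}

// countByte counts occurrences of c in data by calling bytes.IndexByte in a
// loop, skipping past each hit. How fast this runs depends heavily on whether
// the platform has a hand-tuned implementation of the byte search.
func countByte(data []byte, c byte) int {
	n := 0
	for {
		i := bytes.IndexByte(data, c)
		if i < 0 {
			return n
		}
		n++
		data = data[i+1:]
	}
}

func main() {
	// Hypothetical haystack: filler text with the literal only at the very end,
	// so a matcher has to scan the whole buffer before it can succeed.
	haystack := append(bytes.Repeat([]byte("lorem ipsum dolor sit amet "), 10000), []byte("NEEDLE")...)

	found, err := hasMatch("NEEDLE$", haystack)
	if err != nil {
		panic(err)
	}
	fmt.Println("regexp match:", found)
	fmt.Println("spaces counted via bytes.IndexByte:", countByte(haystack, ' '))
}
```

When a workload spends most of its time in a low-level scan like this, a tuned assembly routine on one architecture and a generic fallback on another can account for a large share of the gap.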

"Doing some profiling shows that a lot of the time is spent in the function bytes.IndexByte," wrote Cloudflare engineer Vlad Krasnov. "This function has an assembly implementation for amd64 (runtime.indexbytebody), but generic implementation for Go. The easy regexp tests spend most of time in this function, which explains the even wider gap."

In the end, Cloudflare feels that Qualcomm has a potential winner on its hands, especially with regard to multithreaded workloads. Intel's position of strength in the server market allows it to charge a premium for its Xeon processors, but that is countered by the lower cost and lower power consumption of Centriq.

"The largest win by far for Falkor is the low power consumption. Although it has a TDP of 120W, during my tests it never went above 89W (for the go benchmark)," said Krasnov. "In comparison Skylake and Broadwell both went over 160W, while the TDP of the two CPUs is 170W."

Cloudflare sounds fairly confident that Qualcomm Centriq will only get better with time as more applications are optimized to take advantage of its ARMv8 64-bit architecture.