Comprehensive Testing Illustrates AMD's Strengths, Weaknesses In Servers

AMD's Financial Analyst Day last week made it clear that while the company is primarily focused on consumer products and SoC design, it still wants to be a player in the server market. The various presentations were a bit unclear as to how AMD thought it would make that happen -- there was plenty of mention of "the cloud", but few details on how AMD could position its Bulldozer / Interlagos architecture to make a play in this space.

Extensive testing over at Anandtech has shed some light on the situation. The new article is a follow-up to the original story; combined, they offer a panoramic view of Sunnyvale's position in a wide range of HPC, virtualization, rendering, SQL, and SAP benchmarks. Testing at multiple load levels and measuring power consumption at each indicates that AMD has a competitive value proposition in certain workloads -- but it also exposes some of the chip's weaknesses.

AMD's decision to compete with Intel by adding more cores may have worked at first, but the company clearly overreached. One of the problems Interlagos has in certain workloads is that the chip spends a great deal of time in spin waits, meaning that threads burn cycles polling a lock while they wait for other threads to finish their data accesses. Data contention increases as core counts rise; it's one of the fundamental problems that prevents multi-core designs from scaling effectively in the real world. With as many as 32 threads running on a dual-socket, 16-module Interlagos system, spin waits kill Interlagos' performance in MySQL OLAP.
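For readers who haven't run into spin waits before, here is a minimal sketch of the idea in Python. It is not taken from the Anandtech tests or from MySQL; the `SpinLock` class and `run_workers` function are hypothetical names invented for this illustration. Python's interpreter lock also means the numbers say nothing about real hardware. The sketch only shows the mechanism: a thread that can't get the lock keeps polling instead of sleeping, and those wasted polls are the "spins."

```python
import threading


class SpinLock:
    """Toy spin lock: a waiting thread busy-polls instead of sleeping."""

    def __init__(self):
        self._lock = threading.Lock()
        self.spins = 0  # total failed acquisition attempts across all threads

    def acquire(self):
        failed = 0
        # Non-blocking attempt in a loop: each failure is one wasted "spin".
        while not self._lock.acquire(blocking=False):
            failed += 1
        # We hold the lock here, so updating the shared tally is safe.
        self.spins += failed

    def release(self):
        self._lock.release()


def run_workers(num_threads, increments_per_thread):
    """Have several threads increment one shared counter under a spin lock.

    Returns (final counter value, number of wasted spins).
    """
    lock = SpinLock()
    counter = 0

    def worker():
        nonlocal counter
        for _ in range(increments_per_thread):
            lock.acquire()
            try:
                counter += 1  # the contended "data access"
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, lock.spins


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        total, spins = run_workers(n, 2000)
        print(f"{n} threads: counter={total}, wasted spins={spins}")
```

The counter always ends up correct, because the lock does its job. The spin count is the part to watch: it varies from run to run, but a single thread never spins at all, and any spinning that does occur is pure overhead. That is the cost the Interlagos results are pointing at.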

As an aside, Intel announced new extensions being added to Haswell this week that are meant to reduce the amount of time a chip spends in spin locks / spin waits by giving programmers hardware support for transactional memory, which can improve multi-core efficiency. The new extensions, dubbed TSX (Transactional Synchronization Extensions), will be backwards-compatible with existing approaches and debut with the launch of that processor in 2013.

For a benchmark-by-benchmark breakdown, we suggest you read the full articles, but there are some tests, like SAP, where Interlagos does well. The chip's HPC performance is also quite good. The tests also confirm something we'd previously suspected -- AMD's cluster multi-threading (CMT) approach is executed well and scales effectively in server workloads where thread contention doesn't cripple overall performance. Bulldozer's problems are related more to the inherent lack of scaling in consumer applications and the speed of its caches.

Focusing on these workloads should help AMD claw back some market share, but Piledriver's performance will still be critical to any long-term growth. We continue to suspect that faster caches would go a long way to improving Interlagos' performance -- if AMD can deliver them, the chip may be able to compete effectively against the next round of Xeons, including the E5 series.