AMD Kaveri Arrives: A8-7600 APU Review
Date: Jan 14, 2014
Section:Processors
Author: Marco Chiappetta
Introduction and Specifications

Way back in 2006, after the ATI acquisition, AMD laid out its Fusion initiative and future plans to integrate a CPU and GPU onto the same processor die. The ultimate goal of Fusion was to seamlessly combine CPU and GPU resources into a single, cohesive compute engine equally adept at handling serial and highly parallel workloads. At the time, the idea seemed ambitious, but history has shown that more and more external resources have consistently been brought onto the CPU die (memory controllers, IO hub, etc.) and graphics would be no different.

Since then, both Intel and AMD have quite successfully combined CPU and GPU engines onto single chips, but up to this point they have tended to work as autonomous islands. The CPU and GPU share some resources, but not a single memory pool. That changes today though, with the official launch of AMD’s Kaveri-based APUs, the first APUs to truly support heterogeneous computing.

We’ve been talking about Kaveri for quite some time and have quite a bit of content available related to today’s launch. We’d suggest checking out this link for a list of all of the Kaveri-related coverage we’ve got available. In this piece, we’ll finally be able to show you some hard numbers, using one of the more interesting SKUs in the initial AMD Kaveri-based product line-up - the AMD A8-7600 APU...


AMD A-Series APU In Socket FM2+ Flavor

AMD A-Series APU For 2014 "Kaveri"
Specifications & Features

Stream Processors   Up to 512 
Core Clock   Up to 3.7/4.0GHz 
Graphics Clock   Up to 720MHz 
Memory Support   Up to 2400MHz w/ AMP 
Typical TDP   45W, 65W, 95W (customizable via Configurable TDP)

Chipset Compatibility   A88X, A78, A55 
HSA Heterogeneous Computing   Yes 
AMD TrueAudio Technology   Yes 
API Support   DirectX 11.2, Mantle

 
Up to 4 “Steamroller” x86 computing cores
 
  • Support for the latest ISA instructions including FMA4/3, AVX, AES, XOP
  • Up to 2MB L2 cache per dual-core module (up to 4MB total)
  • Maximum Turbo Frequencies up to 4GHz

Up to 8 GCN-based GPU cores

  • Up to 512 shaders
  • Up to 720MHz
  • 8xAA and 16xAF Support
  • DirectX 11.2 Support
  • Mantle Support
  • AMD Eyefinity Technology and 4K Ultra HD Support
  • DisplayPort 1.2 Support

HSA Heterogeneous Computing

  • hUMA – Heterogeneous Unified Memory Architecture enables shared memory between CPU and GPU cores
  • hQ – Heterogeneous Queuing allows both CPU and GPU cores to independently schedule tasks
 
FM2+ Platform

  • Backwards compatible platform means support for other FM2+ APUs new and old
  • PCI Express Gen 3 support
  • AMD CrossFire support with AMD A88X motherboards and above
  • AMD Memory Profile (AMP) support for up to DDR3-2400MHz
  • AMD Dual Graphics support with AMD Radeon R7 graphics cards

AMD TrueAudio Technology

  • Dedicated DSP for true-to-life audio with no performance compromise
  • Enable dynamic 3D sound processing effects across more audio channels
  • Programmable audio pipeline grants artistic freedom to game audio design

Unified Video Decoder and Compression Engine

  • Dedicated hardware to offload video encoding/decoding from CPU
  • AMD Picture Perfect support with HD Post-Processing technologies

Before we get to the benchmarks, we’ve got some features and specifications to share. As you can see in the table above (and as we’ve detailed many times in the past), Kaveri combines AMD’s latest Steamroller CPU microarchitecture with a GCN-based graphics engine, with up to 512 stream processors. Peak CPU clocks will vary depending on model, but the graphics clock maxes out at 720MHz. These AMD desktop-targeted APUs will carry TDPs of 45W – 95W.

AMD’s goal with Kaveri was to target virtually every market segment, from micro-servers all the way on up to high-end desktops. The architecture was designed with performance-per-watt in mind and some SKUs (like the A8-7600 we’ll be showing you here) offer scalable TDPs. The A8-7600 can actually be configured to operate with 45w or 65w TDPs.


AMD Kaveri Die Shot

Above is a funky Kaveri “die shot” that looks somewhat like a cross between a die map and blurred actual die shot. Regardless, it illustrates where all of the major chip components reside on the APU. Roughly 47% of Kaveri’s die (the large, orange/copper looking block) is dedicated to its GPU. Along the top of this image are the chip's PCI Express and display interface blocks, at the bottom (and the spike about in the middle) is the DDR3 phy and UNB, and to the right are the dual, dual-core X86 Steamroller CPU modules.

All told, Kaveri is composed of approximately 2.41 billion transistors and has a die size of 245mm². That’s about a billion more transistors than Richland, in roughly the same die area. AMD was able to accomplish this feat by working with GlobalFoundries on a new 28nm SHP manufacturing process that is less CPU-optimized but offers better area utilization for graphics. Using this 28nm SHP process, which employs bulk silicon, forced AMD to sacrifice some CPU frequency at the high end of the TDP scale, but it delivers better power characteristics at the sweet spot.
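To put those transistor figures in perspective, here is a rough density calculation. It is only a sketch: the Kaveri numbers come from the paragraph above, while the Richland values are assumptions derived from the "about a billion fewer transistors" and "roughly the same die area" wording, not official figures.

```python
# Kaveri figures are quoted in the article.
KAVERI_TRANSISTORS = 2.41e9
KAVERI_DIE_MM2 = 245.0

# Assumptions derived from the article's approximate wording, not AMD data.
RICHLAND_TRANSISTORS = KAVERI_TRANSISTORS - 1.0e9  # "about a billion" fewer
RICHLAND_DIE_MM2 = KAVERI_DIE_MM2                  # "roughly the same" area


def density_millions_per_mm2(transistors: float, die_mm2: float) -> float:
    """Return transistor density in millions of transistors per square millimetre."""
    return transistors / die_mm2 / 1e6


kaveri_density = density_millions_per_mm2(KAVERI_TRANSISTORS, KAVERI_DIE_MM2)
richland_density = density_millions_per_mm2(RICHLAND_TRANSISTORS, RICHLAND_DIE_MM2)
density_ratio = kaveri_density / richland_density

print(f"Kaveri:   {kaveri_density:.1f} M transistors/mm^2")
print(f"Richland: {richland_density:.1f} M transistors/mm^2 (approximate)")
print(f"Ratio:    {density_ratio:.2f}x")
```

Under those assumptions, Kaveri packs roughly 1.7 times as many transistors into each square millimetre, which is consistent with the process being tuned for area utilization.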
 

AMD Kaveri APU Explained

AMD is introducing the concept of what it calls “Compute Cores” with Kaveri. The idea is that Kaveri is the company’s first true heterogeneous processor, and its CPU and GPU cores can each process data in their own context and virtual memory space, independent of each other, when software is properly written to take advantage of the capability. In AMD’s view, that makes them deserving of a new moniker.

With the idea of Compute Cores in mind, AMD describes Kaveri as offering up to 12 compute cores, 4 CPU cores and 8 GPU cores. As we’ve mentioned, the four CPU cores use AMD’s latest Steamroller microarchitecture, which is the latest iteration of Bulldozer. And the eight GCN-based GPU cores are essentially the same as those used on the recently-released “Hawaii” GPU (Radeon R9 290 and R9 290X), but with the addition of support for coherent, shared unified memory.

The Steamroller CPU cores employed in Kaveri were designed to feed the cores faster, improve single-core / IPC performance over previous generations, and offer better performance per watt. AMD set out to achieve these goals by improving scheduling efficiency and branch prediction, and by increasing the size of queues all around. According to AMD, with Steamroller, mispredicted branches have been reduced by about 20%, scheduling efficiency has improved by 5-10%, and i-Cache misses are down by up to 30%. Power efficiency improvements come by way of optimizations in “every part of the design” according to AMD, and from a programmable on-die micro-controller that monitors virtually every part of the chip and gates unused blocks as necessary.

The GCN GPU cores in Kaveri are configured in a 4x16 SIMD-16 array (each core has 64 stream processors), with up to 8 cores total, for a max of 512 shaders. Each GPU core has a branch and message unit and scheduler, a scalar unit (with 4K registers), four texture filtering units, 16 texture load/fetch units, 4 x 64K vector registers, a 64K local data share, and 16K of L1 cache. While the peak 720MHz GPU clock may be lower than previous AMD APUs, Kaveri’s wider GPU and more advanced architecture should more than make up for its frequency disadvantage, when provided with adequate memory bandwidth.
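The shader counts above follow directly from the per-core layout, and the same numbers give a rough theoretical throughput ceiling. The sketch below works through that arithmetic. Counting two floating-point operations per stream processor per clock (one fused multiply-add) is an assumption made for illustration, not a figure from AMD's materials.

```python
SIMD_UNITS_PER_CORE = 4   # article: 4x16 SIMD-16 array per GPU core
LANES_PER_SIMD = 16       # article: SIMD-16
GPU_CLOCK_HZ = 720e6      # article: peak graphics clock of 720MHz

# Assumption for illustration: one fused multiply-add per lane per clock,
# counted as 2 floating-point operations.
FLOPS_PER_LANE_PER_CLOCK = 2


def stream_processors(gpu_cores: int) -> int:
    """Total stream processors for a given number of active GPU cores."""
    return gpu_cores * SIMD_UNITS_PER_CORE * LANES_PER_SIMD


def peak_gflops(gpu_cores: int, clock_hz: float = GPU_CLOCK_HZ) -> float:
    """Theoretical single-precision peak in GFLOPS under the FMA assumption."""
    return stream_processors(gpu_cores) * FLOPS_PER_LANE_PER_CLOCK * clock_hz / 1e9


for cores in (8, 6):  # a full 8-core Kaveri GPU vs. a 6-core configuration
    print(f"{cores} GPU cores: {stream_processors(cores)} stream processors, "
          f"~{peak_gflops(cores):.1f} GFLOPS theoretical peak")
```

These are ceilings only. As noted above, real-world results depend heavily on available memory bandwidth.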

We should also point out that Kaveri features a number of accelerators on-die as well. Kaveri features AMD's VCE (Video Coding Engine), UVD, and support for TrueAudio. VCE 2 in Kaveri is similar to VCE 1 in Richland, but supports more video formats as well as 60GHz Wireless Display and a new display encode mode. Kaveri also has an updated Unified Video Decoder, which adds improved error resiliency versus the previous generation. And then there's TrueAudio support. You can read more about AMD's TrueAudio technology here.

With each new generation of APU, AMD has moved closer and closer to implementing all of the features of its HSA (Heterogeneous System Architecture). The final piece of the puzzle in Kaveri is the APU’s ability to allow both the CPU and GPU cores to have coherent access to virtualized memory. AMD also added system-level atomics to allow for synchronizing workloads across the different cores.

  
AMD A8-7600 CPU-Z Details

The specific Kaveri-based APU we’ll be testing here today is the A8-7600. This quad-core chip is outfitted with only 6 active GPU cores (384 stream processors) and has a default CPU frequency of 3.3GHz and a max Turbo Core frequency of 3.8GHz when configured for a 65w TDP. When configured for a 45w TDP, the CPU cores are clocked at 3.1 / 3.3GHz. The GPU is clocked at 720MHz.
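As a quick way to see what the configurable TDP trades away, the sketch below compares the two modes using only the clocks quoted above. The `TdpMode` class and `percent_drop` helper are hypothetical names introduced for this illustration. TDP is a thermal design target, so the percentages say nothing about measured power draw.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TdpMode:
    """One configurable-TDP operating point (hypothetical helper type)."""
    tdp_watts: int
    base_ghz: float
    turbo_ghz: float


# Clocks as quoted in the article for the A8-7600.
MODE_65W = TdpMode(tdp_watts=65, base_ghz=3.3, turbo_ghz=3.8)
MODE_45W = TdpMode(tdp_watts=45, base_ghz=3.1, turbo_ghz=3.3)


def percent_drop(high: float, low: float) -> float:
    """Percentage reduction when going from `high` to `low`."""
    return (high - low) / high * 100.0


tdp_drop = percent_drop(MODE_65W.tdp_watts, MODE_45W.tdp_watts)
base_drop = percent_drop(MODE_65W.base_ghz, MODE_45W.base_ghz)
turbo_drop = percent_drop(MODE_65W.turbo_ghz, MODE_45W.turbo_ghz)

print(f"TDP:         -{tdp_drop:.1f}%")
print(f"Base clock:  -{base_drop:.1f}%")
print(f"Turbo clock: -{turbo_drop:.1f}%")
```

On paper, the 45w mode gives up about 6% of base clock and 13% of peak turbo clock for a TDP rating that is roughly 31% lower.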

Also note that all of AMD’s new Kaveri-based APUs require a new socket: FM2+. The APUs are compatible with existing chipsets, but socket FM2+ has a couple of additional pins. Previous-gen socket FM2 APUs will work in newer FM2+ motherboards, but Kaveri-based FM2+ APUs will not work in older FM2 motherboards. That’s something to keep in mind if you were considering an upgrade of an existing AMD-based system.
 

Test Setup and SiSoft SANDRA

Test System Configuration Notes: When configuring our test systems for this article, we first entered their respective system BIOSes or UEFIs and set each board to its "Optimized" or "High performance Defaults". We then saved the settings, re-entered the BIOS/UEFI, and set the memory speed to each platform's maximum, officially supported speed--DDR3-1866 in the case of the A8-6500T, DDR3-2133 for the A10-6800K and A8-7600, and DDR3-1600 on the Intel systems. The solid state drives were then formatted, and Windows 8.1 x64 was installed. When the Windows installation was complete, we fully updated the OS, and installed the drivers necessary for our components. We then installed all of our benchmarking software, performed a disk clean-up, cleared any prefetch and temp data, and ran the tests.


ASUS A88X-Pro Socket FM2+ Motherboard

AMD provided a slick system built around an ASRock mini-ITX motherboard and Xigmatek case for testing, to show the kind of small form factors the A8-7600 was suited for, but in the absence of a similar setup for the comparison systems, we tested AMD's latest APU in a full-sized ATX motherboard from ASUS, the A88X-Pro you see pictured here.

HotHardware's Test Systems
Intel and AMD - Head To Head

System 1:
AMD A10-6800K
(4.1GHz - Quad-Core)
AMD A8-6500T
(3.5GHz - Quad-Core)
AMD A8-7600
(3.8GHz/3.3GHz Quad-Core)

Asus A88X-Pro
(AMD A88 Chipset)

2x8GB AMD DDR3-2133
(@1866 with 6700) 

On-Processor Graphics
On-Board Ethernet
On-board Audio

Samsung SSD 840 Pro

Windows 8.1 x64
System 2:
Intel Core i5-4670K
(3.8GHz - Quad-Core)
Intel Core i3-4330
(3.5GHz - Dual-Core + HT)

Gigabyte Z87X-UD7 TH
(Z87 Express Chipset)

2x8GB AMD DDR3-2133
(@1600 with Haswell)

Intel HD 4600
On-Board Ethernet
On-board Audio

Samsung SSD 840 Pro

Windows 8.1 x64

SiSoft SANDRA 2014
System Level Benchmark

We began our testing with SiSoftware's SANDRA 2014, the System ANalyzer, Diagnostic and Reporting Assistant. We ran four of the built-in subsystem tests that partially comprise the SANDRA 2014 suite (CPU Arithmetic, Multimedia, Memory Bandwidth, and Cache Bandwidth) with AMD's new A8-7600 APU configured for a 65W TDP. All of the scores reported below were taken with the processor running at its default clock speeds of 3.3GHz (3.8GHz Turbo) with 16GB of DDR3-2133 RAM running in dual-channel mode on an ASUS A88X-Pro motherboard.
 

Processor Arithmetic
 

Processor Multimedia

Memory Bandwidth
 

Cache and Memory
 

According to SiSoft SANDRA, the A8-7600 offered up 44.4GOPS in aggregate, which put it ahead of some Ivy Bridge-based mobile Core i5 processors, but well behind some higher-end Intel offerings. Memory bandwidth peaked in the 14.72GB/s range, which is right in line with other dual-channel AMD solutions running at a similar clock. SANDRA's cache and memory bandwidth test, however, showed the A8-7600 offering up more bandwidth than other solutions once the dataset range exceeded the 4MB mark. With smaller datasets, the A8-7600 performed about in the middle of the pack in the test.
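For context on that memory bandwidth figure, the sketch below compares it to the theoretical peak of the dual-channel DDR3-2133 configuration used for testing. It assumes standard 64-bit (8-byte) DDR3 channels and decimal gigabytes. SANDRA's exact units and measurement method are not described here, so treat the resulting percentage as a rough guide.

```python
TRANSFER_RATE_MT_S = 2133   # DDR3-2133, per the test setup
BYTES_PER_TRANSFER = 8      # assumption: standard 64-bit DDR3 channel
CHANNELS = 2                # dual-channel mode, per the test setup
MEASURED_GB_S = 14.72       # SANDRA memory bandwidth result quoted above


def peak_bandwidth_gb_s(mt_s: float, bytes_per_transfer: int, channels: int) -> float:
    """Theoretical peak bandwidth in decimal GB/s."""
    return mt_s * 1e6 * bytes_per_transfer * channels / 1e9


peak = peak_bandwidth_gb_s(TRANSFER_RATE_MT_S, BYTES_PER_TRANSFER, CHANNELS)
efficiency = MEASURED_GB_S / peak

print(f"Theoretical peak: {peak:.2f} GB/s")
print(f"Measured:         {MEASURED_GB_S:.2f} GB/s ({efficiency:.0%} of peak)")
```

Under these assumptions, the measured result works out to a little over 40% of the theoretical ceiling.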
 

Futuremark PCMark 8 v2

PCMark 8 v2 is the latest version in Futuremark’s series of popular PC benchmarking tools. It is designed to test the performance of all types of systems, from tablets to desktops. PCMark 8 offers five separate benchmark tests, plus battery life testing, to help consumers find the devices that offer the perfect combination of efficiency and performance for their particular use case. This latest version of the suite improves the Home, Creative and Work benchmarks with new tests using popular open source applications for image processing, video editing and spreadsheets. A wide variety of workloads have also been added to the Work benchmark to better reflect the way PCs are used in enterprise environments.

PCMark 8 v2
System Level Benchmarks 

These tests can be run with or without OpenCL acceleration. We chose to run with OpenCL acceleration enabled to leverage all of the platforms’ CPU and GPU compute resources…

In the Home and Work PCMark 8 v2 tests, AMD's new APU puts up a very good showing, besting the more expensive (and higher power) A10-6800K and the Intel Core i5 and Core i3 processors. In the Creative mode tests, however, Intel's higher performing CPU cores and speedy QuickSync video encoding engine propel the Core i3 and i5 processors way ahead of the AMD-based offerings.

Our apologies for not having more reference systems in the chart above. This benchmark worked fine the first time we ran it on each platform, but decided to barf as soon as we swapped out an APU or CPU. Due to time constraints, we let fly with what we had, but would have liked to include numbers with the A8-7600 operating in its 45w TDP mode and from an A8-6500T.
 
 

LAME MT and SunSpider

In our custom LAME MT MP3 encoding test, we convert a large WAV file to the MP3 format, which is a popular scenario that many end users work with on a day-to-day basis to provide portability and storage of their digital audio content. LAME is an open-source MP3 audio encoder that is used widely in a multitude of third party applications.

LAME MT
Audio Encoding

In this test, we created our own 223MB WAV file (a hallucinogenically-induced Grateful Dead jam) and converted it to the MP3 format using the multi-thread capable LAME MT application, in both single and multi-thread modes. Processing times are recorded below, listed in seconds. Shorter times equate to better performance.

 

Audio encoding with LAME MT is definitely not one of Kaveri's strong suits. The A8-7600 showed huge improvements over the A8-6500T, but the A10-6800K was significantly faster too, and Intel's parts simply dominate in this test due to their strong IPC advantage over AMD at this time.

SunSpider JavaScript Benchmark
JavaScript Performance Testing

Next up, we have some numbers from the SunSpider JavaScript benchmark. According to the SunSpider website:

This benchmark tests the core JavaScript language only, not the DOM or other browser APIs. It is designed to compare different versions of the same browser, and different browsers to each other. Unlike many widely available JavaScript benchmarks, this test is:

Real World - This test mostly avoids microbenchmarks, and tries to focus on the kinds of actual problems developers solve with JavaScript today, and the problems they may want to tackle in the future as the language gets faster. This includes tests to generate a tagcloud from JSON input, a 3D raytracer, cryptography tests, code decompression, and many more examples. There are a few microbenchmarkish things, but they mostly represent real performance problems that developers have encountered.

Balanced - This test is balanced between different areas of the language and different types of code. It's not all math, all string processing, or all timing simple loops. In addition to having tests in many categories, the individual tests were balanced to take similar amounts of time on currently shipping versions of popular browsers.

Statistically Sound - One of the challenges of benchmarking is knowing how much noise you have in your measurements. This benchmark runs each test multiple times and determines an error range (technically, a 95% confidence interval). In addition, in comparison mode it tells you if you have enough data to determine if the difference is statistically significant.
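To make the "error range" idea in the SunSpider description concrete, here is a minimal sketch of a 95% confidence interval computed from repeated timings. It uses a normal approximation from Python's standard library. SunSpider's own harness may calculate its interval differently, and the sample timings are made up purely for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation confidence interval."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    center = mean(samples)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(samples) / sqrt(len(samples))
    return center, half_width


# Hypothetical run times in milliseconds, not results from this review.
timings_ms = [212.0, 208.5, 215.1, 210.3, 209.8]

center, half_width = confidence_interval(timings_ms)
print(f"{center:.1f}ms +/- {half_width:.1f}ms")
```

With only a handful of runs, a Student's t critical value would give a wider and more conservative interval. The normal approximation is used here only to keep the sketch dependency-free.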

All of the systems were tested using the latest version of Internet Explorer 9, with default browser settings, on a clean install of Windows 8.1 x64.

We saw a similar performance trend in the SunSpider benchmark. Whether operating with a 65w or 45w TDP, the new A8-7600 was markedly faster than the previous-gen A8-6500T, but the A10-6800K finished a notch ahead and Intel's processors were in a league of their own.
 

Cinebench R15 and POV-Ray

Cinebench R15 is a 3D rendering performance test based on Cinema 4D from Maxon. Cinema 4D is a 3D rendering and animation suite used by animation houses and producers like Sony Animation and many others. It's very demanding of processor resources and is an excellent gauge of computational throughput.

Cinebench R15
3D Rendering

This is a multi-threaded, multi-processor aware benchmark that renders a photorealistic 3D scene (from the viral "No Keyframes" animation by AixSponza). This scene makes use of various algorithms to stress all available processor cores. The rate at which each test system was able to render the entire scene is represented in the graph below.

 

Once again, the A8-7600 had no trouble outpacing the A8-6500T. And depending on its TDP settings, the A8-7600 actually sandwiched the Haswell-based Core i3-4330. The A10-6800K and Core i5-4670K processors put up much better scores, however.

POV-Ray Performance
Ray Tracing

POV-Ray, or the Persistence of Vision Ray-Tracer, is an open source tool for creating realistically lit 3D graphics artwork. We tested with POV-Ray's standard 'one-CPU' and 'all-CPU' benchmarking tools on all of our test machines, and recorded the scores reported for each. Results are measured in pixels-per-second throughput; higher scores equate to better performance.

POV-Ray tells a somewhat different story than Cinebench. In this test, regardless of which TDP setting was used, the A8-7600 outpaced both the Core i3-4330 and A8-6500T and came within striking distance of the A10-6800K. The Core i5-4670K was the fastest of the bunch.
 

Low-Res Gaming: Crysis and Bioshock

For our next set of tests, we moved on to some low-res benchmarking with Crysis (DirectX) and Bioshock Infinite (DirectX). In these tests, we drop the resolution to 1024x768 or 800x600, and reduce all of the in-game graphical options to their lowest values to minimize the load being placed on the GPUs and allow the CPUs to push frame rates as high as possible. However, the in-game effects, which control the level of detail for the games' physics engines and particle systems, which typically leverage the CPU, are left at their maximum values.

Low-Resolution Gaming: Crysis and Bioshock Infinite
Minimizing the GPU Load


The A8-7600 performed much like the A8-6500T in both of these game tests, due to its somewhat hobbled GPU and conservative CPU clocks. The A10-6800K, however, was clearly the fastest overall. Versus the Intel processors though, there's simply no comparison--the A8-7600, regardless of its TDP configuration, smashed the Core i5-4670K and i3-4330.
 
GPU Testing: 3DMark Fire Strike

Fire Strike has two benchmark modes: Normal mode runs at 1920x1080, while Extreme mode targets 2560x1440. GPU target frame buffer utilization for normal mode is 1GB and the benchmark uses tessellation, ambient occlusion, volume illumination, and a medium-quality depth of field filter. Normal mode is what we used for these tests.

Futuremark 3DMark Fire Strike
Synthetic DirectX Gaming

The more taxing Extreme mode, which we use for high-end graphics testing, targets 1.5GB of frame buffer memory and increases detail levels across the board. Extreme mode is explicitly designed for CrossFire / SLI systems. GT 1 focuses on geometry and illumination, with over 100 shadow casting spot lights, 140 non-shadow casting point lights, and 3.9 million vertices calculated for tessellation per frame. And 80 million pixels are processed per frame. GT2 emphasizes particles and GPU simulations. Tessellation volume is reduced to 2.6 million vertices and the number of pixels processed per frame rises to 170 million. 





There's no comparison here. According to 3DMark Fire Strike, the GPU performance offered by the A8-7600 clearly outpaces the previous-gen A10 and A8 processors and it smokes Haswell by a wide margin.
 

GPU Testing: Cinebench OpenGL, Bioshock

Cinebench R15’s GPU benchmark uses a 3D scene depicting a car chase, which measures the performance of a graphics card using OpenGL. The graphics card has to process a large amount of geometry (nearly 1 million polygons) and textures, as well as a variety of effects, such as bump maps, transparency, lighting and more. Results are reported in frames per second.

Cinebench R15 OpenGL Test
3D Rendering


The A8-7600 performed right about in line with the A8-6500T in this test, but behind the A10-6800K by a significant margin. The Intel processors, however, finished well behind the AMD APUs.

Bioshock Infinite
High-Res Test

For this next set of tests, we pit the integrated processor graphics incorporated into AMD's APUs against Intel's HD 4600 series engine--the most pervasive model available in desktop Haswell-based processors. We tested the game at its high-quality setting, at a resolution of 1920x1080, to put a significant strain on the various GPUs.

 
Once again, when the GPU is being taxed, AMD's APUs come out on top. At both TDP settings, the new A8-7600 outpaced every other configuration we tested, nearly doubling the performance of the Haswell-based Core processors.
 

Power Consumption

Throughout all of our benchmarking and testing, we also monitored how much power our test systems consumed using a power meter. Our goal was to give you all an idea as to how much power each configuration used while idling and while under a heavy workload. Please keep in mind that we were testing total system power consumption at the outlet here, not just the power being drawn by the processors alone.

Total System Power Consumption
Tested At The Outlet

We built up a new test bed for our Haswell-based testing in preparation for this article, using a brand new Gigabyte Z87 Express based motherboard, that seems to draw WAY more power than we're used to seeing for Haswell, so we're not going to dwell on the Intel numbers here.

What's worth noting is that despite offering much better overall performance, the A8-7600's power consumption (when in 45w TDP mode) is only marginally higher than the A8-6500T, but significantly lower than the A10-6800K.
 
Our Summary and Conclusion

Performance Summary: The AMD A8-7600 APU’s performance is somewhat of a mixed bag. Please keep in mind, however, that this APU is actually the low-end model in AMD’s initial Kaveri-based APU line-up. The A8-7600’s CPU cores are clocked lower than other models and its GPU is not outfitted with the full complement of stream processors (384 vs. 512) that will be available in the highest-end model, the A10-7850K.

With that said, the A8-7600 is a decent performer overall. Its Steamroller-based CPU cores do not do much to make up ground versus Intel’s processors, so in the more CPU-bound workloads, Intel’s dual-core Core i3-4330 competes favorably with AMD’s quad-cores. And in terms of IPC and single-thread performance, Intel maintains its huge lead. Factor graphics into the equation, however, and the tides turn completely. The GCN-based graphics engine in Kaveri is a major step up over the previous generation, and much more powerful than Intel’s mainstream offerings. The A8-7600’s power consumption characteristics are also more desirable than those of the Richland-based A10-6800K.


The Initial AMD Kaveri-Based APU Line Up

AMD will initially be bringing three Kaveri-based desktop APUs to the market: the A8-7600 we’ve shown you here (which can be configured for 45w or 65w TDPs), an A10-7700K, and a flagship A10-7850K. We hope to be able to show you the performance of the higher-end offerings soon enough, but for now the A8-7600 does enough to paint a decent picture. In the short term, Kaveri doesn't change the desktop landscape all that much. AMD has vastly improved its on-processor graphics performance and given the CPU cores a marginal bump in performance as well. In the future, applications written to leverage Kaveri's HSA and heterogeneous compute capabilities may alter the performance proposition somewhat, but that remains to be seen.


AMD Kaveri Product Positioning

The AMD A8-7600 will arrive at a price point of $119, sometime this quarter. The A10-7700K and A10-7850K should be available right away, however, at $152 and $173 price points, respectively. Though we can't speak definitively on the performance of the higher-end models yet, the A8-7600 seems to be priced aggressively. It is significantly less expensive than the Core i3-4330, despite offering competitive CPU performance in multi-threaded workloads and much better graphics performance.

Kaveri doesn't change the game for AMD today, but it is a major step forward for the company and lays the foundation for a number of future advances. If software developers get on board and leverage Kaveri's heterogeneous compute capabilities to their fullest potential though, then the future could be bright.

  • Great Graphics Performance
  • Improved Power Efficiency
  • Small CPU Performance Improvements
  • GCN and TrueAudio In An APU 
  • Four AMD Cores Not Always As Fast As Two Intel Cores
  • A8-7600 Not Yet Available
  • New Socket



Content Property of HotHardware.com