AMD Beema and Mullins Low Power 2014 APUs Tested
Date: Apr 29, 2014
Section:Processors
Author: Marco Chiappetta
Introduction To Beema and Mullins

A couple of weeks back, we got the chance to get some hands-on time with AMD’s upcoming mainstream and low-power APUs (Accelerated Processing Units), codenamed Beema and Mullins. These APUs are the successors to last year’s Temash and Kabini APUs, which powered an array of small form factor and mobile platforms. With this release, however, AMD was laser-focused on improving power consumption and efficiency, expanding the platform’s capabilities through both hardware and software tweaks, and of course improving performance over the previous generation.

Beema and Mullins are based on the same piece of silicon, but will target different market segments. Beema is the mainstream part that will find its way into affordable notebooks, small form factor systems, and mobile devices. Mullins, however, is a much lower-power derivative, designed for tablets and convertible systems. Both Beema and Mullins feature multiple SKUs, with products in the E1, E2, A4, A6, and A10 families of APUs.

AMD is announcing four Beema-based mainstream APUs today, with TDPs ranging from 10W – 15W. The A6-6310, A4-6210, and E2-6110 are quad-core parts with 2MB of L2 cache, though they’ll feature different max core clocks and DDR3 memory speeds. The E1-6010 is a dual-core part with only 1MB of L2.

There are three Mullins-based products being announced, two quad-cores and a dual-core. The top of the line-up is the A10 Micro-6700T. It’s a quad-core chip, with a max clock speed of 2.2GHz, 2MB of L2, and a TDP of only 4.5W. The A10 Micro-6700T is the chip we were actually able to test, and we have a full batch of benchmark scores for it on the pages ahead. The A4 Micro-6400T is also a quad-core part, but its max CPU frequency tops out at 1.6GHz. The E1 Micro-6200T is the lowest-power part with a TDP of only 3.95W, but unlike the others it is a dual-core chip.

One thing you may have noticed looking at both of the tables above is that all of the APUs based on Beema and Mullins feature 128 (GCN-based) Radeon cores, despite the fact that they have Radeon R2, R3, R4, and R6 branding. In lieu of altering the GPU configuration, AMD differentiates the graphics capabilities of each part only by altering the GPU clock.

We’ll have more details on the changes and new features introduced with Beema and Mullins on the next page, but in a nutshell they feature updated Puma+ x86 Cores, GCN-based graphics, new System-Aware Power Management, a new ARM-based Platform Security Processor, and support for DDR3 memory speeds up to 1866MHz.


AMD Beema / Mullins Die Map

The die shot and map above illustrate Beema/Mullins’ layout. As you can see, the Puma+ x86 cores and graphics engine occupy the bulk of the center of the chip, with cache and other functional blocks surrounding them. The small yellow section labeled PSP is the new Platform Security Processor. The PSP is a 32-bit ARM Cortex-A5 core with isolated ROM and SRAM. It’s designed to function as a Trusted Execution Environment (TEE), with secure boot rooted in hardware, and support for cryptographic acceleration.


AMD Beema / Mullins Functional View

These latest AMD APUs aren’t strictly CPU and GPU cores crammed onto a single piece of silicon. They are full SoCs with on-die memory controllers, PCI Express, SATA, and USB connectivity, and a host of other interface elements. Connect one of these APUs to some memory, storage, and I/O ports and you’ve essentially got a complete low-power, x86-compatible platform with modern Radeon graphics.
 

CPU, GPU, Security and IO Enhancements

Beema and Mullins are the first x86 processors to feature an integrated ARM core for security features. But that’s not all they’re about. Though the designs feature the same architectural components as last year’s Kabini, AMD has tweaked a few things to improve performance and power efficiency.


Jaguar Becomes Puma+

The CPU cores featured in Beema and Mullins are architecturally similar to the Jaguar cores used in Kabini, but the new Puma+ cores, as AMD is calling them, can hit higher clock speeds in lower power envelopes and have 19% lower leakage current too. AMD says the leakage reduction comes by way of a mixture of the chip’s design and manufacturing process. The chips are not made on a new process node, but there were improvements to the tech as the 28nm process has matured.


GPU Speed and Power Improvements

The GPU cores have been enhanced in a similar manner. The GCN-based graphics engine used in Beema / Mullins runs about 100MHz – 200MHz faster than the previous generation, has been optimized for power efficiency, and also offers a 38% reduction in leakage. Again, the leakage reduction is mostly a result of the chip’s design and the maturity of its manufacturing process.


I/O Power Enhancements

AMD didn’t stop at the CPU and GPU cores, however. It has also optimized the memory and display interfaces for higher performance and lower power. The chips support a new low-power DDR3-1333 mode, which reportedly offers a 500mW reduction compared to standard DDR3-1333. At the other end of the scale, the higher-end configurations support up to DDR3-1866 speeds for higher bandwidth and, ultimately, increased performance. The graphics engines in AMD’s APUs are often starved for memory bandwidth, so the higher max memory speed should increase performance considerably in many workloads.
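
To put those memory speeds in perspective, here is a rough sketch of the peak-bandwidth arithmetic. It assumes a single 64-bit DDR3 channel, which the text above does not state, so treat the bus width and the function name as illustrative assumptions rather than AMD's published figures.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    bus_width_bits=64 is an assumption (one DDR3 channel); it is not
    taken from the article.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    slow = peak_bandwidth_gbs(1333)   # DDR3-1333
    fast = peak_bandwidth_gbs(1866)   # DDR3-1866
    print(f"DDR3-1333: {slow:.2f} GB/s")
    print(f"DDR3-1866: {fast:.2f} GB/s")
    print(f"Uplift:    {fast / slow:.2f}x")
```

Under that assumption, moving from DDR3-1333 to DDR3-1866 raises the theoretical ceiling by roughly 40%. Real-world throughput will land well below either peak.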

The display interface has also been enhanced with voltage mode logic that also helps bring power down. With high resolution displays, AMD estimates a 200mW power reduction overall.


AMD Platform Security Processor

Perhaps the biggest change to arrive with Beema and Mullins is the new ARM-based Platform Security Processor, or PSP. The PSP uses the industry-standard ARM TrustZone system security framework and is essentially a cryptographic co-processor with RSA, SHA, ECC, and AES engines. The core also has hardware logic for secure boot, and though it has its own isolated ROM and SRAM, the PSP can access system memory and other resources.


AMD PSP with ARM TrustZone

To leverage the PSP, software must be engineered to support it, but since it uses the existing ARM TrustZone system, support should come rather quickly. The PSP gives Beema and Mullins-based systems and devices the ability to support a full Trusted Execution Environment (TEE), security-aware applications and secure services, and Trusted Applications (TA).


Skin Temperature Aware Power Management

In addition to improving the power characteristics of Beema and Mullins versus the previous generation, AMD has also enabled a new skin temperature aware power management system, dubbed STAPM, along with more intelligent boost controls. What STAPM does is allow the APU to boost to maximum speeds, for longer periods of time, without resulting in uncomfortable-to-hold temperatures on the skin (or outer shell) of the device. Without STAPM, AMD was leaving performance on the table, but with it, the APUs can churn through workloads faster, without hitting excessive temps, and with overall lower power. When in boost mode, the APU will obviously use more power during peak loads, but because these new APUs can hit higher clocks, they can complete tasks faster, which in turn allows unused portions of the platform to be shut down sooner as well.
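
The idea behind skin-temperature-aware boosting can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not AMD's STAPM algorithm: the thermal model is a simple first-order lag, and every constant except the 2.2GHz boost clock and 4.5W TDP (the base clock, base power, degrees per watt, time constant, comfort limit, and the fixed boost window of the comparison policy) is hypothetical.

```python
AMBIENT_C = 25.0         # assumed room temperature
SKIN_LIMIT_C = 40.0      # hypothetical comfort limit for the device shell
BASE_MHZ, BOOST_MHZ = 1200, 2200   # 1200 is hypothetical; 2200 is the 6700T's max clock
BASE_W, BOOST_W = 2.8, 4.5         # 2.8 is hypothetical; 4.5 is the 6700T's TDP
DEG_C_PER_WATT = 4.0     # hypothetical steady-state skin rise per watt
TAU_S = 120.0            # hypothetical thermal time constant of the chassis
FIXED_BOOST_S = 10       # hypothetical boost window for the skin-unaware policy


def run(work_mcycles, skin_aware, dt=1.0):
    """Simulate a workload; return (seconds elapsed, peak skin temperature)."""
    skin_c = AMBIENT_C
    peak_c = skin_c
    elapsed = 0.0
    remaining = float(work_mcycles)
    while remaining > 0:
        if skin_aware:
            # Boost for as long as the shell is still comfortable to hold.
            boost = skin_c < SKIN_LIMIT_C
        else:
            # Conservative policy: boost only for a short, fixed window.
            boost = elapsed < FIXED_BOOST_S
        mhz, watts = (BOOST_MHZ, BOOST_W) if boost else (BASE_MHZ, BASE_W)
        remaining -= mhz * dt  # MHz * seconds = millions of cycles
        # The shell warms slowly toward the steady state for this power level.
        target_c = AMBIENT_C + DEG_C_PER_WATT * watts
        skin_c += (target_c - skin_c) * dt / TAU_S
        peak_c = max(peak_c, skin_c)
        elapsed += dt
    return elapsed, peak_c


if __name__ == "__main__":
    for aware in (False, True):
        seconds, peak = run(200_000, skin_aware=aware)
        print(f"skin_aware={aware}: {seconds:.0f} s, peak skin {peak:.1f} C")
```

Because the shell heats up far more slowly than the die, the skin-aware policy in this model can hold the boost clock for the whole workload and finish sooner, while the shell still stays under the comfort limit.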
 

The AMD Discovery Platform

To test the Mullins APU, we spent a little time with an AMD-built prototype tablet, AMD’s Discovery Platform. The tablet was built around a 4.5W A10 Micro-6700T APU, and is passively cooled. Though the tablet was very well built for a prototype, we should point out that this particular device will not be sold at retail. It is simply a vehicle for testing AMD’s latest APUs.

The tablet featured a 10.1” full-HD 1080P screen, with all of the amenities you’d expect from a current-gen tablet. As we’ve already mentioned, it was powered by an A10 Micro-6700T quad-core APU with Radeon R6 graphics, which was paired to 2GB of DDR3-1333 memory and a 64GB SanDisk SSD. The device was running Windows 8.1 64-bit edition.




AMD's Discovery Tablet Platform

Though we didn’t configure the tablet ourselves, we did spend a considerable amount of time poking around the OS installation and made the same tweaks to it that we would when building up our own test beds. We also installed our own copies of many of the benchmarks and were left on our own to run whatever tests we wanted. Time was limited, however, so we didn’t get a chance to do much additional experimentation. Because this is only a reference platform, we also weren’t able to directly test power or battery life.

The block diagram above shows all of the IO and sensors attached to the Mullins SoC used in the tablet. As you can see, it had just about everything you could ask for. What it doesn’t show is how the device felt in practice. We can say with confidence that the tablet was fast and fluid and every bit as usable as any Bay Trail or low-voltage Intel Core-based mobile device. In addition, because the tablet had a relatively powerful graphics setup, it was also able to play some fairly taxing games. Dirt Showdown, for example, was perfectly playable with high image quality settings at 720P. That’s pretty impressive for a passively cooled device.

Though we couldn’t test power on our own, AMD provided some numbers to show how Beema compared to last year’s Kabini. Across the board, regardless of workload, the Beema system uses less power, which directly translates to longer battery life.

  
  
AMD A10 Micro-6700T CPU-Z Details

If you’d like more specifics about the setup, here is an array of screenshots taken with CPU-Z, covering the CPU, graphics, memory, and motherboard info. Please note that the three images across the top were captured while the system was idling, while it was running a single-threaded workload, and while it was running a multi-threaded workload, respectively. Assuming CPU-Z was reading the sensors correctly, the chip idled at 997MHz with a 0.24V core voltage. When running a single-threaded workload, the chip would peak at 2.195GHz at 0.925V, and with a multi-threaded workload the chip would run at 1.596GHz at 0.575V. These clocks weren’t constant; they fluctuated quickly as the APU churned through a workload. AMD claims the APUs can switch between power states within single-digit microsecond intervals, and switch from full system idle to higher power states in the tens of microseconds, including voltage changes and bringing up the clocks. Still, the readings give you an idea of how the chips run in a passively cooled device.
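
Those CPU-Z readings hint at why the chip drops to lower clocks under multi-threaded load. A back-of-the-envelope sketch, using the textbook approximation that CMOS dynamic power scales with frequency times voltage squared, compares the two operating points. It ignores leakage, assumes identical switched capacitance per core, assumes all four cores are busy in the multi-threaded case, and takes the CPU-Z readings at face value, so the result is a rough illustration rather than a measurement.

```python
def relative_dynamic_power(freq_ghz, volts, ref_freq_ghz, ref_volts):
    """Dynamic power relative to a reference point, using P ~ C * V^2 * f."""
    return (freq_ghz / ref_freq_ghz) * (volts / ref_volts) ** 2


# Operating points as reported by CPU-Z in the article: (GHz, volts).
SINGLE_THREAD = (2.195, 0.925)
MULTI_THREAD = (1.596, 0.575)

# Power of one core at the multi-threaded point, relative to one boosted core.
per_core = relative_dynamic_power(*MULTI_THREAD, *SINGLE_THREAD)

# Assumption: all four cores are active in the multi-threaded case.
four_cores = 4 * per_core

if __name__ == "__main__":
    print(f"One core at 1.596GHz/0.575V: {per_core:.2f}x a boosted core")
    print(f"Four such cores:             {four_cores:.2f}x a boosted core")
```

By this estimate, each core at the lower operating point draws a bit over a quarter of the dynamic power of a single boosted core, so four of them together land close to the single-core boost figure. That is consistent with the chip trading clock speed for core count inside a fixed power budget.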

SiSoft SANDRA Benchmarks

We began our testing with SiSoftware's SANDRA 2014, the System ANalyzer, Diagnostic and Reporting Assistant. We ran four of the built-in subsystem tests that partially comprise the SANDRA 2014 suite (CPU Arithmetic, Multimedia, Memory Bandwidth, and Cache Bandwidth) on AMD's A10 Micro-6700T Discovery reference platform.

SiSoft SANDRA 2014
System Level Benchmark

 

All of the scores reported below were taken with the processor running at its default clock speeds (max boost of 2.2GHz) with 2GB of DDR3-1333 RAM and Windows 8.1 64-bit.


Processor Arithmetic
 

Processor Multimedia

Memory Bandwidth
 

Cache and Memory
 

The SANDRA numbers are not going to blow anyone away (this is a low-power tablet platform, after all), but the results are solid. Memory bandwidth was in the 5.2GB/s range, and the processor and cache related metrics showed the chip competing well with higher-power, Intel Pentium-class notebook processors.
 

Futuremark PCMark 8 v2, PCMark 7

Futuremark's PCMark 7 is a whole-system benchmarking suite. It has application performance measurements targeted for a Windows 7 environment and uses custom metrics to gauge relative performance. Below is what Futuremark says is incorporated into the base PCMark suite and the Entertainment, Creativity, and Productivity suites, the four modules we have benchmark scores for here.

Futuremark PCMark 7
General Application and Multimedia Performance

The PCMark test is a collection of workloads that measure system performance during typical desktop usage. This is the most important test since it returns the official PCMark score for the system. It includes:

  • Storage (Windows Defender, importing pictures, gaming)
  • Video playback and transcoding
  • Graphics (DirectX 9)
  • Image manipulation
  • Web browsing and decrypting

The Entertainment test is a collection of workloads that measure system performance in entertainment scenarios using mostly application workloads. Individual tests include recording, viewing, streaming and transcoding TV shows and movies, importing, organizing and browsing new music and several gaming related workloads. If the target system is not capable of running DirectX 10 workloads then those tests are skipped. At the end of the benchmark run the system is given an Entertainment test score.

The Creativity test contains a collection of workloads to measure the system performance in typical creativity scenarios. Individual tests include viewing, editing, transcoding and storing photos and videos. At the end of the benchmark run the system is given a Creativity test score.

The Productivity test is a collection of workloads that measure system performance in typical productivity scenarios. Individual workloads include loading web pages and using home office applications. At the end of the benchmark run the system is given a Productivity test score.


Versus Bay Trail and Kabini, the Mullins-based AMD A10 Micro-6700T competes well. It outpaced Intel's reference tablet only in the productivity test, but overall its showing wasn't bad, especially considering its clock speed deficit.

PCMark 8 v2
System Level Benchmarks 

PCMark 8 v2 is the latest version in Futuremark’s series of popular PC benchmarking tools. It is designed to test the performance of all types of systems, from tablets to desktops. PCMark 8 offers five separate benchmark tests, plus battery life testing, to help consumers find the devices that offer the perfect combination of efficiency and performance for their particular use case. This latest version of the suite improves the Home, Creative and Work benchmarks with new tests using popular open source applications for image processing, video editing and spreadsheets. A wide variety of workloads have also been added to the Work benchmark to better reflect the way PCs are used in enterprise environments.

These tests can be run with or without OpenCL acceleration. We chose to run with OpenCL acceleration enabled to leverage all of the platform’s CPU and GPU compute resources…

We do not have reference numbers from any of the other platforms we tested, and we ran out of time during our hands-on session with Mullins, but we were able to nab a score for the "Work" test. We literally snapped this picture of the results screen as we were running out of the doors at AMD HQ, so apologies for the blurriness.
 

SunSpider and BrowserMark

Next up, we have some numbers from the SunSpider JavaScript benchmark. According to the SunSpider website:

SunSpider JavaScript Benchmark
JavaScript Performance Testing

This benchmark tests the core JavaScript language only, not the DOM or other browser APIs. It is designed to compare different versions of the same browser, and different browsers to each other. Unlike many widely available JavaScript benchmarks, this test is:

Real World - This test mostly avoids microbenchmarks, and tries to focus on the kinds of actual problems developers solve with JavaScript today, and the problems they may want to tackle in the future as the language gets faster. This includes tests to generate a tagcloud from JSON input, a 3D raytracer, cryptography tests, code decompression, and many more examples. There are a few microbenchmarkish things, but they mostly represent real performance problems that developers have encountered.

Balanced - This test is balanced between different areas of the language and different types of code. It's not all math, all string processing, or all timing simple loops. In addition to having tests in many categories, the individual tests were balanced to take similar amounts of time on currently shipping versions of popular browsers.

Statistically Sound - One of the challenges of benchmarking is knowing how much noise you have in your measurements. This benchmark runs each test multiple times and determines an error range (technically, a 95% confidence interval). In addition, in comparison mode it tells you if you have enough data to determine if the difference is statistically significant.
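
The "Statistically Sound" point is easy to sketch. The snippet below is not SunSpider's own code. It is a minimal illustration that uses a normal approximation for the 95% confidence interval and a conservative "intervals do not overlap" check, with made-up timings and hypothetical function names.

```python
from statistics import NormalDist, mean, stdev


def confidence_interval(samples_ms, level=0.95):
    """Return (mean, half_width) of the interval, via a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * stdev(samples_ms) / len(samples_ms) ** 0.5
    return mean(samples_ms), half_width


def significantly_different(a_ms, b_ms, level=0.95):
    """True when the two intervals do not overlap (a conservative check)."""
    mean_a, hw_a = confidence_interval(a_ms, level)
    mean_b, hw_b = confidence_interval(b_ms, level)
    return abs(mean_a - mean_b) > hw_a + hw_b


if __name__ == "__main__":
    # Made-up run times in milliseconds for two hypothetical systems.
    system_a = [400, 402, 398, 401, 399]
    system_b = [410, 412, 408, 411, 409]
    avg, hw = confidence_interval(system_a)
    print(f"System A: {avg:.1f} ms +/- {hw:.1f} ms")
    print("Different:", significantly_different(system_a, system_b))
```

With only a handful of runs, a Student's t interval would be somewhat wider than this normal approximation, so treat the check as a rough guide to whether a gap between two scores is bigger than the run-to-run noise.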

All of the systems were tested using the latest version of Internet Explorer, with default browser settings, on a clean install of Windows 8.1.

The AMD A10 Micro-6700T based platform put up an excellent score here, besting every other platform save for the low-power Core i3.

Browsermark 2.0
Browser-Based Performance

The AMD A10 Micro-6700T based platform also put up a great score in Browsermark and managed to pull ahead of all of the other low-power platforms we tested.
 

Cinebench and LAME MT

Cinebench R11.5 is a 3D rendering performance test based on Cinema 4D from Maxon. Cinema 4D is a 3D rendering and animation suite used by animation houses and producers like Sony Animation and many others. It's very demanding of processor resources and is an excellent gauge of computational throughput.

Cinebench R11.5
3D Rendering

This is a multi-threaded, multi-processor aware benchmark that renders a photorealistic 3D scene (from the viral "No Keyframes" animation by AixSponza). This scene makes use of various algorithms to stress all available processor cores. The rate at which each test system was able to render the entire scene is represented in the graph below.

 

Once again, the AMD A10 Micro-6700T-based platform put up some very nice numbers. In both the single and multi-threaded tests, the A10 Micro-6700T pulled ahead of Bay Trail, and it outpaced Kabini (which was tested in a full-sized notebook) too.

We don't have reference numbers from the other platforms with the newer Cinebench R15 benchmark, but the A10 Micro-6700T put up scores of 45 (single-thread) and 121 (multi-thread) in that application.

LAME MT
Audio Encoding

In our custom LAME MT MP3 encoding test, we convert a large WAV file to the MP3 format, which is a popular scenario that many end users work with on a day-to-day basis to provide portability and storage of their digital audio content. LAME is an open-source MP3 audio encoder that is used widely in a multitude of third party applications.

For this test, we created our own 223MB WAV file (a hallucinogenically-induced Grateful Dead jam) and converted it to the MP3 format using the multi-thread capable LAME MT application, in both single and multi-thread modes. Processing times are recorded below, listed in seconds. Shorter times equate to better performance.

 

Audio encoding with LAME MT has historically been a weak spot for AMD processors in our test suite. The A10 Micro-6700T did very well, however, and trailed only the Core i3-based notebook.
 

Video Streaming and Playback

We also played back a handful of on-line and local video files to get a feel for the A10 Micro-6700T's multimedia prowess, including some high-def YouTube and Hulu video and H.264 encoded MP4 files.


The Avengers Trailer (1080p) Streaming From YouTube


Local Playback of 1080P MP4 File

All of the local content played back perfectly with very low CPU utilization, as you can see in the screen capture above. 1080p Flash videos streamed from the web, however, resulted in higher-than-expected CPU utilization, which in turn resulted in some dropped frames. 720p playback was perfectly smooth.

We didn't have time to experiment with different video settings to see if there was a particular video enhancement that was causing the problem, but reps from AMD said these results weren't in line with expectations. Follow-up research after the fact reportedly showed that resetting the graphics driver to its defaults resolved the issue, but we didn't have time to verify that for ourselves. Based on our experience with Kabini and other Radeons, we suspect this won't be an issue in shipping products / drivers.
 

GPU Testing: 3DMark Ice Storm

Next up, we have some graphics testing with 3DMark Ice Storm, in both Extreme and Unlimited modes. Ice Storm Extreme raises the rendering resolution from 720p to 1080p and uses higher quality textures and post-processing effects in the Graphics tests to create a more demanding workload for the latest smartphones and tablets. In Unlimited mode frames are rendered in 720p off-screen while the display is updated with thumbnails every 100 frames to show progress.

Futuremark 3DMark Ice Storm
Synthetic DirectX Gaming

The AMD A10 Micro-6700T's Radeon R6 graphics engine is clearly superior to all of the other low-power platforms we tested. In the direct Extreme-to-Extreme comparison with Bay Trail, the A10 Micro-6700T simply dominates.
 

GPU Testing: Cinebench OpenGL, Cloud Gate

Cinebench R11.5’s GPU benchmark uses a 3D scene depicting a car chase, which measures the performance of a graphics card using OpenGL. The CPU and GPU have to process a large amount of geometry (nearly 1 million polygons) and textures, as well as a variety of effects, such as bump maps, transparency, lighting and more. Results are reported in frames per second.

Cinebench R11.5 OpenGL Test
3D Rendering


The higher-power, Kabini-based notebook platform put up a higher score than the A10 Micro-6700T here, but versus Intel's offerings, AMD's new APU does very well.

3DMark Cloud Gate
More DirectX Testing

We snuck in a run of 3DMark Cloud Gate in our limited time with the AMD A10 Micro-6700T-based Discovery Platform and, once again, it had no trouble outpacing Bay Trail.
 

Summary and Conclusion

With last year’s Temash and Kabini-based APUs, AMD wanted to deliver a low-power platform that featured Graphics Core Next (GCN) based graphics. With this year’s release, the company wanted to incorporate a platform security processor, and focus on optimizing the overall performance and performance-per-watt characteristics of the CPU and GPU cores, as well as the IO. Although we haven’t been able to independently verify the power savings, it appears AMD has delivered on all fronts. The A10 Micro-6700T-based reference platform we tested performed well and offered a good user experience.


AMD Claims More New Features Are Coming To Their Mobile Products

AMD is not done, however. Its goals for future low-power products include integrated voltage regulation, finer-grained and inter-frame power gating, further optimizations to its intelligent boost algorithms and more.

For the immediate future, Beema and Mullins are a clear step forward over AMD’s previous-gen, low-power APUs. Performance is up, power is down, and the platform offers new features and capabilities, like the newly integrated, ARM-based Platform Security Processor.

AMD expects designs featuring these new APUs to hit store shelves in a few months, just in time for the back-to-school shopping season. At this point, only Lenovo and Samsung have announced actual products, but AMD tells us that virtually every partner that used Kabini and Temash will likely have Beema and/or Mullins-based products.

Historically, AMD has had a tough time getting its APUs into a large number of designs, but we’re hopeful more devices will leverage these chips versus the previous generation. Beema and Mullins offer across-the-board improvements over last year’s parts and their graphics capabilities are an obvious strong suit.
 



Content Property of HotHardware.com