AMD 2013 A & E-Series Kabini and Temash APUs
Date: May 23, 2013
Section: Processors
Author: Marco Chiappetta
Introduction, Specs, and Features

AMD has been pretty open about discussing certain products in the roadmap. In fact, we’ve disclosed a number of details regarding the main products we’ll be talking about in this article--Kabini, Temash, and Richland--over the last few months.

It was all the way back at CES that we first showed you Kabini, Temash, and Richland-based products in action in a number of prototype notebooks and tablets from Vizio, HP, Asus and others. And AMD actually talked about the foundation of two of these products (Kabini and Temash), its Jaguar CPU core microarchitecture, at Hot Chips in August of last year. If you’re unfamiliar with Kabini and Temash, they are the codenames given to AMD’s next-gen, low-power APUs targeted at mobile and ultra-mobile form factors. Kabini and Temash are not simple updates to existing products, however. As we’ve mentioned, they feature newly-designed CPU cores fused to a Graphics Core Next-based GPU, and they're designed to considerably improve performance while also operating at lower power. Richland is based on last year’s Trinity microarchitecture, but it's updated with a number of power- and performance-related enhancements.


AMD Kabini SoC Die Shot

Above is a shot of a quad-core Kabini die. Kabini is the follow-up product to AMD’s very successful Brazos line of products. How successful was Brazos, you ask? According to the most recent information provided by AMD, the company sold upwards of 48 million units, and if you ask them, they’re expecting greater success from Kabini.  


AMD Is Targeting Mobile Form Factors With Kabini, Temash, and Richland ULV

We’ll dive a little deeper on the pages ahead, but to give you some high-level guidance, Kabini is an x86 quad-core SoC (system on a chip) targeted at entry-level and small form factor touch notebooks. Officially, AMD will be referring to Kabini as their “2013 AMD Mainstream APU”, and it is one of these products that we’ve been able to test drive for the last couple of weeks.


A Kabini Based AMD A6-Series APU

Also arriving alongside Kabini is Temash. Temash and Kabini are based on the same microarchitecture and share essentially the same feature set, but Temash targets small form factor notebooks, tablets, and hybrids 13 inches and smaller. AMD puts Temash-based products under the “2013 AMD Elite Mobility APU” umbrella, and the SoCs will come in dual-core (A4) and quad-core (A6) configurations.

Finally, AMD is revealing yet another group of products, the 2013 AMD Elite Performance APUs (formerly codenamed “Richland”), which consist of higher-performing A8 and A10 branded products targeting premium ultrathin notebooks.
 

Temash and Kabini Architectures

Kabini and Temash were designed from the ground up to improve overall performance and power efficiency over previous-generation products. The APUs will be offered in dual- and quad-core varieties and are manufactured using TSMC’s 28nm process node. Unfortunately, as of this writing, AMD hasn’t disclosed exact transistor counts and die sizes, but we should have that information soon.

AMD’s previous-gen Brazos-based products featured the company’s Bobcat CPU core design. The Jaguar cores in Temash and Kabini improve on Bobcat with better IPC performance, the ability to run at higher frequencies at a given voltage, and improved power efficiency through finer-grained clock and power gating and unit redesigns. AMD wanted to preserve the throughput of Bobcat and save on power but ultimately ended up with a much higher-performing part (relatively speaking), as well.

Jaguar adds support for SSE4.1, SSE4.2, AES, CLMUL, MOVBE, AVX, XSAVE/XSAVEOPT, F16C, BMI1, and has a 40-bit physical address space. The cores feature improved instruction cache prefetching, too. AMD tells us they grew the instruction buffer and added about 30% additional die area over what they had with Bobcat. They’ve also added a divider to the integer unit (a minor modification from the unit in Llano) and added a pipeline stage (a decode stage), which allowed them to boost frequencies. A pipeline stage was added to the FPU as well, again for better frequency response at lower voltages.

The new Jaguar cores in Kabini and Temash are also outfitted with enhanced out-of-order resources, including a redesigned scheduler and buffers that are 30% to 70% larger than Bobcat's. The FPU was totally redesigned and widens from 64 bits to 128 bits. And the Load Store Unit and Data Cache have redesigned queues, with a matrix-style picker and store data FIFO.

With Jaguar, the 16-way set associative L2 is also a shared cache among all the cores. With Bobcat, 512K was allocated to each core. With Jaguar though, all 2MB is shared, and cache dynamically reallocates to threads that need it. The L2 redesign is where a large part of the IPC improvements over Bobcat come from. AMD claims up to a 22% IPC improvement in single threaded workloads, clock for clock, or 15% if you restrict a Jaguar core to the same size cache as Bobcat. The L2 cache redesign helps IPC because of the shared resources. If only one to two cores are lit up, they have access to much more cache than Bobcat.
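To put the shared-cache point in numbers, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the shared 2MB L2 is split evenly among whichever cores are active; the real allocation is dynamic and workload-dependent, and the function name is ours, not AMD's.

```python
# Illustrative only: assumes an even split of the shared L2 among active cores.
BOBCAT_L2_PER_CORE_KB = 512   # per-core L2 on Bobcat (figure from the article)
JAGUAR_SHARED_L2_KB = 2048    # shared L2 on a quad-core Jaguar part

def jaguar_l2_per_active_core_kb(active_cores: int) -> float:
    """L2 available per active core under an even split (an assumption)."""
    if not 1 <= active_cores <= 4:
        raise ValueError("active_cores must be between 1 and 4")
    return JAGUAR_SHARED_L2_KB / active_cores

for n in range(1, 5):
    share = jaguar_l2_per_active_core_kb(n)
    ratio = share / BOBCAT_L2_PER_CORE_KB
    print(f"{n} active core(s): {share:.0f} KB each, {ratio:.1f}x Bobcat's per-core L2")
```

With one core lit up, that core can in principle see four times the L2 a Bobcat core had; with all four busy, the even-split share falls back to Bobcat's 512K.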

Moving on from the cores, Kabini and Temash feature an integrated 64-bit wide memory controller / Northbridge, with official support for frequencies up to DDR3-1600. The memory controller also supports 1.25V, 1.35V, and 1.5V DIMMs. Also present is a Fusion Controller Link, or FCL, which is how the IO subsystems interface with the on-die Northbridge and allows the CPU to access the GPU frame buffer (and vice versa). The FCL is 128 bits wide in each direction, while the graphics memory bus is 256 bits wide in each direction.
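For a sense of scale, the theoretical peak bandwidth of a 64-bit memory interface at DDR3-1600 can be worked out with simple arithmetic. The short Python sketch below is our own illustration (the function name is hypothetical), and it gives a ceiling, not a measured figure.

```python
def peak_bandwidth_gbs(bus_width_bits: int, megatransfers_per_sec: int) -> float:
    """Theoretical peak bandwidth in GB/s, taking 1 GB as 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * megatransfers_per_sec * 1e6 / 1e9

# 64-bit interface at DDR3-1600 (1600 million transfers per second)
print(f"{peak_bandwidth_gbs(64, 1600):.1f} GB/s")
```

That works out to 12.8 GB/s, which the CPU cores and the integrated GPU have to share.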

As we’ve mentioned, Kabini and Temash feature Radeon HD 8000-series DX11.1 graphics cores based on AMD’s Graphics Core Next architecture. There are 128 Radeon Cores on board, which offer up to a 75% performance improvement over the previous-gen graphics core used in Hondo.

AMD also offers a new technology they're calling "Turbo Dock", which can boost performance by up to 40% by enhancing cooling performance and supplying more power when a tablet or convertible device is docked, but we haven't seen it in action just yet.

2013 Mobile APU Product Positioning

As we’ve mentioned, AMD has a few series of products launching today that, while similar, target different form factors and market segments. There are new A-Series APUs, E-Series APUs based on Kabini, and A-Series Elite Mobility APUs based on Temash, all on the way.

We had a chance to test out an A4-5000 15W Kabini APU, and have some numbers to share with you on the following pages. Before we get to the numbers, though, we should point out that the A4-5000 will be accompanied by a higher-clocked A6-5200 and an array of E1 and E2 series parts as well. The respective core counts, frequencies, GPU configurations, and TDPs, among some other details, are outlined in the chart above.

These new A-Series and E-Series parts are going to target mainstream and entry-level price points and go head to head with Intel’s more affordable Pentium and Celeron-class products in notebooks and small form factor systems. The somewhat higher-end A6-Series parts are poised to do battle with Core i3-class products in similar form factors. As you’ll see in the pages ahead, these targets seem realistic, at least as far as the performance of the A4-5000 is concerned.

AMD’s A-Series Elite Mobility APUs, which are built around the Temash SoC, have much lower thermal / power envelopes than their more mainstream counterparts. The high-end A6-1450 is a quad-core part with 1.0GHz/1.4GHz base/boost clocks and a TDP of only 8W. Coming in at the low-end is the A4-1200, a dual-core SoC clocked at 1GHz with a TDP just under the 4W mark.

You'll notice that all of these Kabini- and Temash-based products feature the same number of Radeon Cores: 128, to be exact. GPU performance between the parts is differentiated by frequency. The lowest-clocked part has a GPU frequency of 225MHz, while the highest-clocked part has a 600MHz GPU clock. As we mentioned earlier, the GPUs fused with Temash and Kabini are based on AMD’s GCN (Graphics Core Next) architecture and offer all of the features of higher-powered Radeon HD-branded discrete GPUs.
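Because the core count is fixed at 128, theoretical GPU throughput scales directly with clock speed. The Python sketch below estimates peak single-precision throughput at the two clock extremes. It assumes each Radeon Core can retire one fused multiply-add (two floating-point operations) per clock, which is the usual way peak figures are quoted for GCN parts; the article itself does not state this, and real-world performance will be lower.

```python
RADEON_CORES = 128
FLOPS_PER_CORE_PER_CLOCK = 2  # assumption: one fused multiply-add per clock

def peak_gflops(gpu_clock_mhz: float) -> float:
    """Theoretical peak single-precision GFLOPS at a given GPU clock."""
    return RADEON_CORES * FLOPS_PER_CORE_PER_CLOCK * gpu_clock_mhz / 1000.0

low, high = peak_gflops(225), peak_gflops(600)
print(f"225 MHz: {low:.1f} GFLOPS peak")
print(f"600 MHz: {high:.1f} GFLOPS peak")
print(f"Highest-clocked vs. lowest-clocked part: {high / low:.2f}x")
```

On paper, then, the gap between the slowest and fastest GPU configurations is roughly 2.7x, purely from frequency.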

And here we have the list of Richland ULV-based A-Series APUs due to arrive soon. These parts fall under the 2013 AMD Elite Performance A-Series APU umbrella and target higher performance (and higher power) thin and light mobile platforms.

The Test Platform

For the purposes of this article, AMD supplied us with a Kabini-powered whitebook, running Windows 8 Enterprise 64-bit. The machine had no particular markings, and won’t be made available at retail, but we thought you’d like to see what we used to test AMD’s new mobility platform anyway.

The machine you see pictured here is powered by an AMD A4-5000 quad-core APU, paired to 4GB of DDR3-1600 system memory set up in a dual-channel configuration. The system features a 14” screen with a full HD resolution of 1920x1080, a multi-touch capable Touchpad, a built in SD card reader and optical drive, and a 1TB 5400RPM hard drive. All of the other typical accoutrements are present as well, such as built-in Wi-Fi, Bluetooth, Ethernet, audio, and so on.

Aesthetically, there’s not much to talk about, since the machine bore no identifying markings whatsoever, other than a few stickers with serial numbers and AMD owner information. The whole machine was covered in a brushed-metal type finish, but only the backside of the lid was actually metal. The rest of the machine is made of a plastic composite material that was lightweight, but also fairly flexible.

The keyboard sports standard chiclet-type keys, and the mid-sized touchpad, which features independent right and left buttons, is centered right under the space bar. The machine’s power switch is at the top left of the keyboard, and at the bottom left, just above the card reader, are four indicator LEDs for power, drive activity, Wi-Fi status, and Caps Lock status.

The left side of the machine is home to the power jack; a large air vent; and LAN, VGA, DisplayPort, and USB 3.0 ports. The right side has headphone and microphone jacks, two USB 2.0 ports, an optical drive, and a lock port. The back side of the machine (which is where the removable battery plugs in) is devoid of any markings or ports.

Although this particular machine won’t be made available at retail, we want to give some general impressions after using it for the last week or so. Generally speaking, we found the machine to be responsive and particularly adept at video-related tasks. The slow hard drive in the system meant it didn’t have that snappy SSD-like responsiveness, but for CPU- and GPU-bound workloads, the machine felt surprisingly smooth.

CPU and System Level Performance

We didn't have many comparable notebooks on hand to match head-to-head with the Kabini-based test platform featured on the previous page, but we've pulled together some numbers from an Intel CloverTrail-based tablet and a similarly-clocked Core i3-2377M to at least begin to paint a picture of where the AMD A4-5000's performance lands in comparison to some competing platforms. All of the machines were running Windows 8. Please note, the Core i3-2377M features integrated Intel HD 3000 series graphics. First up, some system level metrics with PCMark 7...

Up until a couple of days ago, Futuremark's PCMark 7 was the latest version of the PCMark whole-system benchmarking suite (PCMark 8 was just released). It has updated application performance measurements targeted for Windows 7 (or newer) environments and uses newer metrics to gauge relative performance. Below is what Futuremark says is incorporated into the base PCMark suite and the Entertainment, Creativity, and Productivity suites--the four modules for which we have benchmark scores here.

Futuremark PCMark 7
General Application and Multimedia Performance

The PCMark test is a collection of workloads that measure system performance during typical desktop usage. This is the most important test since it returns the official PCMark score for the system.

  • Storage: Windows Defender, importing pictures, gaming
  • Video playback and transcoding
  • Graphics: DirectX 9
  • Image manipulation
  • Web browsing and decrypting

The Entertainment test is a collection of workloads that measure system performance in entertainment scenarios using mostly application workloads. Individual tests include recording, viewing, streaming and transcoding TV shows and movies, importing, organizing and browsing new music and several gaming related workloads. If the target system is not capable of running DirectX 10 workloads then those tests are skipped. At the end of the benchmark run the system is given an Entertainment test score.

The Creativity test contains a collection of workloads to measure the system performance in typical creativity scenarios. Individual tests include viewing, editing, transcoding and storing photos and videos. At the end of the benchmark run the system is given a Creativity test score.

The Productivity test is a collection of workloads that measure system performance in typical productivity scenarios. Individual workloads include loading web pages and using home office applications. At the end of the benchmark run the system is given a Productivity test score.

We left the Core i3 out of the mix here because the machine had 8GB of memory and a fast SSD--needless to say, it would have taken a clear lead here.  With that said, the A4-5000 had no trouble dispatching the Atom Z2760, despite the Atom processor's significantly higher frequency.

LAME MT & Cinebench
Audio Encoding & 3D Rendering

In our custom LAME MT MP3 encoding test, we convert a large WAV file to the MP3 format, which is a popular scenario that many end users work with on a day-to-day basis to provide portability and storage of their digital audio content. LAME is an open-source MP3 audio encoder that is used widely in a multitude of third party applications.

In this test, we created our own 223MB WAV file (a hallucinogenically-induced Grateful Dead jam) and converted it to the MP3 format using the multi-thread capable LAME MT application, in both single and multi-thread modes. Processing times are recorded below, listed in seconds. Shorter times equate to better performance.

The AMD A4-5000 finished well ahead of Atom in our audio encoding test as well, but the Core i3 had a clear advantage. Though the Jaguar cores used in Kabini offer higher IPC performance than AMD's previous-gen Bobcat low-power cores, clock-for-clock they can't keep pace with Intel's Core processors in terms of single-threaded performance.

Next up is Cinebench. Cinebench R11.5 is a 3D rendering performance test based on Cinema 4D from Maxon. Cinema 4D is a 3D rendering and animation suite used by animation houses and producers like Sony Animation and many others. It's very demanding of processor resources and is an excellent gauge of pure computational throughput.  This is a multi-threaded, multi-processor aware benchmark that renders a photorealistic 3D scene (from the viral "No Keyframes" animation by AixSponza). This scene makes use of various algorithms to stress all available processor cores. The rate at which each test system was able to render the entire scene is represented in the graph below.

The quad-core A4-5000 is able to overtake the dual-core Core i3 here in the multi-threaded benchmark, and it makes mincemeat of the Atom processor. In terms of single-threaded performance, however, Intel still rules the roost.

Web Browsing and JavaScript Performance
Next up, we have some numbers from the SunSpider JavaScript and Browsermark web-browsing benchmarks.

SunSpider JavaScript Benchmark
JavaScript Performance Testing

According to the SunSpider website:

This benchmark tests the core JavaScript language only, not the DOM or other browser APIs. It is designed to compare different versions of the same browser, and different browsers to each other. Unlike many widely available JavaScript benchmarks, this test is:

Real World - This test mostly avoids microbenchmarks, and tries to focus on the kinds of actual problems developers solve with JavaScript today, and the problems they may want to tackle in the future as the language gets faster. This includes tests to generate a tagcloud from JSON input, a 3D raytracer, cryptography tests, code decompression, and many more examples. There are a few microbenchmarkish things, but they mostly represent real performance problems that developers have encountered.

Balanced - This test is balanced between different areas of the language and different types of code. It's not all math, all string processing, or all timing simple loops. In addition to having tests in many categories, the individual tests were balanced to take similar amounts of time on currently shipping versions of popular browsers.

Statistically Sound - One of the challenges of benchmarking is knowing how much noise you have in your measurements. This benchmark runs each test multiple times and determines an error range (technically, a 95% confidence interval). In addition, in comparison mode it tells you if you have enough data to determine if the difference is statistically significant.
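To illustrate what an error range of this kind looks like, the Python sketch below computes a mean and a 95% confidence interval from repeated timings. It uses a normal approximation and made-up sample values; we are not claiming this matches SunSpider's own implementation, and the function name is ours.

```python
import statistics

def confidence_interval_95(samples):
    """Return (mean, half_width) of a 95% interval via a normal approximation."""
    mean = statistics.mean(samples)
    # Standard error of the mean: sample standard deviation over sqrt(n)
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95%, about 1.96
    return mean, z * sem

# Hypothetical run times in milliseconds, not measured results.
runs_ms = [412.0, 405.5, 418.2, 409.9, 414.4]
mean, half = confidence_interval_95(runs_ms)
print(f"{mean:.1f} ms +/- {half:.1f} ms ({100 * half / mean:.1f}%)")
```

If the intervals for two systems overlap heavily, the difference between their scores may simply be noise. With only a handful of runs, a Student's t interval would be somewhat wider than this approximation.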

All of the systems were tested using the latest version of Internet Explorer 10, with default browser settings, on a clean install of Windows 8.

The A4-5000 once again handily outruns the Atom Z2760 in the SunSpider benchmark, but the similarly-clocked Core i3-2377M puts up the best score of the group by far.

We see the same performance trend in the BrowserMark test (please note, all systems were tested with IE 10 in desktop mode). The A4-5000 is much faster than the Atom here, but the Core i3 once again takes the top spot.
 

GPU Performance and Power

Next up we have some GPU benchmarks using Cinebench’s OpenGL test and 3DMark 11. The GPUs at the heart of the A4-5000 and low-end Intel platforms we’ve tested here aren’t particularly powerful, so don’t expect playable framerates in any cutting edge games. But gauging relative performance can at least speak to one platform's graphics superiority over another.

GPU Tests: Cinebench OpenGL and 3DMark
Open GL and DirectX Tests

You’ll notice a few red bars in the graphs above. Those indicate tests that wouldn’t run on the Intel platforms. In the tests that did run, the A4-5000’s integrated Radeon HD 8330 GPU was roughly twice as fast as Intel’s HD 3000 engine in Cinebench’s OpenGL test. The A4, however, was the only one capable of running 3DMark 11. The A4-5000’s score of E967 (this was the entry level benchmark) isn’t particularly high, but at least the platform is capable of running DX11-class content.

We also spent some time playing back various video files and found the A4-5000 to perform very well. Various SD and HD videos streamed from a NAS over the network all played back beautifully, and web-based HD content streamed without incident as well.

What you see pictured here is the official Iron Man 3 trailer in 1080p, streaming from YouTube, running in full screen mode on the A4-5000-based whitebook we used for testing. As you can see in the overlay, CPU utilization occasionally peaked at about 40%, but typically hovered in the 5% - 25% range.

Total System Power Consumption
Tested at the Outlet

Since the notebook we used for testing won't be made available at retail, we didn't do any formal battery tests, but we did monitor power on the machine to see how much juice the platform used under various workloads. Our goal was to give you all an idea as to how much power the system used while idling and while under heavy CPU and GPU workloads. Please keep in mind that we were testing total system power consumption at the outlet here, not just the power being drawn by the processor alone--all of the notebook's components are factored into the numbers here.

Under the absolute worst case scenario, with the notebook’s screen powered on (at 50% brightness) and full CPU and GPU workloads, the machine consumed only 23 watts total. As we stepped down the workload and focused only on the CPU or GPU, power consumption decreased.

What these numbers mean for real-world battery life will vary from machine to machine, but we can say that the A4-5000-based whitebook we tested had no trouble lasting about 4.5 - 5 hours with moderate to heavy use with its 15V / 3000mAh / 45Wh battery. Under lighter use, a full 8 hour work day would not be a problem at all.
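The battery figures above translate into average power draw with some quick arithmetic. The Python sketch below is our own illustration with hypothetical helper names; it ignores battery aging and conversion losses, and keep in mind the 23W worst-case figure was measured at the outlet, so it includes AC adapter losses that a battery run would not.

```python
def battery_energy_wh(voltage_v: float, capacity_mah: float) -> float:
    """Nominal battery energy in watt-hours."""
    return voltage_v * capacity_mah / 1000.0

def average_draw_w(energy_wh: float, runtime_h: float) -> float:
    """Average system power needed to drain the battery in runtime_h hours."""
    return energy_wh / runtime_h

energy = battery_energy_wh(15, 3000)
print(f"Battery energy: {energy:.0f} Wh")
for hours in (4.5, 5.0, 8.0):
    print(f"{hours} h runtime -> about {average_draw_w(energy, hours):.1f} W average")

# Rough lower bound on runtime if the system sat at the 23 W worst case
print(f"At 23 W: about {energy / 23:.1f} h")
```

In other words, 4.5 to 5 hours of moderate to heavy use implies an average draw of roughly 9 to 10 watts, while an 8 hour day requires the system to average under 6 watts.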

Our Summary and Conclusion

The PC market is changing rapidly, as tablets and other ultra-mobile / convertible form factors continue to eat away at traditional desktop and notebook PC sales. AMD hopes that its new mobile A-Series and E-Series APUs put the company in a better position to capitalize on the myriad opportunities offered by the burgeoning ultramobile market. And in all likelihood, they have.

We all know Intel’s Haswell is due to launch soon and that it will likely offer improved performance and power efficiency over current Ivy Bridge-based products. Since Ivy Bridge and Sandy Bridge Core processors already offer huge performance advantages over anything AMD currently has available, Haswell will most likely continue that trend in high-performance notebooks and desktop systems. But AMD’s new products don’t necessarily target Haswell. Kabini and Temash are going after the space currently occupied by CloverTrail (i.e. Atom) and entry-level Pentium/Celeron branded products; versus these parts, AMD is in a decent position.


There's That Kabini Die Shot Again...

As was the case with the previous generation Brazos, AMD’s low-power processor cores offer competitive, or much better, performance in the entry-level mobile space. AMD’s graphics performance, however, is simply on another level. Intel has BayTrail coming down the pipeline, but it won’t hit the market for a few more months at least. AMD is ready now with a new, more power-efficient architecture than its previous gen, which improves performance across the board. If Brazos was an unmitigated success for AMD, the prospects for its 2013 mobility platforms are good considering they offer much better CPU and GPU performance, at lower power.

Of course, AMD’s success ultimately is in the hands of its OEM partners. If AMD can land some lucrative design wins, we’re sure their 2013 mobility platforms will do well. We’re aware of a few interesting products built around AMD’s latest A- and E-Series APUs due to arrive soon, but we haven’t gotten our hands on retail-ready product just yet. Hopefully we will soon, because the idea of a low-power, relatively high-performing convertible with DX11-class graphics and all-day battery life is intriguing to say the least.



Content Property of HotHardware.com