After years of work and a few false starts, Intel is finally ready to take the plunge into the smartphone market. At the CES keynote tonight, the CPU giant is officially launching Medfield, the 32nm smartphone SoC the company has built to take it into the next generation of smartphones (and a few tablets). The chip, now officially named the Atom Z2460, is ready for prime time.
We visited Intel HQ in December and were briefed on the next-generation phone and what Intel expects it to do. After Moorestown's disappointing performance in the space, the CPU giant is keen to put its best foot forward, and our time with the company reflected that. Intel isn't trying to position Medfield as an ARM-crusher, but as a solution that's more than capable of running with the current pack of hardware.
Unlike Moorestown, which debuted with a bang and an LG-designed phone and then went nowhere, Medfield has solid design wins behind it, by which we mean products that will definitely come to market. Motorola and Lenovo are both announcing products today. Lenovo has its K800, a device intended for the Chinese market and sold by China Unicom, while Motorola has announced a multi-part deal with Intel covering smartphones and tablets.
Details on Moto's hardware are still a bit sketchy, but the company expects to ship phones by this summer, with a tablet following later. These announcements aren't likely to be isolated events, either; we expect to hear about more products at Mobile World Congress next month.
So what's inside the new chip? Let's have a look.
The SoC: Single-Core Atom + PowerVR
Intel's official slide on Medfield isn't big on details, but the company has given us permission to tell you more about the chip than what's shown on the slide.
The GPU portion of the Z2460 is based on the PowerVR SGX540, with a target core clock of ~400MHz. That puts the chip's GPU performance in line with Texas Instruments' OMAP4460, which uses the same GPU clocked at 384MHz. Medfield integrates support for three displays, including 1920x1080 output via HDMI.
The chip supports dual-channel LPDDR2 at 667-800MHz and can encode 720p video at 30 fps.
Penwell, the Medfield SoC, is in the upper right-hand corner, with the two chips from Moorestown (Lincroft and Langwell) for comparison.
The CPU core at the heart of Medfield is named Saltwell; it's the first new iteration of the architecture since Intel debuted Bonnell, the original 45nm core, back in 2008. At a high level, most of the core's features are unchanged. Saltwell is a single-core design with HyperThreading. Like Bonnell, it's an in-order core capable of decoding up to two instructions per clock cycle, with 56KB of L1 and 512KB of L2 cache.
There were, however, a few low-level improvements. Previously, Atom used a 4K-entry table for gshare branch prediction. Saltwell stores 8K entries in single-threaded mode and 4K entries per thread when HyperThreading is in use. Increasing the number of entries lowers the number of mispredicts, each of which forces the pipeline to throw away work, and so can help prevent thread stalls.
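To make the table-size argument concrete, here is a minimal gshare sketch in Python. It is a textbook model, not Saltwell's actual predictor: the 2-bit saturating counters, the history lengths, and the names `GsharePredictor` and `count_mispredicts` are illustrative assumptions. The table index is the branch address XORed with a global history of recent outcomes, so unrelated branches can land on the same counter (aliasing); a bigger table makes such collisions less likely.

```python
class GsharePredictor:
    """Textbook gshare predictor: 2-bit saturating counters indexed by
    (branch address XOR global history). Sizes are illustrative assumptions,
    not Saltwell's real configuration."""

    def __init__(self, entries: int, history_bits: int) -> None:
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.index_mask = entries - 1
        self.history_mask = (1 << history_bits) - 1
        self.counters = [1] * entries  # 0-1 predict not-taken, 2-3 predict taken
        self.history = 0               # outcomes of the most recent branches

    def _index(self, pc: int) -> int:
        # XOR the address with the global history, then keep the low bits.
        return (pc ^ self.history) & self.index_mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the newest outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def count_mispredicts(predictor: GsharePredictor, trace) -> int:
    """Replay (pc, taken) pairs and count wrong predictions."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # Two hypothetical branches: one always taken, one never taken. With a
    # 1-bit history, both map to slot 0 of a 4K-entry table but to different
    # slots of an 8K-entry table.
    trace = [(0x0000, True), (0x1001, False)] * 50
    print(count_mispredicts(GsharePredictor(4096, 1), trace))  # 100
    print(count_mispredicts(GsharePredictor(8192, 1), trace))  # 1
```

In the 4K table the two branches keep flipping each other's shared counter, so every prediction misses; in the 8K table they get separate counters and only the first lookup misses. Real code is far less tidy, so treat this as an illustration of the mechanism, not a measure of Saltwell's gain.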
Intel's other performance improvements to Saltwell include faster memory copy routines, "improved performance of certain microcode flows," and a reduction in instruction scheduling restrictions. All of these improvements are extremely low-level, but Atom's in-order nature makes them more important than they might otherwise be. Unlike a conventional desktop or laptop processor, Atom can't reorder instructions at runtime for optimal execution, so relaxing scheduling restrictions helps improve core utilization and performance per watt.
Saltwell and the Medfield SoC are designed to open this new range of products
One fact about Saltwell that caught us by surprise is its operating frequency. The chip runs at the same 1.6GHz that has been the hallmark of Atom since it debuted in netbooks 3.5 years ago. Its ability to adjust its clock frequency dynamically in response to workload, however, has been significantly expanded (more on this in the power section below).
Can a Single-Core x86 Processor Keep Up?
Intel's decision to opt for a single-core x86 chip bucks the market's general trend towards multi-core phones, but it's a sound strategy for several reasons. HyperThreading doesn't deliver the same performance improvement as a second core, but we've seen it improve Atom's performance by 30-50 percent in a wide range of non-smartphone tests. The benefits of scheduling two threads for simultaneous execution are low-level enough that Android should see similar gains.
Atom's in-order architecture makes HT particularly useful when it comes to improving core utilization and efficiency
Intel's other ace is Atom's inherent performance advantage over its ARM rivals. Benchmarks comparing the two are admittedly hard to come by, but the results that are available suggest that Atom's single-threaded performance is significantly better than that of ARM-based chips.
The best way to understand Saltwell's relative performance is as a balance between clock speed, multi-threading capability, and x86's inherently higher efficiency as compared to ARM. Against 1-1.2GHz dual-core chips, Medfield's higher clock speed and HT should keep it in the running. Later this year the chip will face stiff competition as the next generation of hardware emerges, but remember, Medfield's primary goal is to compete with ARM products, not to blow the doors off them.
Power Gating, Power/Performance Scaling
Scaling Power and Performance
Intel has sunk an enormous amount of effort into optimizing and improving Medfield's power consumption and frequency scaling. In order to understand why this matters, it helps to examine the relationship between voltage, frequency, and power consumption.
The graph above is for demonstration purposes only. It illustrates the relationship between CPU performance and total power consumption. At the low end of the curve, CPU clock speed (and therefore performance) rises much more quickly than the chip's total power consumption. As voltage climbs, however, the curve flattens, and at the right-hand side of the graph, power consumption increases more quickly than performance.
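The shape of that curve falls out of the standard first-order model for CMOS switching power, P ≈ C·V²·f, with leakage ignored: higher clocks generally need higher voltage, and voltage enters squared. The sketch below uses made-up operating points and an assumed effective capacitance (`C_EFF_FARADS`); none of these numbers are Medfield's.

```python
# First-order CMOS dynamic power model: P = C_eff * V^2 * f (leakage ignored).
# The operating points and capacitance below are invented for illustration;
# they are NOT Medfield's real voltage/frequency table.
C_EFF_FARADS = 0.5e-9
OPERATING_POINTS = [  # (frequency in GHz, core voltage in volts)
    (0.6, 0.75),
    (1.0, 0.85),
    (1.3, 0.95),
    (1.6, 1.10),
]


def dynamic_power_watts(freq_ghz: float, volts: float) -> float:
    return C_EFF_FARADS * volts ** 2 * freq_ghz * 1e9


def scaling_table():
    """Return (GHz, volts, watts, GHz per watt) for each operating point."""
    rows = []
    for freq_ghz, volts in OPERATING_POINTS:
        watts = dynamic_power_watts(freq_ghz, volts)
        rows.append((freq_ghz, volts, watts, freq_ghz / watts))
    return rows


if __name__ == "__main__":
    for freq_ghz, volts, watts, perf_per_watt in scaling_table():
        print(f"{freq_ghz:.1f} GHz @ {volts:.2f} V -> {watts:.3f} W, "
              f"{perf_per_watt:.2f} GHz/W")
```

In this toy table, going from 0.6GHz to 1.6GHz multiplies frequency by about 2.7 but power by about 5.7, so performance per watt falls at every step up in voltage.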
Saltwell is designed to take advantage of this curve. Clock control is fine-grained, with adjustments available in 100MHz increments. The CPU is also designed to transition from sleep to active mode very quickly: Saltwell's worst-case exit latency from C1/C1E (core halted) is just 350ns, and the chip can wake up from C6 (deepest sleep, the entire SoC is powered down) in 70 microseconds.
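Fine-grained clock steps matter because a power manager can pick an operating point just above current demand. The sketch below is a hypothetical selection policy, not Intel's actual power-management algorithm: the function name `pick_frequency_mhz` and the 100MHz floor are assumptions, and the 1.6GHz cap is taken from the article. It simply rounds a required clock speed up to the next available step.

```python
import math

MAX_MHZ = 1600  # Saltwell's top clock, per the article


def pick_frequency_mhz(required_mhz: float, step_mhz: int = 100,
                       max_mhz: int = MAX_MHZ) -> int:
    """Hypothetical policy: round the demanded clock up to the next step.

    Never returns less than one step or more than max_mhz.
    """
    steps = max(1, math.ceil(required_mhz / step_mhz))
    return min(max_mhz, steps * step_mhz)


if __name__ == "__main__":
    demand = 640  # e.g. a workload needing 40% of 1.6GHz
    print(pick_frequency_mhz(demand))                # 700 with 100MHz steps
    print(pick_frequency_mhz(demand, step_mhz=400))  # 800 with coarse steps
```

With 100MHz steps the chosen clock overshoots demand by 60MHz instead of 160MHz, so less of the chip's power budget goes to clock headroom the workload never uses.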
Saltwell Flying Solo:
The ARM industry has placed considerable emphasis on using multiple cores inside a single SoC of late. Texas Instruments' OMAP4 platform incorporates a pair of Cortex-M3 cores for low-power operation, while the fifth "companion" core in Nvidia's Tegra 3 is designed to lower the SoC's power consumption in standby or when performing light tasks.
Left, Nvidia's description of why using a Companion Core makes sense. Right, ARM's big.LITTLE concept.
The ability to transition quickly from one state to another is critical to Intel's strategy for mobile devices. Rather than pairing high-performance cores with low-power ones (an approach ARM calls big.LITTLE) or building a specialized "companion" core a la Nvidia, Santa Clara has opted for fine-grained power control and extensive use of hibernation and sleep states.
This chart is from the Moorestown launch; the only difference in Saltwell is how it handles the L1 cache in C1/C2. Moorestown flushed the L1 when it dropped into these states, while Saltwell/Medfield doesn't.
The following chart illustrates how fine-grained control can reduce the total energy consumed per task while also improving device performance.
In this example, both CPUs begin in standby mode, drawing no power. When activated, CPU 1 settles at a constant frequency and begins crunching data; this essentially mirrors Nvidia's vSMP technology, in which all cores run at the same clock speed. CPU 2, in contrast, starts at 0.3W for an initial burst of calculations, ramps up to full power (0.8W) for a short period, falls back to 0.3W to close out its task, and then returns to standby.
Total energy consumed over the 15-second window in this example is 4J for CPU 1 and 3.6J for CPU 2. In absolute terms, CPU 2 used 10% less energy.
Be aware, however, that this data is easily misrepresented, and the incentives for companies to do so are high enough that we're including the following as an early warning against future snake oil. If we instead examine average power consumption over the time each CPU takes to finish the workload, the tables turn. CPU 1 suddenly looks more efficient, with an average power consumption of 0.5W compared to 0.514W for CPU 2.
Because this average divides each chip's energy by its own active time (4J over 8 seconds versus 3.6J over 7 seconds), CPU 2 is mathematically penalized for completing the workload more quickly. The best way to avoid being tricked by bad graphs is to trust average power figures only when they have been calculated over the same amount of time.
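The arithmetic behind this example is easy to reproduce. In the sketch below, the 8-second and 7-second active times are inferred from the article's totals and averages (4J / 0.5W and 3.6J / 0.514W); the exact split of CPU 2's 0.3W/0.8W/0.3W phases is an assumption chosen to match those totals, and standby is treated as 0W, as in the example.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    watts: float
    seconds: float


# CPU 1: one constant-frequency phase (0.5 W for 8 s = 4 J).
CPU1 = [Phase(0.5, 8.0)]
# CPU 2: ASSUMED split of its 0.3 W / 0.8 W / 0.3 W phases (3.6 J over 7 s).
CPU2 = [Phase(0.3, 2.0), Phase(0.8, 3.0), Phase(0.3, 2.0)]
WINDOW_SECONDS = 15.0  # both chips sit in 0 W standby for the rest of it


def energy_joules(profile):
    return sum(p.watts * p.seconds for p in profile)


def active_seconds(profile):
    return sum(p.seconds for p in profile)


def avg_power_while_active(profile):
    # Misleading metric: each chip is divided by its own runtime.
    return energy_joules(profile) / active_seconds(profile)


def avg_power_over_window(profile, window=WINDOW_SECONDS):
    # Fair metric: both chips are divided by the same 15 s window.
    if active_seconds(profile) > window:
        raise ValueError("profile is longer than the window")
    return energy_joules(profile) / window


if __name__ == "__main__":
    for name, profile in (("CPU 1", CPU1), ("CPU 2", CPU2)):
        print(f"{name}: {energy_joules(profile):.2f} J, "
              f"{avg_power_while_active(profile):.3f} W while active, "
              f"{avg_power_over_window(profile):.3f} W over 15 s")
```

Over the shared 15-second window, CPU 2 averages 0.24W against roughly 0.27W for CPU 1, the same 10% advantage the energy totals show.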
Performance (So Far)
We had no opportunity to benchmark Medfield independently; the results presented here are the figures we observed when the tests were run on each of the phones in question. All of the phones ran Android 2.3 (Gingerbread).
The Z2460's performance in Rightware's BrowserMark is excellent, easily outstripping the two ARM-based smartphones. Intel acknowledges that performance would improve under Ice Cream Sandwich but maintains that all the devices would improve proportionally.
In stock configurations, the Sensation and Bionic offer only middle-of-the-road performance in this test, so a great deal comes down to how Atom performs under Ice Cream Sandwich. It should still be in the running but may not lead by the same margin we saw in December.
Since these GLBenchmark results were run on devices with different screen resolutions, they should be taken with a grain of salt--but then, GPU performance isn't Medfield's major push in any case. The point here is that the Z2460's performance is in the same league as its competition.
Video output was a different story. One comparison Intel put together involved a 1080p, 50 fps video encoded at 20Mbps, or rather, an attempt to play one. The Z2460 was the only device that could actually handle a bitrate that high; both the HTC Sensation and the Droid Bionic failed when asked to play the clip.
The HDMI playback off the Z2460 was excellent. This is almost certainly a fringe case--phones typically don't offer enough storage space to make carting around a bunch of 1080p video worthwhile--but today's fringe use is tomorrow's mainstream. The idea of stutter-free, high-profile 720p playback would've been ridiculous 18 months ago. Given the disparity between the pace of smartphone introduction and the length of your typical carrier contract, a little future-proofing isn't a bad thing.
Even if the early performance figures we've seen hold up under scrutiny, Medfield will face a significant challenge from Qualcomm's 28nm Krait SoC, Nvidia's Tegra 3, and the Cortex-A15 chips due out in the back half of the year. Intel's goal, however, isn't to shatter performance records at this point. Everything we've seen to date suggests that Medfield will be able to compete effectively with the other phones we expect to see launch in 2012.
Longer term, Intel's Atom roadmap looms ominously over the ARM industry's own plans for world domination. In the low-power world, power consumption and performance increasingly depend on process technology rather than primarily on a CPU's architecture. Here, Intel has a profound advantage.
The far blue-purple block denotes the beginning of 20nm risk production
By 2013, Qualcomm, TI, Samsung, and Nvidia will have collectively moved to 28nm, at which point Intel will be deploying 22nm. TSMC's roadmap, above, has the company beginning 20nm deployments in the second quarter of 2013. Such estimates must be taken with a grain of salt: TSMC began 28nm risk production in 2010 but didn't start shipping parts for revenue until Q3 2011. Qualcomm is expected to ship 28nm chips for revenue beginning this quarter.
In short, there's reason to think Intel's roadmaps are a heck of a lot more accurate when it comes to shipping parts on new processes, particularly since Atom will debut on 22nm after that process has been in use for Ivy Bridge for nearly a year.
We think Medfield could change the way people think about x86, but even if it doesn't catch fire, it's the first real salvo in a war we expect to heat up very, very quickly. ARM may have collectively snickered after Moorestown, but no one's likely to be laughing anymore.