x86 Everywhere: Intel Announces Medfield Phones

by Joel Hruska — Tuesday, January 10, 2012, 08:01 PM EDT

Page 3:
Power Gating, Power/Performance Scaling

Scaling Power and Performance

Intel has sunk an enormous amount of effort into optimizing Medfield's power consumption and frequency scaling. To understand why this matters, it helps to examine the relationship between voltage, frequency, and power consumption.

The graph above is for demonstration purposes only. It illustrates the relationship between CPU performance and total power consumption. At the lower end of the graph, CPU clock speeds increase much more rapidly than the chip's total power consumption. As voltage climbs, however, the graph flattens. At the right-hand side of the graph, power consumption increases more quickly than performance.
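The shape of that curve can be reproduced with a rough sketch. It uses the standard CMOS approximation that switching power scales with capacitance × voltage² × frequency, plus a fixed static term. Every constant below (the static power, the voltage floor, the slope, and the "knee" frequency) is a made-up illustrative value, not Medfield data, and the function name is hypothetical.

```python
def total_power(freq_ghz, static=0.1, capacitance=1.0,
                v_floor=0.7, v_slope=0.4, f_knee=0.6):
    """Static power plus switching power (C * V^2 * f), in arbitrary units."""
    # Hypothetical voltage curve: flat at v_floor up to f_knee, then rising linearly.
    voltage = v_floor + v_slope * max(0.0, freq_ghz - f_knee)
    return static + capacitance * voltage ** 2 * freq_ghz


freqs = [0.2, 0.6, 1.0, 1.6]
powers = [total_power(f) for f in freqs]

for f, p in zip(freqs, powers):
    print(f"{f:.1f} GHz: power {p:.3f}, performance per unit power {f / p:.2f}")
```

With these assumed numbers, tripling the clock at the low end roughly doubles total power, while the last step from 1.0GHz to 1.6GHz raises power faster than it raises frequency, because voltage has to climb along with the clock.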

Saltwell is designed to take advantage of this curve. Clock control is fine-grained; adjustments are available in 100MHz increments. The CPU is designed to transition from sleep to active mode very quickly; Saltwell's worst-case exit latency for C1/C1E (CPU powered off) is just 350ns. The chip can wake up from C6 (deepest sleep, the entire SoC is powered down) in 70 microseconds.
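To put those two latency figures in perspective, here is a minimal sketch of how a power manager might pick a sleep state against a wake-up deadline. The selection rule and the function name are hypothetical and are not Intel's actual algorithm; only the 350ns and 70 microsecond figures come from the text above.

```python
# Worst-case exit latencies quoted above, in nanoseconds.
EXIT_LATENCY_NS = {"C1/C1E": 350, "C6": 70_000}


def deepest_state_within(budget_ns):
    """Return the listed state with the longest exit latency that still fits
    the wake-up budget, or None if even the shallowest state is too slow.

    Assumption: a longer exit latency means a deeper, lower-power state.
    """
    fitting = [(latency, name) for name, latency in EXIT_LATENCY_NS.items()
               if latency <= budget_ns]
    return max(fitting)[1] if fitting else None


print(deepest_state_within(1_000))    # only C1/C1E fits a 1 microsecond budget
print(deepest_state_within(100_000))  # a 100 microsecond budget allows C6
```

The gap between the two states is a factor of 200, which is why a chip that can tolerate tens of microseconds of wake-up delay can afford to sleep far more deeply.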

Saltwell Flying Solo:

The ARM industry has lately put considerable emphasis on using multiple cores inside a single SoC. Texas Instruments' OMAP4 platform incorporates a pair of Cortex-M3 cores for low-power operation, while Nvidia's fifth "companion" core is designed to lower the SoC's power consumption in stand-by or when performing low-level tasks.

Left, Nvidia's description of why using a Companion Core makes sense. Right, ARM's big.LITTLE concept.

The ability to transition quickly from one state to another is critical to Intel's strategy for mobile devices. Rather than combining multiple CPUs (a strategy ARM refers to as big.LITTLE) or building specialized "companion" cores a la Nvidia, Santa Clara relies on fine-grained power control and extensive use of hibernation / sleep states.

This chart is from the Moorestown launch, but Saltwell's only difference is how it handles the L1 cache in C1/C2. Moorestown flushed the L1 when it dropped into these states--Saltwell / Medfield doesn't.

The following chart illustrates how fine-grained control can reduce the total energy consumed per task while also improving device performance.

In this example, both CPUs begin in standby mode, drawing no power. CPU 1, when activated, establishes a constant frequency and begins data crunching. This essentially mirrors Nvidia's vSMP technology, where all cores run at the same clock speed. CPU 2, in contrast, starts off at 0.3W for an initial burst of calculations, ramps up to full power (0.8W) for a short period of time, falls back to 0.3W to close out its task, and then returns to standby.

Total energy consumed over 15 seconds in this example is 4 watt-seconds (joules) for CPU 1 and 3.6 watt-seconds for CPU 2. In absolute terms, CPU 2 used 10% less energy.
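The arithmetic behind those totals can be sketched as power multiplied by time, summed over each phase. The chart's exact phase durations are not given in the text, so the durations below are assumptions chosen to match the article's figures: CPU 1 at a constant 0.5W for 8 seconds, and CPU 2 spending 2 seconds at 0.3W, 3 seconds at 0.8W, and 2 more seconds at 0.3W.

```python
def energy_joules(segments):
    """Sum watts * seconds over a list of (watts, seconds) phases."""
    return sum(watts * seconds for watts, seconds in segments)


# Assumed phase durations (not stated in the article); standby counts as zero draw.
cpu1 = [(0.5, 8.0)]
cpu2 = [(0.3, 2.0), (0.8, 3.0), (0.3, 2.0)]

e1 = energy_joules(cpu1)
e2 = energy_joules(cpu2)
saving = (e1 - e2) / e1

print(f"CPU 1: {e1:.1f} J, CPU 2: {e2:.1f} J, CPU 2 saves {saving:.0%}")
```

Under these assumptions CPU 2 finishes a second earlier and still uses less energy, because its high-power burst is short.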

Be aware, however, that this data is easily perverted--and the incentives for companies to do so are high enough that we're including the following as an early warning against future snake oil. If we examine average power consumption over the time each CPU takes to complete the workload, rather than over the same 15-second window, the tables turn. CPU 1 suddenly looks more efficient, with an average power consumption of 0.5W compared to 0.514W for CPU 2.

Because average power is energy divided by elapsed time, CPU 2 is mathematically penalized for completing the workload more quickly: its smaller energy total is divided by an even shorter run time. The best way to avoid being tricked by bad graphs is to only trust data on average power consumption when the figures have been calculated over the same span of time.
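A short sketch makes the trap concrete. The energy totals come from the example above. The active times of 8 and 7 seconds are inferred by dividing each energy total by the quoted average power, so treat them as assumptions rather than figures read off the chart.

```python
WINDOW_S = 15.0  # the common comparison window from the example

# name: (energy in joules, active seconds); active times are inferred, not quoted.
runs = {"CPU 1": (4.0, 8.0), "CPU 2": (3.6, 7.0)}


def average_power(energy_j, seconds):
    """Average power in watts is energy divided by the time it is averaged over."""
    return energy_j / seconds


per_active = {name: average_power(e, t) for name, (e, t) in runs.items()}
per_window = {name: average_power(e, WINDOW_S) for name, (e, _) in runs.items()}

for name in runs:
    print(f"{name}: {per_active[name]:.3f} W over its own run time, "
          f"{per_window[name]:.3f} W over the shared {WINDOW_S:.0f} s window")
```

Averaged over each chip's own run time, CPU 1 appears to win. Averaged over the shared 15-second window, the ranking matches the energy totals and CPU 2 comes out ahead.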