Intel's Ultra-Portable Atom: Unveiled
Meet the Atom Processor
Meet Centrino Atom: A Processor, Chipset, Wireless Radio, and Pocket-Sized Package
Intel’s new platform for MIDs is called Menlow. And whereas McCaslin was found in ultra low-voltage notebooks, Menlow is the processor and chipset combination you’ll find in navigation devices, Internet tablets, video players, and gaming handhelds. From here on out you won’t hear Intel calling the platform by its internal name, though. The official brand is Centrino Atom.
A complete Centrino Atom configuration consists of the Atom processor (more on that shortly), the single-component Poulsbo chipset, a wireless device, a battery, and a small form factor enclosure.
Centrino Atom sits between two other Intel brands. At the entry level, you’ll find netbooks and nettops powered by Atom. Those are small, simple, and affordable devices built for Internet-oriented usage models. At the high end, Intel has its Centrino initiative driven by Core 2 Duo processors. Centrino Atom is shooting for the “best Internet experience in your pocket.”
Formerly known as Silverthorne, Intel’s Atom processor leverages the 45nm high-K process technology we’ve come to associate with Penryn-based desktop and workstation CPUs. Remember the Pentium 4 we mentioned earlier? Atom sports 47 million transistors—just 5 million more than the original Pentium 4. But whereas Willamette occupied more than 200 square millimeters, Atom fits in less than 25 square millimeters.
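To put those die figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above (47 million transistors in under 25 square millimeters, versus 5 million fewer transistors in more than 200 square millimeters), so the results are bounds rather than exact densities. The variable and function names are our own.

```python
# Figures quoted in the article (millions of transistors, square millimeters).
ATOM_TRANSISTORS_M = 47
P4_TRANSISTORS_M = ATOM_TRANSISTORS_M - 5   # "just 5 million more" than Willamette
ATOM_AREA_MM2_MAX = 25                      # Atom fits in *less than* this
P4_AREA_MM2_MIN = 200                       # Willamette occupied *more than* this


def density(transistors_m: float, area_mm2: float) -> float:
    """Millions of transistors per square millimeter."""
    return transistors_m / area_mm2


# Because the areas are bounds, so are the densities:
# Atom is at least this dense, Willamette at most this dense.
atom_density_min = density(ATOM_TRANSISTORS_M, ATOM_AREA_MM2_MAX)
p4_density_max = density(P4_TRANSISTORS_M, P4_AREA_MM2_MIN)
density_ratio_min = atom_density_min / p4_density_max

print(f"Atom:       at least {atom_density_min:.2f} M transistors/mm^2")
print(f"Willamette: at most  {p4_density_max:.2f} M transistors/mm^2")
print(f"Atom is at least {density_ratio_min:.1f}x denser")
```

In other words, by the article's own numbers Atom packs close to nine times as many transistors into each square millimeter, and likely more.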
Intel gave us access to the principal architect of Atom, Belliappa Kuttanna, who explained that one of his goals was to drastically reduce power consumption in order to propel Intel into a new market segment. Thus, Atom isn’t derived from any existing microarchitecture. That’s a big differentiator from the A100-series, which did center on Intel’s mobile designs. A second objective was to infuse Atom with enough processing horsepower to drive modern operating systems like Vista. Thirdly, Atom needed to be scalable, giving Intel the flexibility to create an entire product family with different features and running at different speeds.
So, Kuttanna’s team had to start from scratch in building Atom. Right away, they adopted an in-order execution engine, meaning instructions are dispatched and executed in the order that they appear. With the exception of Intel’s Itanium processor, all of the company’s designs employ out-of-order engines. While OOO execution generally improves performance, Kuttanna clarified that Atom’s in-order implementation yielded much better energy efficiency.
The Atom architects also started with a single-issue machine, but that didn’t meet the team’s performance requirements, so they eventually settled on a dual decode and issue machine with a number of optimizations aimed at simplifying the architecture. For instance, the 32KB L1 instruction cache features pre-decode extensions. Because Intel Architecture (IA) instructions are variable-length, Atom uses an algorithm that tags each instruction with an end-of-instruction marker after its first pass through the decoder. The next time the instruction is fetched, the marker indicates where it ends, which speeds its trip through the decoder. Atom’s branch predictors are much simpler as well, since more elaborate ones would eat up too much of the power budget. The takeaway is that Intel made some sweeping changes to the way IA instructions are decoded in an effort to maximize decoder efficiency and reduce power consumption.
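The pre-decode idea is easier to see in a toy model. The Python sketch below is not Intel's implementation: the instruction encoding (the low two bits of the first byte give a length of one to four bytes), the class name PredecodedLine, and the length_decodes counter are all assumptions made up for illustration. It only shows the principle that boundaries found on the first pass can be stored as marker bits and reused on later fetches.

```python
def toy_length(first_byte: int) -> int:
    """Hypothetical encoding: low two bits of the first byte give a length of 1..4."""
    return (first_byte & 0b11) + 1


class PredecodedLine:
    """A toy instruction-cache line with one end-of-instruction bit per byte."""

    def __init__(self, data: bytes):
        self.data = data
        self.end_marker = [False] * len(data)
        self.tagged = False          # set once the line has been through the decoder
        self.length_decodes = 0      # counts slow-path length computations

    def fetch(self):
        """Return the instructions in this line, using markers when available."""
        if self.tagged:
            return self._split_by_markers()
        return self._decode_and_tag()

    def _decode_and_tag(self):
        # Slow path: walk the bytes sequentially, working out each length,
        # and record where every instruction ends.
        instructions, pc = [], 0
        while pc < len(self.data):
            n = min(toy_length(self.data[pc]), len(self.data) - pc)
            self.length_decodes += 1
            instructions.append(self.data[pc:pc + n])
            self.end_marker[pc + n - 1] = True
            pc += n
        self.tagged = True
        return instructions

    def _split_by_markers(self):
        # Fast path: boundaries are already known, so no length decoding is needed.
        instructions, start = [], 0
        for i, is_end in enumerate(self.end_marker):
            if is_end:
                instructions.append(self.data[start:i + 1])
                start = i + 1
        return instructions


# Four made-up instructions of 1, 2, 3, and 4 bytes.
line = PredecodedLine(bytes([0x00, 0x01, 0xAA, 0x02, 0xBB, 0xCC, 0x03, 0x11, 0x22, 0x33]))
first_pass = line.fetch()    # decodes lengths and sets the markers
second_pass = line.fetch()   # reuses the markers
```

The second fetch returns the same instruction boundaries without touching the length decoder again, which is the kind of saving the article describes.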
Atom is also the first processor since Intel’s Pentium 4 to feature simultaneous multi-threading (SMT) in the form of Hyper-Threading. By design, in-order execution engines spend more processor clocks to execute instructions. Normally that’d be a performance inhibitor. However, Intel saw opportunity there. When execution is stalled, say, waiting for memory, pipeline resources go unutilized. By adding SMT, performance goes up as a second thread keeps instructions flowing through the engine. According to Intel’s Kuttanna, the performance gains from adding SMT to Atom’s in-order architecture are higher than they would be on an out-of-order design. Of course, realizing the benefits of SMT on an Atom-based device requires threaded software. The good news is that many audio and video codecs are already written to run multiple threads. Multi-tasking will also showcase Atom’s ability to juggle more than one thread.
Moving into the FP/SIMD execution clusters, Atom features two single-cycle SIMD ALUs, one of which is equipped with a shuffle unit. The other supports a full-width floating point adder for single precision FP adds. Why the emphasis on SIMD performance in the execution cluster? In profiling the apps typical of an MID, it became clear to Intel’s team that it’d see significant gains from building a wide data path.
In addition to its 32KB instruction cache, Atom sports a 24KB writeback data cache. The memory execution cluster also boasts a dual-level TLB hierarchy: one buffer is smaller, allowing very low latency access, and the other is significantly larger. A 512KB L2 cache with ECC support is integrated as well, able to fetch 64-byte cache lines in two clocks (that’s 256 bits per clock). Onboard hardware prefetchers either pull data from memory into the L2 cache or from the L2 cache into the data cache. The processing core consists of roughly 13 million transistors and the chip’s L2 cache takes up about 30 million transistors.
Despite its ground-up design, Atom maintains compatibility with Intel’s Core 2 Duo product lineup. The architecture supports Intel Virtualization Technology, the Execute Disable Bit, 64-bit extensions and SSE3. However, Intel’s Pankaj Kedia says that not every feature will be productized in every SKU.
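A small simulation illustrates why a stalled in-order pipeline benefits from a second thread. This is a deliberately simplified sketch, not a model of Atom itself: it assumes a single-issue core, treats every instruction as a latency in cycles, blocks a thread until its previous instruction completes, and picks between threads round-robin. The function name and the sample latencies are invented for the example.

```python
def run_in_order(threads):
    """Count cycles on a toy single-issue, in-order core.

    `threads` is a list of instruction streams; each stream is a list of
    latencies in cycles (1 = simple ALU op, larger = e.g. a memory stall).
    A thread cannot issue its next instruction until the previous one is done.
    """
    pcs = [0] * len(threads)        # next instruction index per thread
    ready_at = [0] * len(threads)   # cycle at which each thread may issue again
    cycle = 0
    next_tid = 0                    # round-robin starting point

    while any(pc < len(stream) for pc, stream in zip(pcs, threads)):
        for k in range(len(threads)):
            tid = (next_tid + k) % len(threads)
            if pcs[tid] < len(threads[tid]) and ready_at[tid] <= cycle:
                ready_at[tid] = cycle + threads[tid][pcs[tid]]
                pcs[tid] += 1
                next_tid = tid + 1
                break               # single issue: at most one instruction per cycle
        cycle += 1

    return max(cycle, max(ready_at))


# Two identical made-up streams, each with one 10-cycle memory stall.
stream = [1, 1, 10, 1, 1]

single_cycles = run_in_order([stream])          # one thread on its own
back_to_back_cycles = 2 * single_cycles         # two threads run one after the other
smt_cycles = run_in_order([stream, stream])     # two threads sharing the core

print(single_cycles, back_to_back_cycles, smt_cycles)
```

With both threads sharing the core, one thread's memory stall overlaps the other's, so the pair finishes in fewer cycles than running them back to back. Real gains depend heavily on the workload.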
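The cache and transistor figures above can be checked with a few lines of arithmetic. The Python sketch below uses only numbers quoted in this article. The core and L2 counts are approximate, so the remainder is a rough estimate, and the variable names are our own.

```python
# L2 line fill: a 64-byte line delivered over two clocks.
LINE_BYTES = 64
CLOCKS_PER_LINE = 2
bits_per_line = LINE_BYTES * 8                      # 512 bits in a full line
bits_per_clock = bits_per_line // CLOCKS_PER_LINE   # bits moved each clock

# Transistor budget, in millions (core and L2 figures are "roughly"/"about").
TOTAL_M = 47   # whole chip, quoted earlier in the article
CORE_M = 13    # processing core
L2_M = 30      # 512KB L2 cache
other_m = TOTAL_M - CORE_M - L2_M   # whatever is left for everything else
l2_share = L2_M / TOTAL_M           # fraction of the chip spent on L2

print(f"{bits_per_clock} bits per clock")
print(f"L2 is about {l2_share:.0%} of the transistors; ~{other_m}M remain for the rest")
```

Taken at face value, the L2 cache accounts for well over half of Atom's transistors, with the core itself taking a bit more than a quarter.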