Do you remember the original Pentium 4? It launched at 1.5 GHz and gave us our first bittersweet taste of the NetBurst microarchitecture, which Intel would use to replace the P6 design.
When the Pentium 4 began its life, Intel manufactured the chips on a 180 nm node. The 42 million transistors that went into those first Pentium 4s - internally referred to as Willamettes - occupied a die no less than 217 square millimeters. Keep those figures in mind throughout our overview of Intel’s newest mobile processor and platform. And don’t feel too old; eight years seems like a lifetime, when you’re talking tech.
Intel is formally announcing a brand new processor today that it hopes will drive the next generation of mobile Internet devices. Perhaps you’re already familiar with the MID concept. Last year, Intel took the wraps off of its McCaslin platform, a seldom-discussed proof-of-concept that never really took off. Nor was it meant to. McCaslin employed Intel’s A100/A110 processor built on 90nm process technology. Those CPUs were derived from Intel’s Pentium M efforts. And while they enabled respectable compute muscle at 3W, imagine running Windows Vista on an 800 MHz desktop. Or don’t. It’s a painful thought. Nevertheless, the A100 and A110 are x86 Intel chips that go into real products, like Samsung’s Q1.
The MID market is now being broken up into several different categories, including portable navigation, Internet tablets, video players, and handheld gaming. Note the absence of voice communications. Intel has its eye on smartphones, but the current hardware foundation isn’t there yet. We’ll have to wait until 2009/2010 to see what the Apples and Googles of the world do with Intel’s hardware vision. For now, it’s all about adding Internet connectivity to the digital devices you might already tote around with you.
“Big deal,” you say. “The Q1 you just mentioned has Wi-Fi access and works with Samsung’s HSDPA modem. My PSP does Wi-Fi. And I don’t mind loading my Archos video player up with hours of content before I hit the road.” All true. However, you’re still faced with several obstacles. The 800 MHz A110 in that Q1 isn’t very beefy. Although it centers on Intel’s mobile technology, a reported three hours of battery life isn’t exactly stunning. And between all three of the examples posited, you have the issue of compatibility to address.
Here’s where Intel’s story gets a bit more compelling. You probably didn’t know this, but Adobe has 160 versions of Flash 7.2 it uses to support the many combinations of mobile devices with different ARM processors and versions of the software stack. The company has to keep creating new versions at the cost of both time and money. Of course, Adobe charges its customers for the development efforts. For Intel’s MIDs, however, Adobe can port its software one time and any derivative hardware platform will use the code. The same goes for audio and video codecs, which can already be a pain when dealing with today’s fragmented portable entertainment device business.
The idea here, according to Pankaj Kedia, director of Intel’s global ecosystem programs, is to make the Internet available wherever you are, rather than have you going to it. Put it in the context of cell phones. Instead of being tied to land lines, cell phones provide the freedom of voice communications wherever you happen to be. Intel’s Kedia sees the same thing happening with the Internet. Rather than searching for somewhere to hook up, MIDs will put the Internet in your pocket with all of the compatibility and performance of a PC. It’s a noble vision for sure, and we haven’t yet heard how all of these devices will achieve ubiquitous connectivity. However, one thing is for sure: the hardware is here and it makes the old McCaslin platform look like child’s play.
|Meet the Atom Processor|
Intel’s new platform for MIDs is called Menlow. And whereas McCaslin was found in ultra low-voltage notebooks, Menlow is the processor and chipset combination you’ll find in navigation devices, Internet tablets, video players, and gaming handhelds. From here on out you won’t hear Intel calling the platform by its internal name, though. The official brand is Centrino Atom.
A complete Centrino Atom configuration consists of the Atom processor (more on that shortly), the single-component Poulsbo chipset, a wireless device, a battery, and a small form factor enclosure.
Centrino Atom sits between two other Intel brands. At the entry-level, you’ll find netbooks and nettops powered by Atom. Those are small, simple, and affordable devices built for Internet-oriented usage models. At the high-end, Intel has its Centrino initiative driven by Core 2 Duo processors. Centrino Atom is shooting for the “best Internet experience in your pocket.”
Formerly known as Silverthorne, Intel’s Atom processor leverages the 45nm high-K process technology we’ve come to associate with Penryn-based desktop and workstation CPUs. Remember the Pentium 4 we mentioned earlier? Atom sports 47 million transistors - just 5 million more than the original Pentium 4. But whereas Willamette occupied more than 200 square millimeters, Atom fits in less than 25 square millimeters.
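To put those figures side by side, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted in this article; since Intel gives Atom's die size as "less than 25 square millimeters," the Atom density below is a lower bound, not a measured value.

```python
# Figures quoted in the article. The Atom die area is an upper bound
# ("less than 25 square millimeters"), so its density is a lower bound.
WILLAMETTE_TRANSISTORS = 42_000_000
WILLAMETTE_DIE_MM2 = 217
ATOM_TRANSISTORS = 47_000_000
ATOM_DIE_MM2 = 25


def density(transistors: int, area_mm2: float) -> float:
    """Return transistors per square millimeter."""
    return transistors / area_mm2


willamette_density = density(WILLAMETTE_TRANSISTORS, WILLAMETTE_DIE_MM2)
atom_density = density(ATOM_TRANSISTORS, ATOM_DIE_MM2)

# How much denser Atom is, and how much smaller its die is.
density_ratio = atom_density / willamette_density
area_ratio = ATOM_DIE_MM2 / WILLAMETTE_DIE_MM2

print(f"Willamette: {willamette_density:,.0f} transistors/mm^2")
print(f"Atom (at least): {atom_density:,.0f} transistors/mm^2")
print(f"Density ratio: {density_ratio:.1f}x, die area ratio: {area_ratio:.2f}")
```

By this rough math, Atom packs its transistors nearly ten times as densely into a die a little over one tenth the size.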
Intel gave us access to the principal architect of Atom, Belliappa Kuttanna, who explained that one of his goals was to drastically reduce power consumption in order to propel Intel into a new market segment. Thus, Atom isn’t derived from any existing microarchitecture. That’s a big differentiator from the A100-series, which did center on Intel’s mobile designs. A second objective was to infuse Atom with enough processing horsepower to drive modern operating systems like Vista. Thirdly, Atom needed to be scalable, giving Intel the flexibility to create an entire product family with different features and running at different speeds.
So, Kuttanna’s team had to start from scratch in building Atom. Right away, they adopted an in-order execution engine, meaning instructions are dispatched and executed in the order that they appear. With the exception of Intel’s Itanium processor, all of the company’s other designs employ out-of-order engines. While OOO execution generally improves performance, Kuttanna clarified that Atom’s in-order implementation yielded much better energy efficiency.
The Atom architects also started with a single-issue machine, but that didn’t meet the team’s performance requirements, so they eventually settled on a dual decode and issue machine with a number of optimizations aimed at simplifying the architecture. For instance, the 32KB L1 instruction cache features pre-decode extensions. Because the IA architecture has variable-length instructions, Atom uses an algorithm that’s able to tag instructions with an end-of-instruction marker after a pass through the decoder. The next time the instruction is fetched, you have an indication of where the instruction ends, yielding better performance through the decoder. Atom’s branch predictors are much simpler as well, since they’d otherwise eat up too much of the power budget. The take-away is that Intel made some sweeping changes to the way IA instructions are decoded in an effort to maximize decoder efficiency and reduce power consumption.
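The pre-decode idea is easier to see in a toy model. The sketch below is not Intel's implementation and does not use x86 encodings: it assumes a made-up instruction format in which the first byte of each instruction holds its length, and the `PredecodeCache` class and its dictionary of end markers are hypothetical stand-ins for the extra bits stored alongside the instruction cache. It only illustrates the principle that boundary-finding work is paid once and reused on later fetches.

```python
class PredecodeCache:
    """Toy model of tagging instruction boundaries on first decode."""

    def __init__(self, code: bytes):
        self.code = code
        self.end_marker = {}   # start offset -> end offset (hypothetical tag store)
        self.slow_decodes = 0  # how often the full length decode had to run

    def _slow_length(self, offset: int) -> int:
        # Assumed toy encoding (NOT x86): first byte is the instruction length.
        self.slow_decodes += 1
        return self.code[offset]

    def fetch(self, offset: int) -> bytes:
        end = self.end_marker.get(offset)
        if end is None:
            # First visit: find the boundary the slow way, then remember it.
            end = offset + self._slow_length(offset)
            self.end_marker[offset] = end
        return self.code[offset:end]


def run_once(cache: PredecodeCache) -> list:
    """Walk the whole program once, returning the instructions fetched."""
    offset, instructions = 0, []
    while offset < len(cache.code):
        insn = cache.fetch(offset)
        instructions.append(insn)
        offset += len(insn)
    return instructions


# Three toy instructions of length 2, 3, and 1.
program = bytes([2, 0xAA, 3, 0xBB, 0xCC, 1])
cache = PredecodeCache(program)

first_pass = run_once(cache)
decodes_after_first = cache.slow_decodes
second_pass = run_once(cache)
decodes_after_second = cache.slow_decodes
```

In this model the second pass through the same code triggers no additional slow decodes, which is the behavior the end-of-instruction markers are meant to deliver for loops and other frequently fetched code.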
Atom is also the first processor since Intel’s Pentium 4 to feature simultaneous multi-threading (SMT) in the form of Hyper-Threading. By design, in-order execution engines spend more processor clocks to execute instructions. Normally that’d be a performance inhibitor. However, Intel saw opportunity there. When execution is stalled, say, waiting for memory, pipeline resources go unutilized. By adding SMT, performance goes up as a second thread keeps instructions flowing through the engine. According to Intel’s Kuttanna, the performance gains seen from adding SMT to Atom’s in-order architecture are higher than those SMT delivers on an out-of-order design. Of course, realizing the benefits of SMT on an Atom-based device requires threaded software. The good news is that many audio and video codecs are already coded to employ threaded architectures. Multi-tasking will also showcase Atom’s ability to juggle more than one thread.
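A toy simulation shows why a second thread helps an in-order pipeline so much. Everything here is an illustrative assumption rather than a model of Atom itself: a single issue slot per cycle, a fixed four-cycle stall after every load, and a simple fixed-priority choice between threads. The function `run` and the instruction mix are hypothetical.

```python
def run(threads, miss_penalty=4):
    """Count cycles for a single-issue, in-order core to issue all work.

    Each thread is a list of "alu" or "load" operations. A thread that
    issues a load cannot issue again until the (assumed) miss penalty
    has elapsed; any other ready thread may use the slot in the meantime.
    """
    pcs = [0] * len(threads)        # next instruction index per thread
    ready_at = [0] * len(threads)   # first cycle each thread may issue again
    cycle = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        for i, thread in enumerate(threads):
            if pcs[i] < len(thread) and ready_at[i] <= cycle:
                op = thread[pcs[i]]
                pcs[i] += 1
                stall = miss_penalty if op == "load" else 0
                ready_at[i] = cycle + 1 + stall
                break  # only one instruction issues per cycle
        cycle += 1
    return cycle


work = ["load", "alu", "alu", "load", "alu", "alu"]

one_thread = run([work])            # a single thread on its own
back_to_back = 2 * one_thread       # two such threads run one after another
with_smt = run([work, work])        # the same two threads sharing the core

print(f"serial: {back_to_back} cycles, SMT: {with_smt} cycles")
```

With this particular instruction mix, the two threads finish in far fewer cycles when they share the pipeline than when they run one after the other, because one thread's memory stalls are filled by the other's instructions. Real gains depend heavily on the workload.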
Moving into the FP/SIMD execution clusters, Atom features two single-cycle SIMD ALUs, one of which is equipped with a shuffle unit. The other supports a full-width floating point adder for single precision FP adds. Why the emphasis on SIMD performance in the execution cluster? In profiling the apps typical of an MID, it became clear to Intel’s team that it’d see significant gains from building a wide data path.
In addition to its 32KB instruction cache, Atom sports a 24KB writeback data cache. The memory execution cluster also boasts a dual-level TLB hierarchy - one buffer is smaller, allowing very low latency access, and the other is significantly larger. A 512KB L2 cache with ECC support is integrated as well, able to fetch 64-byte cache lines in two clocks (that’s 256 bits per access). Onboard hardware prefetchers either pull data from memory into the L2 cache or from the L2 cache into the data cache. The processing core consists of roughly 13 million transistors and the chip’s L2 cache takes up about 30 million transistors.
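The cache figures above can be checked with simple arithmetic. The sketch below uses only numbers from this article, plus two loudly labeled assumptions: that the L2 runs at the core clock, and that transfers could be sustained back to back. The resulting bandwidth is therefore a theoretical ceiling, not a measured result, and the function name is hypothetical.

```python
LINE_BYTES = 64          # cache line size quoted above
CLOCKS_PER_LINE = 2      # clocks to fetch one line from L2

# 64 bytes is 512 bits; spread over two clocks, that is 256 bits per clock.
bits_per_clock = LINE_BYTES * 8 // CLOCKS_PER_LINE


def peak_l2_bandwidth_gbs(core_clock_ghz: float) -> float:
    """Theoretical peak in GB/s.

    Assumes the L2 runs at core clock and transfers are back to back;
    neither assumption comes from Intel.
    """
    return LINE_BYTES / CLOCKS_PER_LINE * core_clock_ghz


# Transistor budget in millions, from the figures in this article.
TOTAL_M, CORE_M, L2_M = 47, 13, 30
unattributed_m = TOTAL_M - CORE_M - L2_M  # left over for everything else

print(f"{bits_per_clock} bits per clock")
print(f"Peak at 1.86 GHz: {peak_l2_bandwidth_gbs(1.86):.2f} GB/s")
print(f"Transistors not attributed to core or L2: ~{unattributed_m} million")
```

Note that the core and L2 figures account for roughly 43 of the 47 million transistors; the article's sources do not break down the remainder.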
Despite its ground-up design, Atom maintains compatibility with Intel’s Core 2 Duo product lineup. The architecture supports Intel Virtualization Technology, the Execute Disable Bit, 64-bit extensions, and SSE3. However, Intel’s Pankaj Kedia says that not every feature will be productized in every SKU.
|Power and the Platform|
As a result of Intel’s delicate balancing between power and performance, Atom is the company’s most energy-efficient design to date. Its thermal design power falls between 0.65 and 2.4 watts (contingent largely on the operating frequency of the SKU in question) and Intel says average power ranges from 160 to 220 mW. At idle, you’ll see numbers closer to 100 mW.
The decisions to incorporate an in-order execution engine and simplified scheduler contribute greatly to Atom’s effective power management. However, the CPU also includes a handful of specific power-saving technologies that keep consumption to a minimum.
One of the most effective is a new C6 power state, first introduced as a feature of the mobile Penryn family. You’re probably familiar with some of the other C-states, such as C1E, where the core clock is turned off, the L1 caches are flushed, voltage is lowered, and power draw is reduced considerably. C6 is significantly more aggressive, shutting off the core clock, PLLs, and caches. Voltage consequently drops to near-zero and consumption is kept to an absolute minimum. Information normally stored in the registers (what Intel calls the architectural state) is saved to a small on-chip buffer. Waking back up from C6 happens, in turn, very quickly. For all but one of the launch SKUs, idle power in the C6 state is 100 mW.
The C6 state is enabled, in part, by a split power plane. Under full utilization, all 203 of Atom’s I/O pins are active across both planes. In its C6 state, 21 pins continue receiving power from the 1.05V VRM over the one plane still powered on, while the remaining pins shut off. Only the circuitry needed to wake Atom out of C6 continues getting power. Hence the low draw from an Atom processor as it idles.
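Intel's average power figures make more sense once you think of the chip as spending most of its time asleep. The sketch below treats the processor as flipping between just two states, fully busy and parked in C6, and solves for the fraction of time it would need to be busy to hit a given average. This two-state model is our simplification, and using the 2.4 W TDP as a stand-in for active power is an assumption (TDP is a thermal ceiling, not typical draw), so read the results as rough illustrations only.

```python
def active_fraction(avg_w: float, active_w: float, idle_w: float) -> float:
    """Solve avg = f * active + (1 - f) * idle for the busy fraction f."""
    return (avg_w - idle_w) / (active_w - idle_w)


IDLE_W = 0.100                          # C6 idle power quoted in the article
ACTIVE_W = 2.4                          # assumption: top-end TDP as active power
AVG_LOW_W, AVG_HIGH_W = 0.160, 0.220    # Intel's quoted average power range

busy_low = active_fraction(AVG_LOW_W, ACTIVE_W, IDLE_W)
busy_high = active_fraction(AVG_HIGH_W, ACTIVE_W, IDLE_W)

print(f"Busy roughly {busy_low:.1%} to {busy_high:.1%} of the time")
```

Under these assumptions, the quoted averages correspond to a processor that is busy only a few percent of the time, which is why a deep, fast-waking idle state matters so much for battery life.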
Clock gating is another important tool Intel uses to ensure only the logic gates that need power are getting it. The power-saving technique isn’t new in the synchronous circuit business - it’s been a part of Intel’s mobile processors and discrete notebook GPUs since the Pentium 4 days. Atom is simply more aggressively optimized for the feature.
Intel’s message of energy efficiency carries over from the processor to its platform. Like the Core 2 Duo and Quad CPUs, Atom communicates over a front side bus. As a percentage of total power consumption, the draw of the I/O pins on Intel’s FSB seems excessively high, but only because the processor is so efficient relative to past designs.
The chipset on the other end of Atom’s front side bus is actually a single component consisting of functionality normally found on north and south bridges. Internally referred to as Poulsbo, the chip includes a capable integrated graphics engine and the I/O expansion of a mainstream desktop.
Like the Atom processor, Poulsbo was built from scratch; Intel started its development in 2005. The company has been working with its software partners since then to ensure the most popular video players and codecs are all compatible with the chipset’s built-in hardware video acceleration. Pankaj Kedia points out that when he hops on YouTube with an iPhone, only the videos encoded in H.264 are available through Apple’s widget. Centrino Atom aims to change the need for “workarounds” on mobile devices. When you get online with a MID, you won’t be constrained by the proprietary nature of many of today’s solutions.
Integrated graphics is another important part of the Poulsbo story. The Centrino Atom platform was designed to accommodate a number of modern operating systems. And while Intel has standardized on the open source software projects hosted at moblin.org, you can expect at least a handful of OEMs to tap Vista as their operating system of choice. As a result, support for both DirectX 9 and OpenGL is valuable to the core logic. Don’t expect breathtaking 3D performance from the core - Intel rates the theoretical maximum fill rate at 400 Mpixels/s - but that should be sufficient for the types of apps expected to run on MIDs.
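To get a feel for what 400 Mpixels/s means on a handheld, consider how many times per frame the chip could, in theory, repaint every pixel of a small screen. The 800x480 resolution is the one quoted for Clarion's MiND later in this article; the 60 Hz refresh rate is our assumption, and the fill rate is Intel's theoretical maximum, so the real-world figure will be lower.

```python
FILL_RATE_PIXELS_PER_S = 400_000_000   # Intel's theoretical maximum
WIDTH, HEIGHT = 800, 480               # resolution quoted for Clarion's MiND
REFRESH_HZ = 60                        # assumption: not stated in the article

pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * REFRESH_HZ

# How many times each screen pixel could be drawn per frame at peak fill rate.
overdraw_budget = FILL_RATE_PIXELS_PER_S / pixels_per_second

print(f"{pixels_per_frame:,} pixels per frame")
print(f"Theoretical overdraw budget: {overdraw_budget:.1f}x per frame")
```

On paper that leaves room for a composited desktop and simple 3D, which fits Intel's claim that the core is adequate for MID-class apps rather than games with heavy layering and effects.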
Poulsbo’s I/O consists of two PCI Express x1 ports, eight USB 2.0 host ports (one of which can be configured as a client port), three SDIO/MMC ports, support for up to 1GB of DDR2 memory, and a parallel ATA IDE controller. The idea, of course, is to facilitate PC-like capabilities in a miniaturized form factor. When OEMs start adding WiMAX and 3G support, they’ll use the System Controller Hub’s external connectivity to interface with the platform.
|Launch SKUs and the Future|
The first round of Atom processors will launch at clock frequencies between 800 MHz and 1.86 GHz. All models include 512KB of L2 cache on the same sub-25 square millimeter die. But the two entry-level offerings will run on a 400 MHz front side bus, while the three higher-end versions employ a 533 MHz FSB. Those same three models also feature Hyper-Threading.
As you’d expect, power consumption rises as you ascend the family’s hierarchy, so OEMs will have to continue weighing the thermal performance of these solutions against the size of their devices. Even at 2W, Atom is still too hot for super-slim phones. As much as we would have liked to hear of an iPhone product with this technology, that’ll have to wait at least another year or two when Intel launches its next-generation platform.
That platform, currently known as Moorestown, will be a System on Chip design and Intel’s first foray into the smart phone arena. There aren’t any really solid details on the hardware available yet, aside from an acknowledgement that Intel will use 45nm manufacturing and still reduce idle power by up to 10 times. However, Intel’s Anand Chandrasekher stood on stage at the fall IDF in San Francisco with an example of the type of device Moorestown will power and the potential appears to be stunning. For now, we’ll have to be content with the precursors to those pocket-sized do-it-all products.
While the hardware Intel is introducing takes front and center, the company is also talking about its immediate plans to support Centrino Atom with a software infrastructure. Linux naturally lends itself to the ultra-portable market because of its cost, footprint, and lower system requirements. The problem with it, according to Pankaj Kedia, is that it’s fragmented. Intel’s solution is to unify the software under its Moblin open source project. Not only does Moblin address the low-cost, lightweight operating system for Intel’s MIDs, but it also serves as an umbrella for the optimized applications developed to run on Centrino Atom.
Intel says it’s already talking to about 25 vendors who have the hardware, have the software, and are planning products. In the June timeframe, at least 10 of those should be ready or very close to it. Among the list of OEM partners, Intel showed us Lenovo’s entertainment device, a Toshiba handheld running Vista, an LG model also running Vista with a slide-out keyboard, Gigabyte’s offering, a touch-screen system from ASUS, Clarion’s mobile navigation system, and a BenQ handheld.
The MID market has been on Intel’s radar since 2005, when it set the ball in motion for today’s introduction. The company’s first step into the handheld Internet space was a bit timid with McCaslin. The Menlow/Centrino Atom is significantly more deliberate. But will the technically proficient start packing portable video players and Internet-enabled tablets in addition to their keyboard-equipped phones?
That’s going to be a hard sell, especially since there are a growing number of mobile convergence devices that offer video playback, voice communications, and Internet access in one place. More probable is Intel’s success with Centrino Atom in specialized markets. For instance, car audio manufacturer Clarion introduced its internally-named MiND Internet-enabled navigation device at this year’s CES. Currently slated for availability this year, the MiND will have an 800x480 touch screen display, a Centrino Atom platform, 256MB of memory, 4GB of solid state storage, WiFi, Bluetooth 2.0, and a GPS receiver. In the future, Clarion plans to add a 3G data module and WiMAX support to round out broadband Internet connectivity.
That last point is particularly interesting. Recall that in order to qualify for Centrino branding, a notebook must include Intel’s processor, chipset, and wireless networking module, be that the PRO/Wireless 3945ABG card or the WiFi Link 4965AGN adapter. With Centrino Atom, mobile devices will feature a combination of WiFi, 3G, and WiMAX sourced through Intel or a third party. Hopefully that means more innovation from OEMs building sexy new handhelds.
Alright, so we’ve already peppered this first-look with a few of our initial reactions. But as we wrapped up our conversation with two of the great minds behind Intel’s developing MID initiative, we were presented with a couple of examples of why the company’s current approach makes so much sense. We mentioned one at the beginning of this piece - Adobe’s 160 versions of Flash, which are needed to support the many permutations of incompatible hardware and software. The second is Skype. Sony added the software to its PSP, but that port didn’t work for the Sony Mylo Personal Communicator, necessitating another development effort. Nokia ported Skype for its N810. Of course, it doesn’t work on the N95.
The beauty of Atom, at least in theory, is that its x86 compatibility means the architecture works with the software already available for Windows or Linux. In other words, hitting YouTube on a Centrino Atom MID won’t limit you to the videos encoded in H.264.
That’s not to say loading bloated x86 binaries on Centrino Atom devices will be the best route to take. We already know that the in-order execution engine and simplified scheduler, used to keep Atom’s power levels down, benefit from software optimization. So it’ll be interesting to see how existing apps are tweaked for performance on an ultra-mobile platform. Clearly, Intel is trying to help ease that transition through the Moblin project.
Certainly, the hardware sounds good, the software side is coming together, and Intel’s efforts in unifying a fragmented market full of incompatibility will be well-received. Will enthusiasts jump at the chance to add another digital device to their pockets? Perhaps not in the numbers Intel would like to see. But we’re still convinced this is a precursor to something bigger. Intel says it wants to put the Internet in your pocket. That will happen when it ties in the functionality of the phone you’re already toting. It’s just amazing that, eight years after the Pentium 4 first surfaced, we’re watching Intel pack a more complex processor into a die nearly one tenth of the size.