Next-Gen AMD Bobcat and Bulldozer CPU Deep Dive

by Joel Hruska — Wednesday, August 25, 2010, 04:24 AM EDT

Yesterday, we chided AMD for its decision not to reveal more details about Bulldozer and Bobcat, but it turns out we didn't have all the facts. AMD planned to disclose more information later in the day at Hot Chips, but the company didn't tell us that before we published our previous coverage. We're going to take a look at the new information about Bobcat and Bulldozer that has since been revealed; for more general background, see our earlier articles.

Bully For Bobcat
We'll start with the high-level block diagram of Bobcat's architecture, then step through some of the pertinent details. Bobcat shares certain characteristics with Atom, but AMD's new low-power processor is designed to meet a very different set of criteria. As we've previously discussed, Bobcat is an out-of-order core, a feature it shares with virtually every modern x86 processor except Atom, which executes instructions in order. This fact alone all but guarantees that Bobcat will outperform Atom clock-for-clock, but it also implies the chip will use more power.
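
To illustrate why out-of-order execution helps clock-for-clock, here is a toy Python model. It is not a model of either real core: both hypothetical schedulers issue one instruction per cycle (Atom and Bobcat are both dual-issue designs), each register is written at most once so renaming can be ignored, and the 20-cycle load latency stands in for a hypothetical cache miss.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    srcs: tuple[str, ...]  # registers read
    dst: str               # register written (each register written at most once)
    latency: int           # cycles until the result is available

def in_order_cycles(program):
    """Single-issue, in-order: an instruction issues no earlier than the cycle
    after its predecessor, and stalls until its operands are ready."""
    ready = {}  # register -> cycle its value becomes available
    issue, finish = -1, 0
    for ins in program:
        operands = max((ready.get(r, 0) for r in ins.srcs), default=0)
        issue = max(issue + 1, operands)
        ready[ins.dst] = issue + ins.latency
        finish = max(finish, ready[ins.dst])
    return finish

def out_of_order_cycles(program):
    """Single-issue, out-of-order: each cycle, the oldest instruction whose
    operands are ready issues, even if older instructions are still waiting."""
    written = {ins.dst for ins in program}
    ready = {}
    pending = list(program)
    cycle, finish = 0, 0
    while pending:
        for i, ins in enumerate(pending):  # oldest first
            if all(r not in written or (r in ready and ready[r] <= cycle)
                   for r in ins.srcs):
                ready[ins.dst] = cycle + ins.latency
                finish = max(finish, ready[ins.dst])
                del pending[i]
                break
        cycle += 1
    return finish

program = [
    Instr("load r1 <- [a]", (), "r1", 20),  # hypothetical cache miss
    Instr("add r2 <- r1+1", ("r1",), "r2", 1),
    Instr("mul r3 <- r4*r5", ("r4", "r5"), "r3", 3),
    Instr("add r6 <- r7+r8", ("r7", "r8"), "r6", 1),
    Instr("sub r9 <- r3-r6", ("r3", "r6"), "r9", 1),
]
print(in_order_cycles(program), out_of_order_cycles(program))  # 25 21
```

The in-order model stalls everything behind the dependent add until the load returns, while the out-of-order model issues the independent mul, add and sub during the miss. The cost, as noted above, is extra scheduling hardware that burns power.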

Bobcat features a 64K L1 cache (32K instruction / 32K data) and 512K of L2 cache per core; a dual-core Ontario processor will feature a total of 1MB L2 cache. Bobcat's branch predictor is described as "state of the art," a claim that's hard to evaluate without additional information. One new feature, however, is that the branch predictor shuts down whatever units aren't in use in order to reduce overall power consumption.

Bobcat's decoder (above, in red) takes a page from Intel's playbook. Like Atom's, it's a dual-issue design that focuses on instruction efficiency. Modern x86 processors don't actually execute x86 instructions directly. Ever since the Pentium Pro, virtually all x86 microprocessors have translated x86 instructions into simpler micro-ops before performing any calculations. When it designed Atom, Intel opted not to split most x86 instructions into multiple micro-ops; instead, Atom handles the large majority of x86 instructions, including many that combine a memory access with an arithmetic operation, as a single micro-op.

AMD claims that Bobcat can "directly map 89 percent of x86 instructions to a single micro-Op, an additional 10 percent to a pair of micro-ops, and more complicated x86 instructions (<1%) are micro-coded." Intel quoted very similar figures when it unveiled Atom's decoder, but there are likely to be some subtle differences between the capabilities of the two cores.
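
One way to read AMD's figures is as an average decode cost. The Python sketch below computes the weighted micro-ops per x86 instruction, under two assumptions that AMD did not state: that the percentages describe how often instructions appear in running code (they may instead describe coverage of the instruction set), and that a microcoded instruction expands to a hypothetical 4 micro-ops on average. The "<1%" figure is rounded up to 1% so the fractions sum to one.

```python
def expected_uops_per_instruction(mix):
    """Weighted average of micro-ops per x86 instruction.

    mix maps a decode class to (fraction_of_instructions, uops_per_instruction).
    """
    total = sum(frac for frac, _ in mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(frac * uops for frac, uops in mix.values())

# Hypothetical: average length of a microcoded sequence (not an AMD figure).
MICROCODE_UOPS = 4

bobcat_mix = {
    "single": (0.89, 1),                   # AMD: 89% map to one micro-op
    "pair": (0.10, 2),                     # AMD: 10% map to two micro-ops
    "microcoded": (0.01, MICROCODE_UOPS),  # AMD: "<1%"; rounded to 1% here
}

print(round(expected_uops_per_instruction(bobcat_mix), 2))  # 1.13
```

Under these assumptions the decoder emits only slightly more than one micro-op per instruction, which is the point of a design that avoids cracking common instructions.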

We aren't going to go into great detail on Bobcat's integer and floating-point execution units (the yellow-orange and turquoise blocks), but they're structured similarly to what you'd expect to find in a higher-end core like Shanghai. When Intel built Atom, it chose to include as few execution units as possible in order to save power. AMD is less willing to sacrifice performance.

Bobcat's 512K L2 is 16-way set associative and ECC-protected, and it uses "half-speed clocking for power reduction." It's not clear whether this means the L2 cache always runs at 50 percent of the core clockspeed, or whether the cache downclocks itself when it isn't heavily used.
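
For readers unfamiliar with set associativity, this Python sketch works out the L2's geometry. The 512K size and 16 ways come from AMD; the 64-byte line size and 40-bit physical address width are assumptions for illustration. A 16-way cache splits its lines into sets of 16, and each lookup compares the address tag against all 16 lines in one set.

```python
def _is_pow2(n):
    return n > 0 and n & (n - 1) == 0

def cache_geometry(size_bytes, ways, line_bytes, address_bits):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache."""
    sets, rem = divmod(size_bytes, ways * line_bytes)
    if rem or not _is_pow2(sets) or not _is_pow2(line_bytes):
        raise ValueError("line size and set count must be powers of two")
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits

def split_address(addr, offset_bits, index_bits):
    """Split a physical address into (tag, set index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# 512K and 16-way from AMD; 64-byte lines and 40-bit addresses are assumptions.
sets, off, idx, tag = cache_geometry(512 * 1024, 16, 64, 40)
print(sets, off, idx, tag)                  # 512 6 9 25
print(split_address(0x12345678, off, idx))  # (9320, 345, 56)
```

With these assumptions the L2 has 512 sets, so 9 address bits pick the set and the remaining upper bits form the tag that is compared against the 16 ways.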

The new core's pipeline looks like this:

That's 15 stages in total; of the six fetch stages, three are used by the branch prediction unit. AMD won't say why, citing competitive concerns. Again, this is comparable to Atom's 16-stage pipeline, as are Bobcat's cache latencies: L1 cache latency is 3 cycles and L2 latency is 17 cycles. Finally, we've got new information on the power-saving technologies AMD adopted with Bobcat.
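
To put the 3-cycle L1 and 17-cycle L2 figures in context, the Python sketch below computes an average load latency. The hit rates and the 150-cycle memory latency are hypothetical placeholders, not AMD numbers, and the sketch assumes each quoted latency is the total load-to-use time for a hit at that level.

```python
def amat(l1_latency, l2_latency, mem_latency, l1_hit_rate, l2_hit_rate):
    """Average load latency in cycles, treating each latency as the total
    load-to-use time for a hit at that level of the hierarchy."""
    l1_miss = 1.0 - l1_hit_rate
    return (l1_hit_rate * l1_latency
            + l1_miss * l2_hit_rate * l2_latency
            + l1_miss * (1.0 - l2_hit_rate) * mem_latency)

# L1 = 3 and L2 = 17 cycles from AMD; the remaining inputs are hypothetical.
print(round(amat(3, 17, 150, l1_hit_rate=0.95, l2_hit_rate=0.80), 2))  # 5.03
```

Even with a 95 percent L1 hit rate in this made-up example, the rare trips to memory contribute more to the average than the L2 hits do.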

Use of physical register files
Extensive use of non-shifting queues with pointers
Fine grained clock gating
Integrated Core Power Gating
Only needed arrays are clocked
Elimination of instruction marker bits in the I-cache
Finding the knee of the curve (scrutinize performance gains against power costs)
Polishing speed paths to raise the Vt mix and reduce leakage
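
The "non-shifting queues with pointers" item refers to a familiar trade-off: in a shifting queue, every remaining entry moves down one slot when the oldest entry leaves, while a pointer-based (circular) queue leaves entries where they are and just advances head and tail pointers. The Python sketch below counts entry writes as a crude, hypothetical proxy for switching activity; the class names are illustrative, and AMD hasn't said which Bobcat structures use this technique.

```python
class ShiftingQueue:
    """Head is always slot 0; dequeuing shifts every remaining entry down."""
    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.count = 0
        self.entry_writes = 0  # proxy for switching activity

    def enqueue(self, item):
        if self.count == len(self.slots):
            raise OverflowError("queue full")
        self.slots[self.count] = item
        self.count += 1
        self.entry_writes += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("queue empty")
        item = self.slots[0]
        for i in range(1, self.count):  # every remaining entry moves
            self.slots[i - 1] = self.slots[i]
            self.entry_writes += 1
        self.count -= 1
        return item

class PointerQueue:
    """Entries stay put; head and tail pointers wrap around a fixed array."""
    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = self.tail = self.count = 0
        self.entry_writes = 0

    def enqueue(self, item):
        if self.count == len(self.slots):
            raise OverflowError("queue full")
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        self.entry_writes += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("queue empty")
        item = self.slots[self.head]  # no entries move, only the pointer
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return item

def run(queue, rounds=100):
    """Fill the queue, then retire one entry and insert one entry per round."""
    capacity = len(queue.slots)
    out = []
    for i in range(capacity):
        queue.enqueue(i)
    for i in range(capacity, capacity + rounds):
        out.append(queue.dequeue())
        queue.enqueue(i)
    return out

s, p = ShiftingQueue(8), PointerQueue(8)
assert run(s) == run(p)  # same FIFO order either way
print(s.entry_writes, p.entry_writes)  # 808 108
```

Both queues return entries in the same order, but with eight entries the shifting version rewrites every occupied slot on each round, while the pointer version writes one slot and updates two small counters.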

Why We're Excited
By the time Bobcat arrives, Atom will have had the netbook market almost entirely to itself for 2.5 years. Where we once hoped VIA's Nano would introduce competition in the market, it now seems all but certain that AMD will be the first company to do so. We'd be remiss if we didn't note that Ontario actually won't compete with Atom in a large number of markets; Atom was designed specifically to scale into handheld devices and power envelopes Bobcat won't be able to reach. Where the two chips do meet, however, we expect Ontario will outperform Atom.

On the graphics side of things, we'd love to think that 2011 is the year Intel will dazzle us all with a brilliant new Atom-ready GPU, but that's not likely to happen. Interestingly, we might see Intel change its tune about NVIDIA's ION if it feels AMD's future integrated solution is hitting a weak spot in Atom's armor.

Bobcat hasn't gotten as much attention as Bulldozer, but we think the low-power chip is much more likely to have an effect on AMD's bottom line and market share in the next 12 months. If Sunnyvale targets it properly, it could deliver much higher performance than the netbook market is used to at an extremely attractive price. AMD's Brazos platform won't single-handedly rejuvenate AMD's mobile division, but it could change what people expect from a netbook or a low-end notebook.