AMD Lifts The Hood On Bulldozer At ISSCC

AMD's upcoming Bulldozer processor has been an increasingly hot topic as its launch date creeps nearer, but the company has kept a great deal of information under wraps. That's now beginning to change; AMD plans to discuss Bulldozer in more depth at ISSCC, which is currently underway.

According to AMD Fellow Tim Fischer, Bulldozer was designed from the ground up to reduce power consumption. He writes:
"Changes in clocking, latching, power management and on-chip memories are part of the comprehensive circuit updates incorporated into Bulldozer. These are detailed in the paper, along with significant power reduction improvements, including clock gating, a new low-power flop design, and L1 cache power improvements."

AMD's high-level Bulldozer diagram, showing the independent integer units and shared FPU

These changes should have a significant impact on AMD's entire product line. Both desktop chips and Opteron server chips will be more finely tuned, which should allow AMD to better meet the needs of both the ultra-low-power and high-performance market segments. As for the chip's performance, Fischer states:
"High performance computing relies heavily on vector (packed integer) and floating point operations, both handled in the FPU. Bulldozer was designed to execute these operations at higher performance and using less power than the current generation of microprocessors. Key to Bulldozer’s performance and power improvements are FPU changes, including completely redesigned arithmetic units and control structures. As previously described... the Bulldozer FPU supports new instructions including SSSE3, SSE4.1, SSE4.2, AVX, AES, and advanced Multiply-Add/Accumulate operations."

The lower-level, more detailed diagram. AMD claims that each Bulldozer module contains "two tightly linked processor cores." In reality, the module contains something like 1.5 to 1.8 CPU cores, which are linked together and operate using clustered multithreading, or CMT.

AMD has released a block diagram with more information on it than we've previously seen. The company has also hinted that Orochi, the first Bulldozer-based chip, will be capable of operating at 3.5GHz at launch, though it's not clear if this is an average clock speed or the highest speed the company is targeting. The CPU designer claims that Bulldozer "improves performance and frequency while reducing area and power over a previous AMD x86-64 CPU in the same process. The design reduces the number of gates/cycle relative to prior designs, achieving 3.5GHz+ operation."

Just The Facts

Based on the data AMD has released to date, it's clear that Bulldozer will offer better power management and performance than what we've previously seen in K10. More importantly, at least for AMD's profit margin, it offers these advantages on a significantly smaller die than a traditional quad core. This means AMD can build more processors on a single wafer, thus lowering its cost of goods sold.
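To make the wafer arithmetic concrete, here is a rough sketch using a common first-order estimate of gross dies per wafer. Every number in it is a hypothetical placeholder: AMD has not disclosed die sizes or wafer costs here, and the function names are ours. The estimate also ignores yield, scribe lines, and die aspect ratio, so treat it as an illustration of the trend rather than a prediction.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate of gross (not yielded) dies per wafer.

    Wafer area divided by die area, minus a correction term for the
    partial dies lost around the wafer's circular edge.
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


def cost_per_die(wafer_cost: float, wafer_diameter_mm: float,
                 die_area_mm2: float) -> float:
    """Wafer cost spread evenly across the gross die count (yield ignored)."""
    return wafer_cost / dies_per_wafer(wafer_diameter_mm, die_area_mm2)


# Hypothetical inputs, chosen only to show the trend.
WAFER_DIAMETER_MM = 300.0   # assumed wafer size
WAFER_COST = 5000.0         # assumed cost per wafer, arbitrary units

for area in (300.0, 200.0):  # assumed "larger" and "smaller" die areas in mm^2
    count = dies_per_wafer(WAFER_DIAMETER_MM, area)
    cost = cost_per_die(WAFER_COST, WAFER_DIAMETER_MM, area)
    print(f"{area:.0f} mm^2 die: {count} dies per wafer, {cost:.2f} per die")
```

Because less area is wasted at the wafer edge with smaller dies, the die count rises slightly faster than the area shrinks, which is why even a modest reduction in die size matters for cost.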

We can safely assume that Bulldozer will outperform Phenom II and current Opterons in terms of both performance-per-watt and raw performance. There may be a few corner cases where the "true" quad-cores outperform Orochi thanks to their individually dedicated FPUs, but Bulldozer's additional features and higher clock speeds should close the gap.

The bigger question, of course, is whether Bulldozer will be able to close the gap with Sandy Bridge. Later presentations at ISSCC may offer insight on this topic, but for now we're still mostly in the dark. It stands to reason that Bulldozer will at least shrink the gap, but by how much is still unknown.