AMD Lifts The Hood On Bulldozer At ISSCC
According to AMD Fellow Tim Fischer, Bulldozer was designed from the ground up to reduce power consumption. He writes:
Changes in clocking, latching, power management and on-chip memories are part of the comprehensive circuit updates incorporated into Bulldozer. These are detailed in the paper, along with significant power reduction improvements, including clock gating, a new low-power flop design, and L1 cache power improvements.
AMD's high-level Bulldozer diagram, showing the independent integer units and shared FPU
These changes should have a significant impact on AMD's entire product line. Both desktop and Opteron-based chips will be more finely tuned, which should allow AMD to better meet the needs of both the ultra-low-power and high-performance market segments. As for the chip's performance, Fischer states:
High performance computing relies heavily on vector (packed integer) and floating point operations, both handled in the FPU. Bulldozer was designed to execute these operations at higher performance and using less power than the current generation of microprocessors. Key to Bulldozer’s performance and power improvements are FPU changes, including completely redesigned arithmetic units and control structures. As previously described... the Bulldozer FPU supports new instructions including SSSE3, SSE4.1, SSE4.2, AVX, AES, and advanced Multiply-Add/Accumulate operations.
The lower-level, more detailed diagram. AMD claims that each Bulldozer module contains "two tightly linked processor cores." In reality, the module contains something like 1.5 to 1.8 CPU cores' worth of hardware, linked together and operating under what AMD calls clustered multithreading, or CMT.
Just The Facts
Based on the data AMD has released to date, it's clear that Bulldozer will offer better power management and performance than what we've previously seen in K10. More importantly, at least for AMD's profit margin, it offers these advantages on a significantly smaller die than a traditional quad core. This means AMD can build more processors on a single wafer, thus lowering its cost of goods sold.
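To make the die-size argument concrete, here is a rough sketch of the arithmetic using a common first-order dies-per-wafer approximation. The die areas and wafer cost below are hypothetical placeholders chosen for illustration, not figures AMD has disclosed, and the estimate ignores defect yield, scribe lines and edge exclusion.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Estimate gross dies per wafer.

    Uses a widely quoted approximation: wafer area divided by die area,
    minus a correction for partial dies lost around the wafer's edge.
    """
    if wafer_diameter_mm <= 0 or die_area_mm2 <= 0:
        raise ValueError("diameter and die area must be positive")
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return max(0, math.floor(wafer_area / die_area_mm2 - edge_loss))


def cost_per_die(wafer_cost: float, wafer_diameter_mm: float, die_area_mm2: float) -> float:
    """Spread a fixed wafer cost across the estimated gross die count."""
    count = dies_per_wafer(wafer_diameter_mm, die_area_mm2)
    if count == 0:
        raise ValueError("die is too large for this wafer")
    return wafer_cost / count


if __name__ == "__main__":
    # All inputs are assumptions for illustration only.
    WAFER_MM = 300.0        # standard 300 mm wafer
    WAFER_COST = 5000.0     # hypothetical cost per processed wafer
    for label, area in (("smaller die", 200.0), ("larger die", 300.0)):
        count = dies_per_wafer(WAFER_MM, area)
        cost = cost_per_die(WAFER_COST, WAFER_MM, area)
        print(f"{label}: {area:.0f} mm^2 -> {count} dies, {cost:.2f} per die")
```

Because the wafer cost is roughly fixed, every extra die that fits on it lowers the cost of each chip, which is the effect described above. Real-world savings also depend on yield, which tends to favor smaller dies further.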
We can safely assume that Bulldozer will beat Phenom II and current Opterons in both performance per watt and raw performance. There may be a few corner cases where the "true" quad-cores outperform Orochi thanks to their individually dedicated FPUs, but Bulldozer's additional features and higher clock speeds should narrow that difference.
The bigger question, of course, is whether Bulldozer will be able to close the gap with Sandy Bridge. Later presentations at ISSCC may offer insight on this topic, but for now we're still mostly in the dark. It stands to reason that Bulldozer will at least shrink the gap, but by how much is still unknown.