HotHardware
AMD's Next Gen Steamroller CPU Could Deliver Where Bulldozer Fell Short
Today at the Hot Chips Symposium, AMD's CTO Mark Papermaster is taking the wraps off AMD's upcoming CPU core, codenamed Steamroller. Steamroller is the third iteration of Sunnyvale's Bulldozer architecture and an extremely important part. Bulldozer, launched just under a year ago, was a major disappointment. The company's second-generation Bulldozer implementation, codenamed Piledriver, made a number of important changes and was incorporated into the Trinity APU family that debuted last spring.

Steamroller is the first refresh of Bulldozer's underlying architecture and may finally deliver the sort of performance and efficiency AMD was aiming for when it built 'Dozer in the first place. In AMD's slides, all of the comparisons and percentage gains are relative to Trinity.

With Steamroller, AMD is taking a baby step or two back towards the traditional dual-core model. Here's Bulldozer's Fetch/Decode/Dispatch hardware, as compared to Steamroller's.


Bulldozer and Steamroller Fetch and Decode Architecture

One of Bulldozer's limitations was that it could only decode four instructions per module, for a maximum of 16 instructions per clock in a four-module/eight-core configuration. That put the chip at a theoretical disadvantage compared to Istanbul (3 instructions/core, 18 total in a six-core configuration) and Sandy Bridge (4 instructions/core, 32 total in an eight-core CPU).
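The decode comparison above is plain multiplication: decode width per decoder times the number of decoders. A minimal Python sketch using only the figures quoted in that paragraph (the function name is ours, purely illustrative) makes the per-core gap explicit, since Bulldozer's two cores share one decoder while the other chips give each core its own:

```python
def peak_decode_per_clock(width_per_unit: int, units: int) -> int:
    """Aggregate instructions decoded per clock across all decode units."""
    return width_per_unit * units


chips = {
    # name: (decode width per decoder, number of decoders, number of cores)
    "Bulldozer (4 modules / 8 cores)": (4, 4, 8),  # one shared decoder per module
    "Istanbul (6 cores)": (3, 6, 6),               # one decoder per core
    "Sandy Bridge (8 cores)": (4, 8, 8),           # one decoder per core
}

for name, (width, units, cores) in chips.items():
    total = peak_decode_per_clock(width, units)
    print(f"{name}: {total} per clock, {total / cores:.2f} per core")
```

This gives 16, 18 and 32 instructions per clock respectively, or 2, 3 and 4 per core. These are theoretical peaks; real code rarely sustains full decode width.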

It's not clear if Steamroller can actually dispatch more instructions per clock, but a pair of dual-issue dispatch units may be quicker than the single, unified logic block Bulldozer used. Bulldozer's unified approach reduced multithreaded performance by ~20% compared to a traditional dual-core design. Given how much logic the chip shared, a 20% performance penalty isn't bad -- but reducing this penalty is a great place for AMD to recover performance.
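To put the ~20% figure in throughput terms, here is a deliberately simplified Python model, an assumption of ours rather than anything AMD has published: one thread on a module runs like one full core, and two threads together deliver (1 - penalty) of what two independent cores would. The 10% penalty below is hypothetical; AMD has not given a Steamroller number.

```python
def module_throughput(threads: int, shared_penalty: float) -> float:
    """Throughput of one two-core module, in units of one full, unshared core.

    Simplified model (an assumption, not AMD's): one thread runs as fast as a
    full core; two threads deliver (1 - shared_penalty) of two full cores.
    """
    if threads == 1:
        return 1.0
    if threads == 2:
        return 2.0 * (1.0 - shared_penalty)
    raise ValueError("a module runs at most two threads")


bulldozer = module_throughput(2, 0.20)     # ~20% penalty cited for Bulldozer
hypothetical = module_throughput(2, 0.10)  # hypothetical smaller penalty
print(f"Bulldozer module: {bulldozer:.2f} core-equivalents")
print(f"10% penalty:      {hypothetical:.2f} core-equivalents "
      f"({hypothetical / bulldozer - 1:.1%} more per module)")
```

Under this model a Bulldozer module behaves like about 1.6 cores; halving the penalty would lift that to 1.8, roughly 12.5% more multithreaded throughput per module without adding any cores.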



Johan De Gelas' excellent in-depth article on Interlagos performance revealed that the L1 instruction cache had taken a nasty efficiency hit compared to older Istanbul-based chips. With both cores per module enabled, the L1 hit rate fell from 97% to 95%; in other words, the miss rate rose from 3% to 5%, an increase of about two-thirds. AMD is "increasing" L1 instruction cache size to compensate -- presumably to 96-128KB per module, up from 64KB in Bulldozer. A 30% reduction in i-cache misses would put the L1 hit rate back in 96-97% territory.
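The cache arithmetic in that paragraph can be checked directly, because hit rate and miss rate always sum to 100%. A short Python sketch using only the 97%, 95% and 30% figures quoted there (the helper name is illustrative):

```python
def hit_rate_after_miss_reduction(hit_rate: float, miss_reduction: float) -> float:
    """New hit rate if the miss rate shrinks by the fraction `miss_reduction`."""
    miss_rate = 1.0 - hit_rate
    return 1.0 - miss_rate * (1.0 - miss_reduction)


istanbul_hit, interlagos_hit = 0.97, 0.95  # L1 i-cache hit rates quoted above
miss_growth = (1 - interlagos_hit) / (1 - istanbul_hit)
print(f"Miss rate: {1 - istanbul_hit:.0%} -> {1 - interlagos_hit:.0%} ({miss_growth:.2f}x)")

# Apply a 30% cut in i-cache misses to the 95% Interlagos hit rate.
new_hit = hit_rate_after_miss_reduction(interlagos_hit, 0.30)
print(f"Hit rate after 30% fewer misses: {new_hit:.1%}")
```

A 5% miss rate cut by 30% becomes 3.5%, i.e. a 96.5% hit rate, which is where the "96-97% territory" estimate comes from.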

Steamroller L1 Cache and Integer Scheduler Improvements

Interlagos' branch predictor was better than Istanbul's, but still significantly worse than Intel's. A 20% improvement in branch prediction won't put AMD and Intel on equal footing, but it will boost the architecture's overall performance.
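Why a sizable predictor improvement translates into a modest overall gain can be illustrated with the textbook additive CPI model: stall cycles per instruction equal branches per instruction, times mispredict rate, times mispredict penalty. Every input below is hypothetical and chosen only for illustration; none of them are measured AMD or Intel numbers, and reading "20% improvement" as "20% fewer mispredicts" is our assumption.

```python
def mispredict_stall_cpi(branches_per_instr: float, mispredict_rate: float,
                         penalty_cycles: float) -> float:
    """Cycles per instruction lost to branch mispredictions (simple additive model)."""
    return branches_per_instr * mispredict_rate * penalty_cycles


# Hypothetical inputs for illustration only; NOT measured AMD or Intel figures.
BASE_CPI = 1.0             # CPI with perfect branch prediction
BRANCHES_PER_INSTR = 0.20  # about one branch every five instructions
MISPREDICT_RATE = 0.05     # 5% of branches mispredicted
PENALTY_CYCLES = 20.0      # cycles to refill the pipeline after a mispredict

before = BASE_CPI + mispredict_stall_cpi(BRANCHES_PER_INSTR, MISPREDICT_RATE, PENALTY_CYCLES)
# Assumption: "20% improvement" read as 20% fewer mispredicted branches.
after = BASE_CPI + mispredict_stall_cpi(BRANCHES_PER_INSTR, MISPREDICT_RATE * 0.8, PENALTY_CYCLES)
print(f"CPI {before:.3f} -> {after:.3f}, speedup {before / after:.3f}x")
```

With these made-up inputs, mispredict stalls drop by a fifth but total CPI only falls from 1.20 to 1.16, a speedup of about 3.4%. The real gain depends heavily on workload branchiness and pipeline depth.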



It's not clear what AMD means by "streamlined execution hardware." Typically that's execu-speak for "We got rid of some stuff," but that may not be a problem here. Sunnyvale is pushing the idea that the GPU effectively becomes the floating-point heavy lifter at some point in the not-too-distant future, and strong FPU performance isn't really driving adoption in any segments where AMD can reasonably expect to compete.

Putting It All Together:

Based on what we know now, Steamroller looks a lot like the CPU Bulldozer should've been. AMD is claiming a 15% performance/watt improvement, and that figure makes sense given what we've seen today. The good news is that another 15% definitely moves things forward for AMD. Trinity's major achievement was its ability to deliver Llano-equivalent performance at moderately lower power; Steamroller should finally pull ahead of the old K10 architecture in clock-to-clock efficiency. That's critical -- AMD needs to strengthen its single-thread performance if it wants to compete with Intel in mobile markets.
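A performance/watt gain can be spent in two ways: more performance at the same power, or the same performance at less power. A small Python sketch of both readings of the claimed 15%, assuming (our simplification) that the gain holds at either operating point, which real silicon only approximates because voltage and frequency do not scale linearly:

```python
def perf_per_watt_tradeoffs(gain: float) -> tuple[float, float]:
    """Two ways to spend a perf/watt gain, relative to the old chip (= 1.0).

    Returns (performance at the same power, power at the same performance).
    Assumes the gain applies equally at both operating points.
    """
    ratio = 1.0 + gain
    return ratio, 1.0 / ratio


perf_same_power, power_same_perf = perf_per_watt_tradeoffs(0.15)
print(f"Same power: {perf_same_power:.2f}x performance")
print(f"Same performance: {power_same_perf:.1%} of the power "
      f"({1 - power_same_perf:.1%} savings)")
```

Note the asymmetry: 15% more performance per watt means about 13% less power at equal performance, not 15%, which matters when reading mobile battery-life claims.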

The downside is that another 15% won't really change competitive positioning. Steamroller's raw performance may match Sandy Bridge, but it's unlikely to compete well against Ivy Bridge or Haswell. This suggests that AMD's ability to gain share in mobile will continue to be performance-constrained. With that said, Steamroller is still hugely important -- it's shaping up to be the first real example of what AMD wanted to accomplish when it opted for a CMT (Clustered Multithreading) architecture.



Timing will be critical. Sunnyvale hasn't said when it expects Steamroller to ship beyond a broad "2013" target; an early launch window is far preferable to letting the core slip into the back half of 2013. Right now, AMD has made no statements on the Kaveri SoC's launch timeframe (Kaveri is the first APU to integrate a Steamroller core). Sunnyvale's most recent public roadmap update, from February, indicates that Steamroller won't launch in a standalone CPU flavor -- at least not in 2013. The Piledriver core at the heart of Trinity remains the top product in the company's lineup.

If it can launch ahead of Haswell, AMD has a chance to focus the conversation on its cycle of continuous, rapid improvements rather than being defined as an Intel also-ran. Hopefully we'll be able to glean more information from the company's presentations and whitepapers at Hot Chips, but Steamroller is a strong start.

32 Comments

  1. Erakith says:

    Oooh I am interested. This is looking nice, good luck to AMD.. we need competition in the marketplace.

  2. I hope so. Competition would be awesome!

  3. rapid1 says:

    Nice to see them still rolling along. I had kind of given up on them after being a die-hard for many, many years (since Athlon XP, if anyone remembers those)!

  4. realneil says:

    I remember my old Thunderbird CPUs and how quick they were.

  5. Erakith says:

    I had an Ath x64 3100.. nice little chip at the time.

  6. Schmich says:

    Seems they really have given up on competing seriously on the enthusiast side of desktops, considering that the desktop CPU-only chips will always be a generation behind :/

    I don't know the details about chip-making. But when you think about all the AMD people out there who want the 8120-8150 of Piledriver and later Steamroller. It could make sense to at least just make one model of each generation and release asap. So lots of APUs and one high-end CPU so enthusiasts don't flee to Intel.

  7. nicoletoledo says:

    This would be a great comeback, if they can pull through. I'm still an AMD fan because of their price/performance ratio.

  8. KBennett says:

    The problem is gonna be Windows 8. For those that don't know, the FP-light "half core" design they used in Bulldozer and Steamroller has a serious performance issue in WinXP, Vista and 7, and MSFT has made it clear it's a WILL NOT FIX except...in Windows 8.

    So for all of you that don't want Metro? I'm afraid you should either avoid AMD or disable half of each module so it behaves like a "normal" chip, because otherwise Windows treats it like hyperthreading, which is bad. For those that don't know WHY it's bad, imagine you buy a Steamroller 8 core. Now the best way for Windows to schedule 4 jobs would be ONE per module; that gives each of the 4 jobs its own FP unit. Instead, thanks to the scheduler bug, Windows will dump those 4 jobs on the first two modules, slowing them to a crawl, while the other two modules twiddle their thumbs...see the problem?

    So until AMD gets rid of the half core design, or gets MSFT to backport the fix, I'll be hanging onto my Thuban for another couple of years and then, if it isn't fixed, going to Intel. Because I don't know about everyone else, but I have NO desire to turn my desktop into a cell phone with Win 8's Metro UI.

  9. mhenriday says:

    Hold with those above who are hoping that Steamroller will perform sufficiently well to make it an alternative to Intel's Sandy Bridge and Ivy Bridge - and other chips in the pipeline - the x86 chip market desperately needs competition ! But even if AMD does well, two chip competitors are far too few ; while an oligopoly is preferable to a monopoly, what is needed are new players in the market, like ARM in the low-power segment. Still, the return of AMD as a serious competitor would be a most welcome event ; in my next build I look forward to being able to replace my current AMD Phenom II X4 955 processor with a hot (and reasonably priced) CPU from AMD, rather than having to pay an Intel tax due to that company's quasi-monopoly position....

    Henri

  10. JOMA says:

    Last AMD processor I owned was the AMD Athlon 3200 socket 939. Fantastic chip, but there hasn't been anything recently from AMD that has swayed me from Intel.
