AMD's Next Gen Steamroller CPU Could Deliver Where Bulldozer Fell Short


Today at the Hot Chips Symposium, AMD's CTO Mark Papermaster is taking the wraps off AMD's upcoming CPU core, codenamed Steamroller. Steamroller is the third iteration of Sunnyvale's Bulldozer architecture and an extremely important part. Bulldozer, launched just over a year ago, was a major disappointment. The company's second-generation Bulldozer implementation, codenamed Piledriver, made a number of important changes and was incorporated into the Trinity APU family that debuted last spring.

Steamroller is the first refresh of Bulldozer's underlying architecture and may finally deliver the sort of performance and efficiency AMD was aiming for when it built 'Dozer in the first place. In the slides below, all of the comparisons and percentage gains are based on Trinity.

With Steamroller, AMD is taking a baby step or two back towards the traditional dual-core model. Here's Bulldozer's Fetch/Decode/Dispatch hardware, as compared to Steamroller's.


Bulldozer and Steamroller Fetch and Decode Architecture

One of Bulldozer's limitations was that it could only decode four instructions per module, for a maximum of 16 instructions per clock in a four-module / eight-core configuration. That put the chip at a theoretical disadvantage compared to Istanbul (3 instructions/core, 18 total in a six-core configuration) and Sandy Bridge (4 instructions/core, 32 total in an eight-core CPU).
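The decode figures above are simple products of decoder width and decoder count. The Python sketch that follows reproduces them and also shows the average decode bandwidth left for each core when every core is busy. The `DecodeConfig` class is purely illustrative, and the model covers theoretical peak decode only; it ignores dispatch limits, micro-op caches, and real instruction mixes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodeConfig:
    name: str
    width: int     # instructions decoded per clock by one decoder
    decoders: int  # number of independent decoders on the chip
    cores: int     # number of integer cores sharing those decoders

    def peak_total(self) -> int:
        # Theoretical chip-wide peak: every decoder running at full width.
        return self.width * self.decoders

    def peak_per_core(self) -> float:
        # Average decode bandwidth available to each core when all are busy.
        return self.peak_total() / self.cores


CONFIGS = [
    # Bulldozer: one 4-wide decoder shared by the two cores of each module.
    DecodeConfig("Bulldozer", width=4, decoders=4, cores=8),
    # Istanbul: one 3-wide decoder per core.
    DecodeConfig("Istanbul", width=3, decoders=6, cores=6),
    # Sandy Bridge: one 4-wide decoder per core.
    DecodeConfig("Sandy Bridge", width=4, decoders=8, cores=8),
]

if __name__ == "__main__":
    for c in CONFIGS:
        print(f"{c.name}: {c.peak_total()} per clock, "
              f"{c.peak_per_core():.1f} per core")
```

Viewed per core, the gap is starker than the chip-wide totals suggest: a fully loaded Bulldozer core gets half the peak decode bandwidth of a Sandy Bridge core.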

It's not clear if Steamroller can actually dispatch more instructions per clock, but a pair of dual-issue dispatch units may be quicker than the single, unified logic block Bulldozer used. Bulldozer's unified approach reduced multithreaded performance by ~20% compared to a traditional dual-core design. Given how much logic the two cores shared, a 20% performance penalty isn't bad -- but reducing this penalty is a great place for AMD to recover performance.
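To put the ~20% figure in perspective, here is a back-of-the-envelope Python sketch. It assumes the penalty applies uniformly to a module's two-thread throughput, which is a simplification. The 10% case is a hypothetical illustration of AMD halving the penalty, not a figure AMD has published.

```python
def module_throughput(single_core: float, penalty: float) -> float:
    """Throughput of one CMT module running two threads.

    single_core: throughput of one thread on one core (normalized units).
    penalty: fractional shortfall versus a traditional dual-core,
             e.g. 0.20 for the ~20% figure cited for Bulldozer.
    """
    if not 0.0 <= penalty < 1.0:
        raise ValueError("penalty must be in [0, 1)")
    # A traditional dual-core delivers 2 * single_core; CMT loses `penalty` of that.
    return 2.0 * single_core * (1.0 - penalty)


dual_core = module_throughput(1.0, 0.0)   # traditional dual-core baseline
bulldozer = module_throughput(1.0, 0.20)  # ~20% penalty quoted for Bulldozer
halved = module_throughput(1.0, 0.10)     # HYPOTHETICAL: penalty cut in half

print(f"dual-core {dual_core:.2f}, Bulldozer module {bulldozer:.2f}, "
      f"halved penalty {halved:.2f}")
```

Under this simple model, halving the penalty would raise multithreaded throughput per module by 12.5% (1.8 / 1.6), without touching single-thread speed.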



Johan De Gelas' excellent in-depth article on Interlagos performance revealed that the L1 instruction cache had taken a nasty efficiency hit compared to older Istanbul-based chips. With both cores per module enabled, the L1 hit rate had fallen to 95% from 97% (in other words, the miss rate rose from 3% to 5%, an increase of roughly two-thirds). AMD is "increasing" L1 instruction cache size to compensate -- presumably to 96-128K per module, from 64K in Bulldozer. A 30% reduction in i-cache misses would cut the miss rate to about 3.5%, putting the L1 hit rate back in 96-97% territory.
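The miss-rate arithmetic behind these hit-rate figures is easy to check. The following is a minimal Python sketch using only the rounded percentages quoted above. The function names are illustrative, and the 30% cut is the article's hypothetical, not a measured Steamroller result.

```python
def miss_rate(hit_rate: float) -> float:
    """Fraction of accesses that miss the cache."""
    return 1.0 - hit_rate


def hit_rate_after_miss_cut(hit_rate: float, miss_cut: float) -> float:
    """Hit rate after removing the fraction `miss_cut` of the remaining misses."""
    return 1.0 - miss_rate(hit_rate) * (1.0 - miss_cut)


ISTANBUL_HIT = 0.97    # L1 i-cache hit rate quoted for Istanbul
INTERLAGOS_HIT = 0.95  # quoted for Interlagos with both cores per module active

# Relative growth in misses: 5% / 3% is roughly 1.67x, not a full doubling.
miss_growth = miss_rate(INTERLAGOS_HIT) / miss_rate(ISTANBUL_HIT)

# 30% fewer misses: 5% * 0.7 = 3.5% miss rate, i.e. about a 96.5% hit rate.
recovered_hit = hit_rate_after_miss_cut(INTERLAGOS_HIT, 0.30)

print(f"miss rate grew {miss_growth:.2f}x")
print(f"hit rate after a 30% miss reduction: {recovered_hit:.1%}")
```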

Steamroller L1 Cache and Integer Scheduler Improvements

Interlagos' branch predictor was better than Istanbul's, but still significantly worse than Intel's. A 20% improvement here won't put AMD and Intel on equal footing, but it will boost Bulldozer's overall performance.



It's not clear what AMD means by "streamlined execution hardware." Typically that's execu-speak for "We got rid of some stuff," but that may not be a problem here. Sunnyvale is pushing the idea that the GPU effectively becomes the floating-point heavy lifter at some point in the not-too-distant future, and strong FPU performance isn't really driving adoption in any segments where AMD can reasonably expect to compete.

Putting It All Together:

Based on what we know now, Steamroller looks a lot like the CPU Bulldozer should've been. AMD is claiming a 15% performance/watt improvement, and that figure makes sense given what we've seen today. The good news is that another 15% definitely moves things forward for AMD. Trinity's major achievement was its ability to deliver Llano-equivalent performance at moderately less power; Steamroller should finally pull ahead of the old K10 architecture in clock-to-clock efficiency. That's critical -- AMD needs to strengthen its single-thread performance if it wants to compete with Intel in mobile markets.

The downside is that another 15% won't really change competitive positioning. Steamroller's raw performance may match Sandy Bridge, but it's unlikely to compete well against Ivy Bridge or Haswell. This suggests that AMD's ability to gain share in mobile will continue to be performance-constrained. With that said, Steamroller is still hugely important -- it's shaping up to be the first real example of what AMD wanted to accomplish when it opted for its CMT (Cluster-based Multithreading) architecture.



Timing will be critical. Sunnyvale hasn't said when it expects Steamroller to ship beyond a broad "2013" target; an early launch window is infinitely preferable to allowing the core to slip into the back half of 2013. Right now, AMD has made no statements on the Kaveri SoC's launch timeframe (Kaveri is the first APU to integrate a Steamroller core). Sunnyvale's last public roadmap update, last February, indicates that Steamroller won't launch in an independent CPU flavor -- at least not in 2013. The Piledriver core at the heart of Trinity remains the top product in the company's lineup.

If it can launch ahead of Haswell, AMD has a chance to focus the conversation on its cycle of continuous, rapid improvements rather than being defined as an Intel also-ran. Hopefully we'll be able to glean more information from the company's presentations and whitepapers at Hot Chips, but Steamroller is a strong start.

I think, realneil, that the main point of the article was to show that frame rate is not an adequate measurement of the gaming experience and that frame latency is a superior metrological instrument for this purpose. Thus it has - if I understand it aright - a wider relevance than just the current situation obtaining between Intel and AMD CPUs (which I hope will at least in part be rectified by the Steamroller series); it teaches us - or at least should teach us - not to stare ourselves blind on frame rates. If more reviewers adopt these methods and the general public (or at least its enthusiast component) becomes aware of these facts, perhaps manufacturers will be forced to provide us with better processors (but they'd better not have rounded corners!)...

Henri

mhenriday:
I think, realneil, that the main point of the article was to show that frame rate is not an adequate measurement of the gaming experience and that frame latency is a superior metrological instrument for this purpose.

And the results of their testing were damn close to standard testing methodologies anyway. Intel still rules the roost and AMD CPUs don't.

My point was that (leaving benchmarks of any sort out of it) both of my AMD gaming CPUs, the 980 and the 4170 deliver a good 'real world' gaming experience if you have a decent GPU in the system. People keep pointing to benches and they ridicule AMD's efforts, but I say that they're OK with me.

I think many (but, it would seem, not all) participants in this discussion share your hope, RMadatyan, that AMD gets back in the game with respect to high-end CPUs - not merely because of the effect that this would exert on prices, but also because it would stimulate innovation. Intel, for example, would hardly feel the same pressure to improve their products if AMD weren't around (and vice versa). For my part, I also hope that when testing CPU performance in a gaming context, the present somewhat excessive emphasis on frame rates (surely there's a limit above which an increase in frame rates provides no noticeable improvement in user experience, although just where that limit lies is open to debate) is toned down and that other considerations, like frame latency, are taken into account. In any event, I'm greatly looking forward to Steamroller - if it does the job and the price is reasonable, I'll consider using one of the versions in a new build, even though I hardly need it - my trusty Phenom II X4 955 does everything I ask of it....

Henri

As for me, I have an i7 and an FX-6100, and sorry to say, but my AMD FX-6100 runs a lot better than my i7. The i7 locks up all the time, and my AMD never slows down. I replaced my i7 thinking the processor was bad, but my new one does the same thing.

judzwho:
the I7 locks up all the time

Your i7 has configuration problems.

They usually work without any hiccups.

Mine seems to be bulletproof. (it's like the Energizer Bunny)

judzwho:
I replaced my I7 thinking it was the processor being bad but my new one does the same thing

It usually isn't the CPU that goes. The mainboard or the PSU are usually the culprits.

I'm still using a Phenom X4 9750, currently paired with a Radeon HD 6870, and up until the last year or so I've been able to run pretty much every game at playable framerates and resolutions. In fact, I have more issues with my GPU drivers (mostly on Linux, but I've also had problems updating the Windows drivers). Before it stopped being able to run new games, my only gripes with my CPU were that the temp sensors don't work on Linux, and that the Linux driver for the chipset's audio doesn't play all that well with PulseAudio and Skype unless I add "tsched=0" to the config, which screws up Flash (the Chrome version; the now-unsupported version stretches in fullscreen mode if you have dual monitors).

Bottom line: I will be getting the Steamroller equivalent of the FX-8350, and when it's combined with a decent Nvidia GPU a little later down the line, my system will rock in both Linux and Windows.

Well,

I recently got myself an FX-8350... And I've heard outcries from people about Crysis 3 putting a lot of load on i7s and making framerates drop...

You know why this is? The i7 is a quad-core; the FX-8350 is an 8-core.

Here's a video of me on Crysis 3; STREAMING (at high quality stream settings, so extra CPU load) on maxed out graphics: http://www.youtube.com/watch?v=KnQNY5BsMWM

In this situation, the framerate drops are caused by my graphics card (a GTX 550 Ti).

I even captured my CPU activity at the time too (bottom right corner).

FX-8350 is my hero. :3
