New Details Emerge On Microsoft's XBox 360 "Fusion" Processor


News Posted: Fri, Aug 27 2010 9:24 PM
One of the ways Sony, Microsoft, and Nintendo gradually reduce the price of their consoles is by adopting increasingly advanced manufacturing technologies that shrink the size and power consumption of the consoles' components. Microsoft has now combined the CPU and GPU of the XBox 360 S into a single chip, and details on the new design are starting to leak.

The XBox 360 is an infamous example of the problems manufacturers can encounter when pushing the performance envelope: the initial XBox 360 consoles weren't designed with sufficient thermal tolerances, which led to the RROD (Red Ring of Death). It's been several years since Microsoft patched that particular problem, but the new XBox 360 S marks the first time the console's Xenon CPU and ATI-built GPU have been built on the same die. What makes this particular fused processor so interesting is the inherent difficulty of its creation. When AMD and Intel set out to design their single-chip hybrids, they controlled all the IP for both parts and had experience designing modular components. For IBM and Microsoft, collaborating on the new XBox 360 processor wasn't so simple. In order to build the two designs on a single die, IBM had to first flip its core design 90 degrees, become intimately familiar with the ATI-designed GPU core, and then build a fused chip that exhibited exactly the same latencies, intra-chip communication speed, and real-world performance.

[Diagram: block layout of the combined XBox 360 S CPU/GPU, including the "FSB Replacement" unit]
That's where the "FSB Replacement" block in the diagram above comes in. Ordinarily, a CPU designer would crow over the latency advantage inherent in moving additional components on-die. In this case, however, the introduction of any change, even a change that should improve performance, could be detrimental to a game. Unlike most PC software, console games are often coded 'to the metal,' which means they're written to take specific advantage of unique functions (and often errata) of a particular type of processor. Ironically, this means that preserving the bugs of a chip when moving from one process technology to another is critical. To that end, there's the FSB replacement, whose job is to introduce the appropriate latencies. Note that while the CPU and GPU are built into a single chip, the console's 10MB of EDRAM actually sits on the package as a separate die. This almost certainly reduces cost; building the EDRAM on-die would've significantly increased die size and forced IBM to throw away the entire chip if the EDRAM portion was flawed.
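To make the 'coded to the metal' point concrete, here's a minimal, purely hypothetical sketch (the status register, the spin count, and the function names are invented for illustration, not taken from any real XBox 360 title) of the kind of timing assumption that a lower-latency chip could break:

/* Hypothetical illustration: a game waits for the GPU either by polling a
 * status register (portable) or by spinning a count that was tuned against
 * the original chip's bus latency (timing-dependent).  If the combined die
 * returns data faster, the tuned spin loop exits before the GPU is done. */
#include <stdint.h>

#define GPU_FLUSH_SPINS 12000         /* count tuned on the original hardware */

volatile uint32_t *gpu_status;        /* memory-mapped status register (invented) */

void wait_for_gpu_correctly(void)
{
    while ((*gpu_status & 0x1) == 0)  /* poll until the GPU reports idle */
        ;
}

void wait_for_gpu_to_the_metal(void)
{
    for (volatile int i = 0; i < GPU_FLUSH_SPINS; i++)
        ;                             /* "long enough" only on the old latencies */
}

The "FSB Replacement" block exists so that the second kind of code keeps working without anyone having to find and fix it.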

Compared to the original XBox 360, the new chip uses 60 percent less power and 50 percent less space. Given that MS and Sony have both implied their consoles have a multi-year future ahead of them, we could see at least one more die shrink, to 28nm technology, before this generation runs out of life. For an unrelated (but timely) bit of news on how subtle differences between consoles can impact game performance, Shacknews has a recent story on how gamers using certain older XBox 360s ran into stuttering problems and slow cutscenes when watching the intro to Shank. According to the game's developers:

"The first is that it appears that older Xbox 360's have slower disc access rates, so the opening cinematic chugs, and loading between levels takes longer - when we tested this on development kits, this never came up so that took us completely by surprise, and we're looking into it. Note that the actual game experience is not affected in any way."
3vi1 replied on Sat, Aug 28 2010 9:19 AM

>> this means that preserving the bugs of a chip when moving from one process technology to another is critical.

This seems like the worst possible way to solve this problem. The XBox has a hard drive and update capabilities. They should have made the new design as efficient as possible, then just deployed software patches for any game that actually suffered.

I highly question how many games would suffer from a latency change. Almost all companies port to/from a PC, and writing code that relies on specific hardware timings is in direct contradiction to that portability. If people are still writing code that depends on hardware latencies à la the Atari 2600, they're doing it wrong.

What part of "Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn" don't you understand?

++++++++++++[>++++>+++++++++>+++>+<<<<-]>+++.>++++++++++.-------------.+++.>---.>--.

Joel H replied on Sat, Aug 28 2010 11:11 AM

3vi1,

I've spoken to multiple people about this one. Everyone agrees it's bad practice. Officially, programmers aren't supposed to do it, etc., etc. Then everyone does it anyway. There's a very fine line between "taking advantage of the console's architecture" and "using a bug." If you're a programmer trying to scrape every last bit of performance you can out of a system, you use every trick at your disposal. This is as true for the XBox 360 as for the PS3 or Wii.

Two last bits:  First, you never want to have to patch the HDD in the method you're discussing. That'll absolutely cripple performance. If it was a run-time bit of data it wouldn't hurt so much, but you'd still be altering the function of the code midflight. Guaranteed to hurt the game.

Second, you aren't thinking forward enough. A programmer doesn't want to have to write 3x code versions to exploit the unique characteristics of 3x console versions. They want to know that the code they are writing will execute exactly the same way on *every* console. If you're coding close to the metal rather than working in a high-level language, your options for auto-tuning are much more limited.

3vi1 replied on Sat, Aug 28 2010 5:10 PM

>> Two last bits: First, you never want to have to patch the HDD in the method you're discussing. That'll absolutely cripple performance. If it was a run-time bit of data it wouldn't hurt so much, but you'd still be altering the function of the code midflight. Guaranteed to hurt the game.

No no no... When you patch a game in this manner, the firmware simply overrides the loading of some files to point to the hard drive instead of the disc - like the way snapshots work in a VM.  Or it may read the file from the disc and apply a tiny binary patch before returning from the loading call. It has no detrimental effect on performance when the game is running.
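A rough sketch of that kind of loader-level redirection, with entirely made-up names (patch_table, disc_read, hdd_read) standing in for whatever the firmware actually uses:

/* Illustration of the overlay idea: the loader consults a per-title patch
 * table before touching the disc.  Patched files are served from the HDD;
 * everything else passes straight through, so steady-state performance is
 * untouched. */
#include <stddef.h>
#include <string.h>

struct patch_entry {
    const char *disc_path;   /* file the game asks for */
    const char *hdd_path;    /* replacement shipped with a title update */
};

static const struct patch_entry patch_table[] = {
    { "/game/intro.bik", "/hdd/title_update/intro.bik" },
};

size_t disc_read(const char *path, void *buf, size_t len);  /* firmware stubs */
size_t hdd_read(const char *path, void *buf, size_t len);

size_t game_read(const char *path, void *buf, size_t len)
{
    for (size_t i = 0; i < sizeof(patch_table) / sizeof(patch_table[0]); i++) {
        if (strcmp(path, patch_table[i].disc_path) == 0)
            return hdd_read(patch_table[i].hdd_path, buf, len);   /* redirected */
    }
    return disc_read(path, buf, len);                             /* untouched */
}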

You've probably seen how firmware updates come out to fix save game / buffer overflow exploits with particular games. Now you know why the update doesn't fix the same exploit for every game - it's patching the individual app to prevent it from loading beyond the proper length.

>>Second, you aren't thinking forward enough. A programmer doesn't want to have to write 3x code versions to exploit the unique characteristics of 3x console versions. They want to know that the code they are writing will execute exactly the same way on *every* console. If you're coding close to the metal rather than working in a high-level language, your options for auto-tuning are much more limited.

Unless you wrote that backwards, that was my entire point. Programming "to the metal" on consoles mostly died with the Sega Saturn. Having dabbled in console and emulator programming (Bliss32) over the years, I'm well aware of why these timing assumptions were used in ancient systems, but they really don't make sense when you're writing for modern consoles.

For one thing, the current and previous generations of consoles have all had some degree of backward compatibility. Coding "to the metal" and using hacks that only work with assumed latencies is just asking for your game not to work on the next iteration of the console. It's downright stupid and inconsiderate to the consumer, actually.

MrBrownSound replied:

It's actually amazing how visually close console gaming is getting to high-spec PC gaming. If you saw the RAGE demo running on a 360, you'd think so too.

Joel H replied on Sun, Aug 29 2010 12:31 PM

MrBrownSound,

You've got that backwards, sadly. Since most games are now developed for consoles first and then ported to PCs, PCs are stuck with DX9 ports (possibly with some DX10/DX11 bits tacked on). That's one reason why so many games--even brand new games--are still DX9 titles for all intents and purposes.

If you want to see an example of a game developed for PC first and foremost, check the visuals and model detail in Metro 2033 running in DX11 at maximum detail. There's literally nothing on a console that can match it.

3vi1,

In this case, I'm not a programmer--I can't tell you with absolute certainty how modern games are and aren't coded. What I *do* know is that the people I've spoken to in the gaming industry have indicated they need picture-perfect CPU replicas across generations in order to maintain this level of compatibility for the reasons I've stated.

I can add one other tidbit. One of the reasons the PS3's Cell processor is so difficult to program is because writing code for it is so different than for any other chip. Cell is not a 'flexible' chip--if you want maximum performance you have to write your code in very specific ways. Because it lacks the large front-end of a traditional x86 processor, it's vital that instructions and data reach the chip at the right time.
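The standard answer to that on Cell is double-buffered DMA: while the SPE crunches one buffer, the next chunk is already streaming in. The sketch below shows the general shape of that pattern using placeholder functions (dma_start, dma_wait, process_chunk) rather than the real MFC intrinsics, and it assumes the total size is a multiple of the chunk size:

/* Double-buffered streaming: keep the in-order core fed by overlapping the
 * DMA for the next chunk with the computation on the current one. */
#include <stdint.h>
#include <stddef.h>

#define CHUNK 16384   /* bytes per transfer (illustrative) */

extern void dma_start(void *local, uint64_t remote, size_t len, int tag);
extern void dma_wait(int tag);   /* blocks until the tagged transfer completes */
extern void process_chunk(uint8_t *data, size_t len);

void stream_job(uint64_t src, size_t total)
{
    static uint8_t buf[2][CHUNK];
    int cur = 0;

    dma_start(buf[cur], src, CHUNK, cur);            /* prime the first buffer */

    for (size_t off = 0; off < total; off += CHUNK) {
        int next = cur ^ 1;
        if (off + CHUNK < total)                     /* start the next fetch early */
            dma_start(buf[next], src + off + CHUNK, CHUNK, next);

        dma_wait(cur);                               /* this pass's data is ready */
        process_chunk(buf[cur], CHUNK);
        cur = next;
    }
}

Get the overlap wrong, or change how long the transfers take, and the SPE simply sits idle waiting on memory.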

I'm sure Sony has improved their development tools over time, but in the beginning, Kutaragi noted that they'd specifically made it hard to program for the chip so that later developers would have room to offer improved visuals/performance. Based on what I read/researched on Cell when I wrote the supercomputing article a few months ago, it seems evident that early code *was* written to the metal--it was virtually the only way to model the CPU's real-world performance and figure out what worked best.

At the end of the day, we know two things:

1) Manufacturers go to extraordinary lengths to maintain exact compatibility; the "FSB Replacement" unit in the diagram above actually makes the chip *slower.*

2) The stated reason for maintaining exact compatibility is that different latencies, cache sizes, pipelines, or processing capabilities can break game compatibility.

3vi1 replied on Sun, Aug 29 2010 2:31 PM

Sorry for the length of this.  I really like the stuff that comes out in our discussions and I get sidetracked by shiny things:

>> What I *do* know is that the people I've spoken to in the gaming industry have indicated they need picture-perfect CPU replicas across generations in order to maintain this level of compatibility for the reasons I've stated.

I don't disagree with what you said or what people are saying to you. I understand the requirement, and I guess it makes sense in that it avoids any potential issues, but I still say the developers are writing their code wrong if it breaks without this hardware hack.

So I guess my issue is with what they're not saying to you (i.e. this is a hack to account for bad programmers). :)

Both Sony and MS employed high level emulation in their last consoles, and consoles now put out multiple resolutions - which can alter timings that such code would be dependent upon. So I'm still straining to think of a reason why code would be intentionally dependent on the latency of the FSB. All I can think of is "because of a bug in the game that's not obvious in any existing console since they all have the same timings".
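For what it's worth, the kind of timing dependence I can imagine surviving code review is the classic frame-time assumption, sketched below with invented numbers; anything like the first version changes behavior the moment the hardware (or resolution) changes how long a frame takes:

/* Hypothetical example of timing-dependent vs. timing-independent game logic. */

/* Bad: assumes every frame takes exactly 1/60 of a second. */
void update_assuming_60hz(float *position, float velocity)
{
    *position += velocity * (1.0f / 60.0f);
}

/* Better: uses the measured elapsed time, whatever the hardware does. */
void update_with_delta(float *position, float velocity, float dt_seconds)
{
    *position += velocity * dt_seconds;
}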

>> One of the reasons the PS3's Cell processor is so difficult to program is because writing code for it is so different than for any other chip.

Yes, I've done it for fun - which is why you've seen me in an uproar here about the removal of OtherOS. What is difficult with that system is using the SPEs. The SPEs are incredibly fast, but they are incredibly crippled in many regards. Still, coders just make decisions and concessions about the tasks assigned to their threads. It's not really programming any more "to the metal" than you do for the PPU.

(BTW, this is a great 5-part guide for anyone who did not update their PS3 and wants to play with SPU programming: http://www.ibm.com/developerworks/power/library/pa-linuxps3-1/index.html)

>> Kutaragi noted that they'd specifically made it hard to program for the chip so that later developers would have room to offer improved visuals/performance.

I think Kutaragi is just mimicking what Kazuo Hirai said. The problem is that when Hirai said it... he was joking. Even if Kutaragi is serious, a CEO-speak-to-English translator would get you "Yeah, it's hard to program. Sorry we didn't have the support libraries up to snuff by the time we released the console."

I wonder why he would want to push a revisionist history that makes Sony a "visionary" instead of the one where they admit they started selling the console as soon as the OS and libs were "good enough" and not necessarily finished?

Sony came out extremely lucky in this regard. Sega made the exact same mistake with the Saturn, and as a result the original games were basically only using one of its CPUs. Believe it or not, the Saturn could run computational circles around the PS1 - but you'd never know it to look at Virtua Fighter 1 vs. Battle Arena Toshinden. Things got way better when Sega released updated graphics libraries (which, right there, is an indication of how little developers will actually go "to the metal" with any system that's reached the complexity of a modern console - we'd much rather have good APIs any day).

The PS1 had great library support from the very beginning, so the games destroyed the Saturn. I had both and it literally took less than an hour to get familiar with the tool chain and get something 3D on screen (thanks to the Yaroze libs and my Action Replay PCI adapter). Damn...now I want to go dig that out of the box... but I'm afraid it won't plug into my MB (don't remember if it was ISA or PCI).

Sega tried to avoid this in their next console by partnering with Microsoft. And now they're dead to the hardware industry.

While I'm waxing nostalgic:  Atari also made the same mistake with the Jaguar, and programmers were using the 68000 as the main CPU, since it was the only familiar component (it was intended to be used for sound coprocessing - hehe). But at least their CEO never came out and said "We intentionally made this thing hard to program so that later we would be out of business."

-J

Joel H replied on Sun, Aug 29 2010 6:15 PM

3vi1,

The quote I gave you from Kutaragi is at least a couple of years old; I don't recall exactly. As far as coding is concerned, I'm guessing this is the problem (though you know more about coding than I do):

Neither the Xenon nor the Cell has an out-of-order front-end, which means ILP is extremely important. I'm guessing that if you start changing cache sizes (and definitely if you start including different SIMD units), you end up with subtly different behavior in terms of how instructions are cached/processed/stored.
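As a hedged illustration of why that matters on an in-order core, here's a hand-scheduled loop of the sort "to the metal" code tends to contain. The two independent accumulator chains exist purely so one multiply-add's latency can hide behind the other's; an out-of-order x86 would do this reordering itself, while on an in-order chip the schedule only helps if the latencies it was tuned for still hold:

#include <stddef.h>

/* Dot product with two independent accumulators so an in-order pipeline
 * can overlap the latency of one multiply-add with work on the other,
 * instead of stalling on a single serial dependency chain. */
float dot_scheduled(const float *a, const float *b, size_t n)
{
    float sum0 = 0.0f, sum1 = 0.0f;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        sum0 += a[i]     * b[i];      /* chain 0 */
        sum1 += a[i + 1] * b[i + 1];  /* chain 1 issues while chain 0 waits */
    }
    if (i < n)
        sum0 += a[i] * b[i];          /* odd element, if any */

    return sum0 + sum1;
}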

When it comes to the x86 architecture, Intel and AMD have actually invested a huge amount of time/money ensuring that, generally speaking, a processor from Generation X can continue to run code from Generation X-5, and will do so much more quickly than a Gen X processor could ever have done. If we take Netburst out of the picture, both companies have a nearly unbroken history of improving performance at the same clockspeed.

You would know more about this than I, but I'm betting that the middleware available for both Cell and Xenon isn't nearly as robust as Intel's x86 compiler. The result, I think, would be a comparatively "brittle" architecture. It's not that you can't get great performance out of it, it's that a handful of differences can cause thread stalls and drastically hurt performance.

(Tidbit: This last is what killed Prescott. It had the best branch predictor Intel had built up to that point, but the penalty for a prediction miss was so severe that even a prediction accuracy above 98% (IIRC) couldn't compensate for the horrible penalty of a stall.)
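The arithmetic behind that point is simple enough to sketch; the numbers below are illustrative assumptions (a ~30-cycle flush and one branch per five instructions), not measured Prescott figures:

#include <stdio.h>

int main(void)
{
    double miss_rate      = 0.02;   /* 98% prediction accuracy */
    double penalty_cycles = 30.0;   /* assumed deep-pipeline flush cost */
    double branch_freq    = 0.20;   /* assumed branches per instruction */

    /* Average extra cycles added to every instruction executed. */
    double cpi_penalty = miss_rate * penalty_cycles * branch_freq;
    printf("average CPI added by mispredictions: %.3f\n", cpi_penalty);
    return 0;
}

That works out to roughly 0.12 cycles of stall added to every instruction, which is a big tax on a design chasing very high clock speeds.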

Your thoughts?

Joel H replied on Sun, Aug 29 2010 6:16 PM

3vi1,

In the above I should've said "more quickly than a Gen X-5 processor could ever have done."

3vi1 replied on Sun, Aug 29 2010 7:13 PM

>> Neither the Xenon nor the Cell has an out-of-order front-end, which means ILP is extremely important

I see where you're going now! Due to the in-order execution, the new design's faster register access might actually change how branches resolve in a way that ends up having an overall negative impact.

This problem would of course not affect the emulation in next-gen systems, because their implementation would be fast enough for the subtle difference to be inconsequential.

Thanks for helping get that light bulb to go on over my head! Now what they did makes a lot more sense.

Joel H replied on Sun, Aug 29 2010 8:06 PM

3vi1,

This is pure speculation on my part. I'm only theorizing based on what I've been told. In fact, I think it likely that console designers would like to be able to upgrade certain parts of the machine, but no one has yet marketed an idea for doing so that solves the potential problems.

acarzt replied on Sun, Aug 29 2010 8:58 PM

Soooooo..... when are we getting the next Generation of Consoles... cuz uh, i'm ready for it.

We need some DX11 ready consoles already :-D

3vi1 replied on Sun, Aug 29 2010 9:03 PM

>> I think it likely that console designers would like to be able to upgrade certain parts of the machine,

Yeah - their problem is that the market ends up fragmented.  Remember the RAM expansion cart for the Saturn... or worse, the Sega 32x?  Hardly anyone develops anything that really exploits the expansions because the base machine market is so much bigger.

I had a FrankenSega.  Genesis + SegaCD + 32x...  looked kind of like the starship Enterprise if you looked at the profile.  And were drunk.  :)

 

3vi1 replied on Sun, Aug 29 2010 9:29 PM

acarzt:
>> We need some DX11 ready consoles already :-D

Why?  The 360 already supports tessellation.  :)
