Memory Controller and Tech Demos
ATi took an entirely new approach to the Memory Controller architecture in the Radeon X1000 series GPUs, a hybrid approach of sorts. Traditional crossbar switch memory controller architectures have inherent latencies, because multiple simultaneous service requests target different areas of memory space. Access to a specific chunk of memory at any given time may be delayed by the switching dependencies of the crossbar.
Essentially, the X1800 and X1600 memory controllers now have two switching resources, one for reads and one for writes. The X1800 has a 512-bit internal "ring bus" architecture which then maps out to a 256-bit, 8-channel memory interface. The X1600 has a 256-bit "ring bus" architecture which then maps out to a 128-bit, 4-channel memory interface. The ring bus, however, is only used for latency-sensitive memory read requests, while memory writes must still travel through the internal crossbar switch. Regardless, read latency is significantly reduced with the bi-directional ring bus, which has direct access to the memory interface. A final side benefit of the ring bus architecture is that it significantly simplifies trace routing and layout in board designs, and theoretically this should translate to a cost benefit in the end product.
In addition, ATi claims to have beefed up the cache resources inside the X1000 GPU architecture, such that caches are now fully associative for texture, color and Z/stencil operations. A fully associative cache has the best likelihood of a cache hit because any line in the cache can hold any address that needs to be cached. However, this type of design requires significantly more complex control logic, since it inherently suffers from much more strenuous search requirements than a direct mapped cache. A given address can be stored in any one of tens of thousands of cache lines, and thus you have to know where to look for it. Managing a fully associative cache typically takes more exotic search algorithms and more control logic, which also translates to die real-estate. The net result, however, is that caching efficiency with this architecture is significantly better, at a small sacrifice of search speed. In fact, ATi claims that performance-sapping cache misses are reduced by as much as 30% or more versus the Radeon X850's architecture in games like Battlefield 2, Far Cry and Half-Life 2.
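To make the direct mapped versus fully associative trade-off a bit more concrete, here is a minimal Python sketch. It is a toy model and not ATi's hardware: the function names, the tiny 4-line cache, the `addr % num_lines` placement rule and the LRU replacement policy are all assumptions chosen for illustration. It simply counts hits for an access pattern in which two addresses fight over the same direct mapped line.

```python
from collections import OrderedDict


def direct_mapped_hits(addresses, num_lines):
    """Count hits when each address can live in exactly one line (addr % num_lines)."""
    lines = [None] * num_lines
    hits = 0
    for addr in addresses:
        index = addr % num_lines
        if lines[index] == addr:
            hits += 1
        else:
            lines[index] = addr  # miss: evict whatever occupied this line
    return hits


def fully_associative_hits(addresses, num_lines):
    """Count hits when any line can hold any address (LRU replacement is assumed)."""
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            if len(cache) >= num_lines:
                cache.popitem(last=False)  # miss on a full cache: evict the LRU entry
            cache[addr] = True
    return hits


# Addresses 0 and 4 collide in a 4-line direct mapped cache (both map to line 0).
pattern = [0, 4] * 8
dm = direct_mapped_hits(pattern, 4)
fa = fully_associative_hits(pattern, 4)
print(f"direct mapped hits: {dm}, fully associative hits: {fa}")
```

In this contrived pattern the direct mapped cache thrashes on every access, while the fully associative cache misses only on the first touch of each address. Real workloads are far less one-sided, and the sketch ignores the extra lookup cost that the paragraph above describes.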
Another step ATi took toward memory access efficiency was to divide the Radeon X1800's 256-bit memory controller interface into eight 32-bit channels, for better granularity on memory access. Since GDDR3 DRAM chips typically have a 32-bit wide organization (2M x 32-bit x 8 banks or 4 banks), this translates to a one-to-one mapping of memory controller channels to DRAM chips. In kind, the 128-bit, 4-channel interface on the Radeon X1600 and X1300 maps one-to-one as well.
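The channel arithmetic above is simple enough to check in a few lines. This is just a sketch of the division described in the text; the helper name `channel_layout` and the default 32-bit chip width are our own labels, not anything from ATi's documentation.

```python
def channel_layout(bus_width_bits, num_channels, chip_width_bits=32):
    """Return (bits per channel, DRAM chips per channel) for a memory interface."""
    channel_width, remainder = divmod(bus_width_bits, num_channels)
    if remainder:
        raise ValueError("bus width does not divide evenly across channels")
    chips_per_channel = channel_width // chip_width_bits
    return channel_width, chips_per_channel


x1800 = channel_layout(256, 8)  # 256-bit interface, 8 channels
x1600 = channel_layout(128, 4)  # 128-bit interface, 4 channels
print("X1800:", x1800)
print("X1600/X1300:", x1600)
```

Both configurations work out to 32-bit channels with one 32-bit DRAM chip apiece, which is the one-to-one mapping mentioned above.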
So now that your brain is feeling a bit spongy from all that techno-chatter, we'll let you relax a bit before we start twistin' your melon again discussing the X1000 series video pipeline, which is up next. For now, feast your eyes on the candy that ATi bestowed upon us at their Editor's Day in September.
ATi Parthenon Demo:
The base artwork for ATi's Parthenon demo was shot on location in Greece, and the demo is in fact a 3D rendered re-creation of the real deal. The geometry comes from laser scans of the actual Parthenon and consists of over 90 million polygons. This demo shows off a progressive level of detail algorithm that ATi developed, which allows surface texture details to blend into view naturally as the camera perspective changes on a given scene. The result is that there was absolutely no popping visible when the camera pans around this model and all the different surfaces are exposed. ATi's pals at Crytek should take note of this technique and figure out a way to keep those palm trees in Far Cry from popping in and out of view, regardless of the draw distance detail that is selected. As with many things in life, we're sure this is easier said than done of course.
Ruby is back:
If there's one development effort that ATi definitely has NVIDIA beat on, hands down, that would have to be the art of the tech demo. This year Ruby was back and looking sexier and even more bad-ass than ever. We're not just talking about dancing pixies or friendly Biker Dudes; Ruby is actually a short-take movie with a story line and a rendering engine to die for. Too bad you can't play it, but then again, tech demos can be over-the-top like this because they don't have to perform like a game.
The Gloomy but oh-so pretty "Toy Shop":
Finally, what probably impressed us most was ATi's Toy Shop demo, which makes heavy use of parallax occlusion mapping to render surface detail in things like the brick-work and cobblestone streets. It was amazing to note that the highly detailed and realistic cobblestones in the street area were actually made up of only 2 polygons. The rest of the impressive 3D surface detail of the stones was all done with a parallax occlusion mapping effect. On a side note, there are over 700 unique shaders used in this demo, as well as dynamic soft shadows, volumetric lighting in the rain and fog, misty halos and glow effects.
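For the curious, the basic idea behind parallax occlusion mapping can be sketched in a few lines. What follows is a simplified, one-dimensional Python illustration of the commonly described ray-march over a depth map; it is not the Toy Shop shader, which runs per pixel on the GPU. The function name, parameters and default values here are all hypothetical. The ray steps down from the flat polygon surface until it dips below the stored depth, then interpolates between the last two samples to find where the "bumpy" surface would have been hit.

```python
def pom_lookup(depth_at, u, view_x, view_z, depth_scale=0.1, num_steps=32):
    """Return the shifted texture coordinate where a view ray hits the depth field.

    depth_at(u) gives depth below the polygon in [0, 1] (0 = top surface).
    (view_x, view_z) is the direction toward the viewer, with view_z > 0.
    """
    step_depth = 1.0 / num_steps
    # Texture-space shift per step: the ray slides away from the viewer as it descends.
    step_u = -(view_x / view_z) * depth_scale / num_steps

    cur_u, cur_depth = u, 0.0
    prev_u, prev_depth = cur_u, cur_depth
    for _ in range(num_steps):
        if cur_depth >= depth_at(cur_u):
            break  # the ray is now at or below the virtual surface
        prev_u, prev_depth = cur_u, cur_depth
        cur_u += step_u
        cur_depth += step_depth

    # Interpolate between the last sample above the surface and the first one below.
    before = depth_at(prev_u) - prev_depth
    after = depth_at(cur_u) - cur_depth
    denom = before - after
    if denom == 0:
        return cur_u
    return prev_u + (before / denom) * (cur_u - prev_u)


# A flat "groove" half way down, viewed at 45 degrees: the lookup shifts by about 0.05.
shifted = pom_lookup(lambda u: 0.5, u=0.5, view_x=1.0, view_z=1.0)
print(shifted)
```

The shifted coordinate is then used to sample the color texture, which is how two flat polygons can pass for a street full of individual cobblestones. A production shader would also handle self-shadowing and adapt the step count to the viewing angle, neither of which this sketch attempts.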