The new flagship in ATI's line-up is the GPU formerly codenamed R600, found at the heart of the Radeon HD 2900 XT. Like NVIDIA's G80 and its derivatives, the R600 has a unified architecture that replaces specialized pixel and vertex shaders with an array of stream processors that can be dynamically allocated to handle pixel or vertex shader workloads, in addition to geometry shaders, physics, or any number of other things.
This high-level block diagram will give you a bird's eye view of what the R600 has under its hood. The GPU is made up of more than 700 million transistors and is manufactured on an enhanced version of TSMC's 80nm node, dubbed 80 HS, which, according to ATI, allowed the company to crank up the R600's frequency to levels it couldn't hit with TSMC's standard 80nm process.
On some levels, the R600 borrows technology from the Radeon X1000 series and the Xenos GPU found in the Xbox 360, but there is plenty of new technology employed in this GPU as well. The R600 has a new command processor that processes command streams from the graphics driver and can reduce overhead by as much as 30%. There is also a new setup engine that more efficiently prepares data for processing by the stream processing units. In addition, there are 320 Stream Processing Units, along with beefed-up texture units and render back-ends, or ROPs if you prefer.
The 320 individual stream processing units in R600 are arranged in 4 SIMD arrays of 80 stream processors each, and each functional unit is arranged as a 5-way superscalar shader processor. In contrast, NVIDIA's G80 has up to 8 groups of 16 (128 total) fully generalized, fully decoupled, scalar stream processors, but keep in mind the SPs in G80 run in a separate clock domain and can be clocked as high as 1.5GHz. In ATI's R600, each functional SP unit can handle 5 scalar floating point MAD instructions per clock, and one of the five shader processors (the fatter one in the image above) can also handle transcendentals. In each functional unit, there is also a branch execution unit that handles flow control and conditional operations, and a number of general purpose registers to store input data, temporary values, and output data.
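To make the shader arithmetic concrete, here is a minimal Python sketch of the counts quoted above, reading the text as four groups of 80 stream processors, each group built from 5-wide units (so 16 units per group). The function names are hypothetical, and the R600 core clock of 0.742GHz is an assumption for illustration only, since no R600 clock is given in this section. The G80 figure uses the "as high as 1.5GHz" shader clock mentioned above. Peak numbers count one MAD as two floating point operations and ignore everything else the hardware can do.

```python
# Back-of-the-envelope shader math for the figures quoted in the text.
# Function names are illustrative; clock values are assumptions, not measurements.

def r600_stream_processors(simd_arrays=4, units_per_array=16, alus_per_unit=5):
    """Total scalar ALUs: 4 arrays x 16 five-wide units x 5 ALUs per unit."""
    return simd_arrays * units_per_array * alus_per_unit

def g80_stream_processors(groups=8, sps_per_group=16):
    """Total scalar stream processors: 8 groups x 16 SPs."""
    return groups * sps_per_group

def peak_mad_gflops(alus, clock_ghz):
    """Peak GFLOPS if every ALU retires one MAD (2 flops) each clock."""
    return alus * 2 * clock_ghz

def slot_utilization(filled_slots, width=5):
    """Fraction of a 5-wide unit doing useful work in a given clock."""
    return filled_slots / width

ASSUMED_R600_CLOCK_GHZ = 0.742   # assumption, not stated in this section
G80_SHADER_CLOCK_GHZ = 1.5       # "as high as 1.5GHz" per the text

r600_alus = r600_stream_processors()   # 320
g80_alus = g80_stream_processors()     # 128

print(f"R600: {r600_alus} ALUs, "
      f"{peak_mad_gflops(r600_alus, ASSUMED_R600_CLOCK_GHZ):.1f} peak MAD GFLOPS")
print(f"G80:  {g80_alus} ALUs, "
      f"{peak_mad_gflops(g80_alus, G80_SHADER_CLOCK_GHZ):.1f} peak MAD GFLOPS")

# If only 3 of the 5 slots in a unit can be filled with independent work,
# that unit runs at 60% of its peak for that clock.
print(f"3-of-5 slot utilization: {slot_utilization(3):.0%}")
```

The last line is the important caveat when comparing the two designs on paper: a 5-way superscalar unit only reaches its peak when five independent scalar operations are available to issue together, while G80's scalar SPs do not have that packing requirement. How often R600 fills all five slots in real shaders is not something this sketch can tell you.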
The ring-bus memory controller introduced with the X1000 series of GPUs returns in the Radeon HD 2000 series, but in the high-end model used on the Radeon HD 2900, the internal ring bus width has been increased to 1 kilobit (1,024 bits). If you remember, the Radeon X1800 and X1900 families of GPUs were outfitted with 512-bit internal ring-bus memory controllers.
Externally, the Radeon HD 2900 XT features a 512-bit memory interface made up of eight 64-bit memory channels. We'll talk more about the actual card a little later, but what we will say now is that in its stock configuration with 512MB of GDDR3 RAM running at 800MHz (1.6GHz DDR), the Radeon HD 2900 XT has 106GB/s of memory bandwidth at its disposal. That's a lot of bits.
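For readers who want to check the bandwidth math, the short Python sketch below works it out from the bus width and data rate, using 1GB = 10^9 bytes. The function names are hypothetical. Note that a 512-bit bus at exactly 1.6GHz effective comes out to 102.4GB/s, while the 106GB/s figure corresponds to a memory clock of roughly 828MHz. That suggests the 800MHz number above is rounded, though that reading is our inference from the arithmetic and not something stated here.

```python
# Peak memory bandwidth from bus width and effective data rate.
# Names are illustrative; 1 GB is taken as 10**9 bytes.

def memory_bandwidth_gbs(bus_width_bits, effective_rate_ghz):
    """Bytes moved per transfer times billions of transfers per second."""
    return bus_width_bits / 8 * effective_rate_ghz

def memory_clock_mhz_for(bus_width_bits, target_gbs, transfers_per_clock=2):
    """Base memory clock (MHz) a DDR-style interface needs to hit target_gbs."""
    effective_rate_ghz = target_gbs / (bus_width_bits / 8)
    return effective_rate_ghz / transfers_per_clock * 1000

channels = 8
channel_width_bits = 64
bus_width_bits = channels * channel_width_bits   # 512-bit external interface

# 800MHz GDDR3 transfers data twice per clock, i.e. 1.6GHz effective.
print(f"{bus_width_bits}-bit @ 1.6GHz effective: "
      f"{memory_bandwidth_gbs(bus_width_bits, 1.6):.1f} GB/s")

# Working backwards from the quoted 106GB/s figure.
print(f"Clock needed for 106 GB/s: "
      f"{memory_clock_mhz_for(bus_width_bits, 106):.0f} MHz")
```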