Architectural Overview (Cont.)
Like ATI's previous flagship Radeon X1950 XTX, the new Radeon HD 2900 XT is equipped with 16 total texture units and 16 ROPs. Both the texture units and ROPs have been enhanced over the last generation to increase performance and precision. The Radeon HD 2900 XT has 4 groups of 4 texture units. Each group has 8 Texture Address Processors (32 total) and 20 Texture Samplers (80 total) that can each fetch a single data value per clock. Each group also has 4 FP Texture Filter Units (16 total), each of which can bilinear filter one 64-bit color value per clock, or one 128-bit color value in 2 clocks. We should also note that the Radeon HD 2600 and HD 2400-based products have texture units with the very same functionality; they just have fewer of them.
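To make the unit counts above easier to check, here is a minimal Python sketch that multiplies the per-group figures out to the chip-wide totals and derives the peak bilinear filtering rate per clock. The names (`GROUPS`, `PER_GROUP`, `chip_totals`, `filtered_values_per_clock`) are our own and purely illustrative; only the numbers come from the specifications quoted above.

```python
# Per-group texture hardware on the Radeon HD 2900 XT, as quoted in the text.
GROUPS = 4
PER_GROUP = {
    "texture_address_processors": 8,
    "texture_samplers": 20,
    "fp_texture_filter_units": 4,
}


def chip_totals(groups, per_group):
    """Multiply each per-group count by the number of groups."""
    return {name: groups * count for name, count in per_group.items()}


def filtered_values_per_clock(filter_units, clocks_per_value):
    """Peak bilinear-filtered color values per clock across all filter units."""
    return filter_units / clocks_per_value


totals = chip_totals(GROUPS, PER_GROUP)
filter_units = totals["fp_texture_filter_units"]

# 64-bit values take 1 clock per filter unit, 128-bit values take 2 clocks.
rate_64bit = filtered_values_per_clock(filter_units, 1)
rate_128bit = filtered_values_per_clock(filter_units, 2)
```

Under these figures, the 16 filter units top out at 16 bilinear-filtered 64-bit values per clock, or 8 per clock for 128-bit values.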
The Radeon HD 2900 and 2600 series' texture units feature a new multi-level texture cache design as well. The units' shared L2 cache (256kB in the HD 2900, 128kB in the HD 2600) stores data retrieved on L1 cache misses. The Radeon HD 2400, however, is equipped with only a single-level vertex / texture cache. We should also note that all texture units can access both the vertex cache and the L1 texture cache.
The texture units in the Radeon HD 2000 series can bilinear filter 64-bit HDR textures at full speed (~7x faster than the Radeon X1000 series), while 128-bit floating point textures are filtered at half speed. Trilinear and anisotropic filtering are supported for all formats, and the high quality anisotropic filtering mode that returns from the X1000 series has been enhanced to better handle problematic texture filtering cases. Performance and compatibility have been improved to the point that the high quality aniso mode is now the default setting.
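For readers unfamiliar with the operation being accelerated here, the following Python sketch shows what a bilinear filter computes: a weighted blend of the four texels surrounding a sample point. This is the textbook formulation, not a description of how ATI's filter units are built, and the function names `lerp` and `bilinear` are our own.

```python
def lerp(a, b, t):
    """Linearly interpolate between two colors, channel by channel."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def bilinear(t00, t10, t01, t11, fx, fy):
    """Blend four neighboring texels.

    t00/t10 are the top-left/top-right texels, t01/t11 the bottom pair.
    fx and fy are the fractional sample position (0..1) within the 2x2 block.
    """
    top = lerp(t00, t10, fx)        # blend along x on the top row
    bottom = lerp(t01, t11, fx)     # blend along x on the bottom row
    return lerp(top, bottom, fy)    # blend the two rows along y


# Sample the exact center of four floating point RGB texels.
center = bilinear(
    (0.0, 0.0, 0.0), (2.0, 0.0, 0.0),
    (0.0, 4.0, 0.0), (2.0, 4.0, 8.0),
    0.5, 0.5,
)
```

Each filtered result needs four texel reads and three interpolations per channel, which is why wider formats such as 128-bit floating point textures cost more to filter.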
There is also a new shared exponent texture format available (RGBE 9:9:9:5) for 32-bit HDR, and texture resolutions of up to 67 megatexels (8192 x 8192) are supported. ATI's new texture units can perform up to two texture fetches per clock, per texture unit (1 filtered + 1 unfiltered), with the option to grab 4 unfiltered fetches in place of 1 filtered fetch (Fetch4).
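As a rough illustration of how a 9:9:9:5 shared exponent texel packs HDR color into 32 bits, the Python sketch below decodes one. ATI's materials do not spell out the bit layout, so we are assuming the convention used by DirectX 10's R9G9B9E5 shared exponent format: red in the low 9 bits, then green, then blue, with the 5-bit exponent on top, an exponent bias of 15, and no implied leading 1 on the mantissas. The function name `decode_rgb9e5` is our own.

```python
MANTISSA_BITS = 9
EXPONENT_BIAS = 15  # assumed, following the DirectX 10 shared exponent format


def decode_rgb9e5(packed):
    """Unpack a 32-bit shared exponent texel into three floats (r, g, b)."""
    mask = (1 << MANTISSA_BITS) - 1
    r = packed & mask
    g = (packed >> 9) & mask
    b = (packed >> 18) & mask
    exponent = (packed >> 27) & 0x1F
    # All three channels share one scale factor derived from the exponent.
    scale = 2.0 ** (exponent - EXPONENT_BIAS - MANTISSA_BITS)
    return (r * scale, g * scale, b * scale)


# Mantissas (256, 128, 0) with a stored exponent of 16.
texel = (16 << 27) | (0 << 18) | (128 << 9) | 256
color = decode_rgb9e5(texel)

# The maximum texture size quoted above, in texels.
max_texels = 8192 * 8192
```

Because the exponent is shared, the format works best when the three channels have similar magnitudes; a very bright channel forces the other two to give up precision.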
The Render Back-Ends, or ROPs, used in the Radeon HD 2000 series can handle 32 pixels per clock on the HD 2900 XT. On the Radeon HD 2600 and HD 2400 they can handle 8. The ROPs can render-to-texture more efficiently than previous ATI GPUs, and new MSAA resolve functionality makes Custom Filter AA, or CFAA, possible. The ROPs also allow the new 128-bit FP and 11:11:10 FP DX10 formats to be displayed, and they support up to 8 MRTs, which is double that of the Radeon X1000 series. They also have improved stencil and Z compression (up to 16:1 in standard mode / 128:1 with 8X MSAA) and an improved hierarchical Z buffer.
Another feature found in the Radeon HD 2000 series is borrowed from the Xbox 360's Xenos GPU. Like Xenos, the HD 2000 series has built-in hardware support for tessellation. Tessellation works by taking a basic polygon mesh and recursively applying a subdivision rule to create a more complex mesh on the fly. It's best used for amplification of animation data, morph targets, or deformation models, and it gives developers the ability to provide data to the GPU at a coarser resolution. This saves artists the time it would normally take to create more complex polygonal meshes and reduces the data's memory footprint. Please note, however, that the HD 2000 series' tessellator functionality is proprietary and requires developers to code for it specifically. It is already used in some Xbox 360 titles though, like Viva Piñata, so developers may be more inclined to use this feature than some other proprietary ones.
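To give a feel for what "recursively applying a subdivision rule" means, here is a small Python sketch that splits every triangle into four by connecting its edge midpoints. This is a generic midpoint rule chosen for simplicity; it is not ATI's tessellator algorithm, which is proprietary and not described in detail, and the names `midpoint` and `subdivide` are our own.

```python
def midpoint(a, b):
    """Return the point halfway between two vertices."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def subdivide(triangles, levels):
    """Apply midpoint subdivision `levels` times to a list of triangles.

    Each triangle is a tuple of three vertices. Every pass replaces one
    triangle with four smaller ones, so the count grows by 4x per level.
    """
    for _ in range(levels):
        refined = []
        for a, b, c in triangles:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            refined += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
        triangles = refined
    return triangles


# One coarse triangle supplied by the application...
coarse = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
# ...becomes 4**3 = 64 triangles after three levels of subdivision.
fine = subdivide(coarse, 3)
```

The memory saving described above falls out of the arithmetic: only the single coarse triangle has to be stored and sent to the GPU, while the denser mesh is generated on demand. A real tessellator would also displace the new vertices (for example along a curved surface) rather than leaving them on the original flat triangle.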