Pixel Pipeline and Vertex Shader Details
Dropping down another level into the Shader and Vertex Engines of the R420 series VPUs, we see that ATi didn't just bolt on more shader units but went a step beyond by enhancing their capabilities as well.
A high-level view of an R420 Quad Pixel Shader unit shows us the complex array of logic functions and memory resources that comprise the engine:
The shader enhancements ATi made to the X800 VPU are the following:
Increased temporary registers to 32, up from 12, and added a facing register
Increased maximum shader instruction length to 1,536, up from 160
Added support for 3Dc 4:1 Compression of 2-component data for Normal Maps
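To make the last item a bit more concrete, here is a minimal sketch of why storing only two components of a normal map works: a unit-length normal's third component can be rebuilt from the other two. The function names, the 8-bit storage range and the block sizes below are illustrative assumptions, not details taken from ATi's material. In particular, the 4:1 figure is reproduced here by assuming a 16-byte compressed block per 4x4 texels compared against a 32-bit uncompressed texel.

```python
import math


def reconstruct_normal(x_byte, y_byte):
    """Rebuild a unit normal from two stored 8-bit components (assumed layout)."""
    # Map the stored range [0, 255] to [-1, 1].
    x = x_byte / 255.0 * 2.0 - 1.0
    y = y_byte / 255.0 * 2.0 - 1.0
    # For a unit vector, z = sqrt(1 - x^2 - y^2). The clamp guards against
    # rounding that pushes x^2 + y^2 slightly above 1.
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)


def compression_ratio(uncompressed_bits_per_texel=32, block_bytes=16,
                      texels_per_block=16):
    """Ratio of uncompressed to compressed size (all defaults are assumptions)."""
    compressed_bits_per_texel = block_bytes * 8 / texels_per_block
    return uncompressed_bits_per_texel / compressed_bits_per_texel


if __name__ == "__main__":
    print(reconstruct_normal(128, 128))
    print(compression_ratio())
```

In a real pixel shader the same reconstruction would cost a few extra arithmetic instructions per texel, which is the trade made for the memory and bandwidth savings.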
All told, these enhancements increase throughput and efficiency in the shader engine and allow it to handle longer, more complex shader programs. If there is a limitation to the X800 series of VPUs, however, this is where ATi has come up a bit short. By now, you're probably aware that NVIDIA's NV40 series has full support for PS3.0 in DX 9.0c, including its 65K shader instruction length requirement. PS3.0 also requires full 32-bit shader precision throughout the pipeline, while the X800 is still only 24-bit capable. There is no question that this is a notable missing check box item for the R420. However, it is still unclear whether game developers will begin to utilize PS3.0 effects and performance enhancements before ATi has readied its next-generation VPU. That remains to be seen, but NVIDIA is obviously out evangelizing PS3.0 as the best path for developers.
Regardless, there are currently no games on the market that support PS3.0, with the possible exception of FarCry, which is due to gain the capability through a patch in the near future. We'll just have to wait and see whether the lack of PS3.0 support will come back to haunt ATi. But if you look back at how long it has taken ISVs to develop DX9/PS2.0 effects, it's a reasonable bet that the R420 will hold up well in the near term, possibly until ATi's next product launch.
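For a rough feel of what the 24-bit versus 32-bit precision gap discussed above means numerically, the sketch below rounds a value to a given number of mantissa bits. It assumes a 24-bit float with 16 explicit mantissa bits and a 32-bit float with 23, and it ignores exponent range entirely. The helper name and the sample value are hypothetical and only serve to illustrate the rounding error, not how either chip implements its arithmetic.

```python
import math


def round_to_mantissa_bits(value, mantissa_bits):
    """Round value to a float with the given number of explicit mantissa bits.

    Exponent range is ignored; only mantissa precision is modelled.
    """
    if value == 0.0:
        return 0.0
    m, e = math.frexp(value)  # value == m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (mantissa_bits + 1)  # +1 accounts for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)


if __name__ == "__main__":
    x = 1234.56789  # arbitrary sample value
    fp24 = round_to_mantissa_bits(x, 16)  # assumed 24-bit layout
    fp32 = round_to_mantissa_bits(x, 23)  # standard 32-bit single precision
    print("fp24 relative error:", abs(fp24 - x) / x)
    print("fp32 relative error:", abs(fp32 - x) / x)
```

Under these assumptions the 24-bit format carries roughly 5 significant decimal digits against roughly 7 for 32-bit, a difference that matters most in long dependent calculations and when addressing very large textures.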
The above slide capture from ATi's launch presentation pretty much sums up the story on the new, enhanced Vertex Shader Engine. It's now a 6-pipeline vertex machine that can execute up to 12 vertex shader operations per clock cycle and offers potentially 2X the performance of the Radeon 9800XT. In addition to dropping in 2 more vertex shader pipelines, ATi also enhanced the vertex shader units themselves, again focusing on alleviating bottlenecks and stalls caused by resource contention in the vertex shader FPUs.
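As a back-of-the-envelope check on the "potentially 2X" figure, the sketch below multiplies pipeline count by operations per clock and core clock. The 6-pipeline and 12-operations-per-clock numbers come from the text above, and the 4-pipeline count for the Radeon 9800XT follows from "2 more" pipelines. The core clocks (520 MHz and 412 MHz) are assumptions added for illustration, and the model ignores the per-unit efficiency improvements entirely.

```python
def peak_vertex_ops_per_second(pipelines, ops_per_pipeline_per_clock, core_mhz):
    """Theoretical peak vertex shader operations per second."""
    return pipelines * ops_per_pipeline_per_clock * core_mhz * 1_000_000


# 12 ops per clock across 6 pipelines implies 2 ops per pipeline per clock.
OPS_PER_PIPELINE = 12 // 6

# Core clocks below are assumed values, not taken from the article.
x800 = peak_vertex_ops_per_second(6, OPS_PER_PIPELINE, 520)
r9800xt = peak_vertex_ops_per_second(4, OPS_PER_PIPELINE, 412)

speedup = x800 / r9800xt

if __name__ == "__main__":
    print(f"Theoretical speedup: {speedup:.2f}x")
```

With these assumed clocks, pipeline count and frequency alone account for a bit under 1.9X, so the remainder of the claimed 2X would have to come from the reduced stalls in the vertex shader units.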