The 55nm RV670 GPU Architecture
As we’ve already mentioned, the RV670 GPU borrows heavily from the R600 that came before it, and the core technology in the two architectures is fundamentally very similar. With the RV670, however, AMD has made a number of changes designed to boost performance in certain areas and to improve power efficiency. We covered the R600 architecture in depth in our Radeon 2900 XT launch article, so we won’t repeat that detail here, but we do want to point out some pertinent information. Below are a number of slides taken from an AMD presentation explaining the key benefits of the RV670 GPU and the products built around it: the Radeon HD 3870 and Radeon HD 3850.
One of the RV670’s stand-out features is full support for DirectX 10.1. A few weeks back, news broke that Microsoft would be releasing DX10.1 with Vista SP1 in early 2008, and that the current generation of DX10 GPUs would not fully support the new features brought forth with the update. The merits of the new features inherent to the DX10.1 update are still in question, but AMD decided to support them in hardware anyway.
As for the fundamental blocks inside the GPU, the RV670 doesn’t differ very much from the R600. The RV670 still has 320 stream processing units, 16 texture units, and 16 ROPs. The RV670, however, does away with the 1024-bit internal ring-bus memory controller in favor of a 512-bit variant, and its external memory interface has been pared down from 512 bits to 256 bits. From a high-level perspective these changes sound like downgrades, but other tweaks to the GPU offset much of the loss of internal and external bandwidth. While the R600 uses eight 64-bit memory channels, the RV670 doesn’t simply use four 64-bit channels. Instead, the new GPU uses eight 32-bit channels to improve the granularity of data transfers, which in turn improves the relative efficiency of the RV670’s memory bus. There have also been other tweaks, such as improved memory arbitration logic and higher memory clocks, on the HD 3870 in particular. And as you’ll see a little later, the RV670’s overall performance is very competitive with the R600’s.
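To make the channel arithmetic above concrete, here is a minimal Python sketch. The channel counts and widths come from the text; the class name, the helper method, and the 2000 MT/s transfer rate are purely illustrative assumptions, not AMD specifications or figures from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryInterface:
    """Hypothetical model of a GPU's external memory interface."""
    name: str
    channels: int
    channel_width_bits: int

    @property
    def bus_width_bits(self) -> int:
        # Total external bus width is channels times per-channel width.
        return self.channels * self.channel_width_bits

    def peak_bandwidth_gb_s(self, effective_mt_s: float) -> float:
        # Theoretical peak: bytes per transfer times transfers per second.
        return self.bus_width_bits / 8 * effective_mt_s * 1e6 / 1e9


r600 = MemoryInterface("R600", channels=8, channel_width_bits=64)
rv670 = MemoryInterface("RV670", channels=8, channel_width_bits=32)
# The layout the RV670 does NOT use: same total width, coarser channels.
alt = MemoryInterface("4 x 64-bit", channels=4, channel_width_bits=64)

ASSUMED_MT_S = 2000.0  # placeholder rate, chosen only for illustration

for gpu in (r600, rv670, alt):
    print(f"{gpu.name}: {gpu.channels} x {gpu.channel_width_bits}-bit = "
          f"{gpu.bus_width_bits}-bit bus, "
          f"{gpu.peak_bandwidth_gb_s(ASSUMED_MT_S):.0f} GB/s peak "
          f"at the assumed rate")
```

At equal clocks, the eight 32-bit channels and the four 64-bit alternative have the same theoretical peak. The difference is that each independent access occupies a narrower channel, which is the granularity benefit described above.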
AMD has also made broader architectural improvements with the RV670. Since the Radeon HD 2900 XT was out in the wild before the RV670 was back from the fab, AMD had time to better evaluate the performance characteristics of the R600 and the bottlenecks of the architecture. Much of what the company learned allowed it to tweak the RV670’s design to minimize those bottlenecks as much as possible. Another area of focus with the RV670 was improving the handling of latency-bound tasks so that work is scheduled more effectively. The net result of the improvements is that, despite the halved memory-bus width, the RV670 is sometimes faster than the Radeon HD 2900 XT.
Other changes to the RV670 include a more advanced manufacturing process, the introduction of PowerPlay on the desktop, and support for PCI Express 2.0 and CrossFireX. Moving to TSMC’s advanced 55nm manufacturing process to build the 666M-transistor RV670 results in nearly twice the transistor density and a much smaller die than the R600. In fact, the RV670 is less than half the size of the R600: 192mm² versus 408mm².
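The die-size and density claims above can be checked with some quick arithmetic. In the sketch below, the die areas and the RV670 transistor count come from the text; the R600 transistor count of roughly 700 million is an assumption (a commonly quoted figure that this article does not state), so treat the density ratio as approximate.

```python
# Figures from the article.
RV670_AREA_MM2 = 192.0
R600_AREA_MM2 = 408.0
RV670_TRANSISTORS = 666e6

# ASSUMPTION: approximate R600 transistor count, not given in this article.
R600_TRANSISTORS_ASSUMED = 700e6

# Fraction of the R600's die area that the RV670 occupies.
area_ratio = RV670_AREA_MM2 / R600_AREA_MM2

# Transistors per square millimetre for each chip.
rv670_density = RV670_TRANSISTORS / RV670_AREA_MM2
r600_density = R600_TRANSISTORS_ASSUMED / R600_AREA_MM2
density_ratio = rv670_density / r600_density

print(f"RV670 die is {area_ratio:.0%} of the R600 die area")
print(f"RV670 density: {rv670_density / 1e6:.2f}M transistors/mm^2")
print(f"Density ratio vs. R600 (assumed count): {density_ratio:.2f}x")
```

Under that assumed count, the ratio works out to roughly double, in line with the density claim above.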
The new GPU also takes advantage of ATI’s PowerPlay technology, which down-clocks or disables parts of the GPU when they are not in use or when maximum performance is not necessary. The combination of PowerPlay and the new manufacturing process drastically reduces the RV670’s power requirements, especially in comparison to the R600.
The RV670 also supports the PCI Express 2.0 standard, which doubles the bandwidth of the GPU’s serial link interface when used with a compatible chipset. With CrossFireX, users are able to link two, three, or even four cards together to increase performance. To use three- and four-card CrossFireX configurations, you’ll need a motherboard with the correct slot configuration and proper drivers. Motherboards with the necessary PEG slots are already available. The drivers, however, won’t arrive until later this year or early in 2008.
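As a rough illustration of the PCI Express 2.0 bandwidth doubling mentioned above, the sketch below computes theoretical per-direction throughput for an x16 graphics slot. The per-lane signaling rates (2.5 GT/s for PCIe 1.x, 5 GT/s for PCIe 2.0) and the 8b/10b line encoding are taken from the PCI Express specifications rather than from this article, and the function name is hypothetical. Real-world throughput is lower because of protocol overhead.

```python
def pcie_bandwidth_gb_s(gt_per_s: float, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s for an 8b/10b PCIe link."""
    # 8b/10b encoding carries 8 payload bits in every 10 bits on the wire.
    payload_bits_per_s = gt_per_s * 1e9 * 8 / 10
    # Convert bits to bytes, scale to GB/s, and multiply by the lane count.
    return payload_bits_per_s / 8 / 1e9 * lanes


gen1_x16 = pcie_bandwidth_gb_s(2.5)  # PCIe 1.x signaling rate per lane
gen2_x16 = pcie_bandwidth_gb_s(5.0)  # PCIe 2.0 signaling rate per lane

print(f"PCIe 1.x x16: {gen1_x16:.1f} GB/s per direction")
print(f"PCIe 2.0 x16: {gen2_x16:.1f} GB/s per direction")
print(f"Ratio: {gen2_x16 / gen1_x16:.1f}x")
```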