Unified Shaders, DX10, and SM 4.0
One of the G80 GPU's main benefits is its full support for DirectX 10, Shader Model 4.0, and the other features inherent to Microsoft's upcoming API. In addition to the increased performance offered by the G80's architecture, DirectX 10 itself is poised to offer major performance benefits over DirectX 9, chiefly by significantly reducing the CPU overhead required for rendering. It addresses DX9's CPU overhead problems in a number of ways. For example, the cost of draw calls and state changes is reduced through a complete redesign of the performance-critical parts of the core API. New features have also been introduced to reduce CPU dependence and to allow more work to be done in one command.
With DirectX 10, Microsoft will also be introducing Shader Model 4.0, which incorporates many key innovations, including a new programmable stage called the geometry shader that allows for per-primitive manipulation. DX10 will also provide a new unified shading architecture with a unified instruction set and common resources across vertex, geometry, and pixel shaders. The DX10 specification is also more strictly defined, so you won't see DirectX 10 class hardware that lacks key features. By contrast, some DX9 class GPUs were labeled as DirectX 9 compliant even though they did not support every DX9 feature, such as vertex texture fetch.
Geometry shaders in particular represent a major step forward in the programmable graphics pipeline. In fact, the introduction of geometry shaders marks the first major change to the 3D graphics pipeline in many years. Geometry shaders allow for the generation and destruction of geometry data on the GPU for the first time. Previously, GPUs could only manipulate existing geometry. Coupled with the new stream output function, algorithms that weren't previously possible, or that had to be executed on the host CPU, can now be mapped to the GPU.
Stream output is another useful new DirectX 10 feature supported in GeForce 8800 GPUs. It enables data generated by geometry shaders (or by vertex shaders, if geometry shaders are not used) to be written to memory buffers and then fed back into the top of the GPU pipeline to be processed again. Letting data flow through the GPU this way allows for more complex geometry processing, advanced lighting calculations, and GPU-based physical simulations without heavily taxing the host CPU.
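To make the idea of creating and destroying geometry, then looping it back through the pipeline, a little more concrete, here is a minimal CPU-side sketch in Python. It is only an analogy, not real shader code: the "primitives" are simple line segments, and the names subdivide, run_pass, and run_passes are hypothetical, as are the length thresholds.

```python
# Hypothetical CPU-side model of a geometry shader pass with stream output.
# A "primitive" here is just a line segment (x0, x1) on a number line.

def subdivide(segment):
    """Per-primitive function: emit zero, one, or two output primitives."""
    x0, x1 = segment
    length = x1 - x0
    if length < 0.25:       # destroy geometry: emit nothing
        return []
    if length <= 1.0:       # pass the primitive through unchanged
        return [segment]
    mid = (x0 + x1) / 2.0   # create geometry: emit two halves
    return [(x0, mid), (mid, x1)]

def run_pass(input_buffer):
    """One trip through the pipeline.

    The returned list stands in for the stream-output memory buffer.
    """
    output_buffer = []
    for prim in input_buffer:
        output_buffer.extend(subdivide(prim))
    return output_buffer

def run_passes(buffer, passes):
    """Feed each pass's output buffer back in as the next pass's input."""
    for _ in range(passes):
        buffer = run_pass(buffer)
    return buffer

if __name__ == "__main__":
    print(run_passes([(0.0, 4.0), (5.0, 5.1)], passes=2))
```

In this toy model the long segment is split further on each pass while the tiny one is discarded, and the host program never has to touch individual primitives between passes.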
Shader Model 4.0 also provides an increase in the resources allotted to shader programs. In previous versions of DirectX, developers had to manage relatively scarce register resources. DirectX 10, however, provides a large increase in register resources. As you can see in the chart above, temporary registers are up from 32 to 4096, and constant registers are up from 256 to 65,536 (sixteen constant buffers of 4096 registers each). The number of textures, the maximum texture size, and the number of render targets have increased as well. The GeForce 8800 architecture can provide all of these DirectX 10 resources.
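As a quick sanity check on the register figures quoted above, the arithmetic can be written out in a few lines of Python. The variable names are ours; the numbers are the ones given in the text.

```python
# Register counts quoted in the article (DX9 vs. DX10).
TEMP_REGS_DX9, TEMP_REGS_DX10 = 32, 4096
CONST_REGS_DX9 = 256
CONST_BUFFERS, REGS_PER_BUFFER = 16, 4096

# Sixteen constant buffers of 4096 registers each.
const_regs_dx10 = CONST_BUFFERS * REGS_PER_BUFFER

temp_growth = TEMP_REGS_DX10 // TEMP_REGS_DX9      # how many times more temporaries
const_growth = const_regs_dx10 // CONST_REGS_DX9   # how many times more constants

if __name__ == "__main__":
    print(const_regs_dx10, temp_growth, const_growth)
```

That works out to 128 times as many temporary registers and 256 times as many constant registers as before.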
In prior versions of DirectX, pixel shaders lagged behind vertex shaders in the number of constant registers, available instructions, and instruction limits. Due to these limitations, developers treated vertex and pixel shaders as separate entities. But with Shader Model 4.0's unified instruction set, which gives pixel and vertex shaders the same number of registers and inputs, all shaders will be able to tap into the full resources of the GPU.
Workload with Discrete Pixel and Vertex Shaders
The GeForce 8800's unified architecture also results in a more efficient use of GPU resources. With previous generations of GPUs that had discrete pixel and vertex shaders, there would almost always be idle hardware. If a scene was particularly pixel shader heavy, for example, the vertex shaders might sit idle, and vice versa.
Workload with Unified Shaders
But with a unified shader architecture, GPU resources can be allocated on the fly and dynamically load-balanced, so major portions of the GPU won't sit idle waiting for a shift in the workload. At its most basic level, a unified shader architecture makes rendering more efficient.
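The difference between the two workload pictures can be roughed out with a very simple model. The Python sketch below is a back-of-the-envelope illustration only: it assumes perfectly divisible work, no scheduling overhead, and no dependencies between stages, and the unit counts and work amounts are made-up numbers rather than specifications of any real GPU.

```python
def discrete_frame_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Fixed split: each pool only runs its own kind of work.

    Both pools run in parallel, so the frame takes as long as the slower pool.
    """
    return max(vertex_work / vertex_units, pixel_work / pixel_units)

def unified_frame_time(vertex_work, pixel_work, total_units):
    """Unified pool: any unit can take any work (idealized load balancing)."""
    return (vertex_work + pixel_work) / total_units

def utilization(total_work, frame_time, total_units):
    """Fraction of available unit-time that was spent doing work."""
    return total_work / (frame_time * total_units)

if __name__ == "__main__":
    # Hypothetical pixel-heavy scene on a chip with 8 + 24 = 32 shader units.
    v_work, p_work = 8.0, 240.0
    t_discrete = discrete_frame_time(v_work, p_work, vertex_units=8, pixel_units=24)
    t_unified = unified_frame_time(v_work, p_work, total_units=32)
    print(t_discrete, utilization(v_work + p_work, t_discrete, 32))
    print(t_unified, utilization(v_work + p_work, t_unified, 32))
```

In this idealized pixel-heavy case the fixed split leaves the vertex units waiting for most of the frame, while the unified pool keeps every unit busy. Real hardware will not reach perfect utilization, but the direction of the effect is the same.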
New HDR Modes:
With DirectX 10, Microsoft will also be introducing two new HDR formats which offer the same dynamic range as FP16 but require only half the storage. The first format, R11G11B10, is optimized for storing textures in floating-point format. It uses 11 bits each for red and green and 10 bits for blue. The second floating-point format is designed to be used as a render target. It uses a 5-bit exponent shared across all colors, with a 9-bit mantissa for each component. These new formats will allow high-dynamic range rendering with less costly storage and bandwidth requirements. For the highest level of precision, DirectX 10 supports 32 bits of data per component. The GeForce 8800 series fully supports this feature, which can be used for anything from high-precision rendering to scientific computing applications.
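Both compact formats fit a whole pixel into 32 bits (11 + 11 + 10, or 3 x 9 + 5), compared with 64 bits for four FP16 channels. To show how a shared exponent works in practice, here is a minimal Python sketch that packs and unpacks the 9/9/9/5 layout. Treat it as an illustration under stated assumptions: the exponent bias of 15, the bit ordering (red in the lowest bits, exponent in the highest), and the rounding rule are assumptions not given in the article, and the function names are hypothetical.

```python
import math

MANTISSA_BITS = 9   # bits per color channel (from the article)
EXP_BITS = 5        # shared exponent bits (from the article)
EXP_BIAS = 15       # assumed bias, not stated in the article
MAX_EXP = 2**EXP_BITS - 1
# Largest value that can be stored: 511/512 * 2^(31 - 15) = 65408.0
MAX_VALUE = (2**MANTISSA_BITS - 1) / 2**MANTISSA_BITS * 2.0**(MAX_EXP - EXP_BIAS)

def pack_rgb9e5(r, g, b):
    """Pack three non-negative floats into one 32-bit shared-exponent word."""
    r, g, b = (min(max(c, 0.0), MAX_VALUE) for c in (r, g, b))
    max_c = max(r, g, b)
    if max_c == 0.0:
        return 0
    # frexp gives max_c = m * 2**e with 0.5 <= m < 1, so floor(log2(max_c)) = e - 1.
    exp_shared = max(-EXP_BIAS - 1, math.frexp(max_c)[1] - 1) + 1 + EXP_BIAS
    scale = 2.0 ** (exp_shared - EXP_BIAS - MANTISSA_BITS)
    # If rounding pushes the largest mantissa out of range, bump the exponent.
    if math.floor(max_c / scale + 0.5) == 2**MANTISSA_BITS:
        exp_shared += 1
        scale *= 2.0
    rs, gs, bs = (int(math.floor(c / scale + 0.5)) for c in (r, g, b))
    return rs | (gs << 9) | (bs << 18) | (exp_shared << 27)

def unpack_rgb9e5(word):
    """Recover the three floats from a packed 32-bit word."""
    exp_shared = (word >> 27) & 0x1F
    scale = 2.0 ** (exp_shared - EXP_BIAS - MANTISSA_BITS)
    return tuple(((word >> shift) & 0x1FF) * scale for shift in (0, 9, 18))

if __name__ == "__main__":
    print(unpack_rgb9e5(pack_rgb9e5(1.0, 0.5, 0.25)))
    print(unpack_rgb9e5(pack_rgb9e5(1000.0, 1.0, 0.0)))
```

The second example shows the trade-off of sharing one exponent: when one channel is far brighter than the others, the dimmer channels lose precision because they must use the same scale.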
DirectX 10 will also offer new instancing capabilities. With DirectX 9, instanced objects were basically copies of the original; they could not use different textures or shaders. But with DirectX 10, instanced objects no longer need to use the same textures. Due to the addition of texture arrays, each instanced object can now reference a unique texture by indexing into a texture array. Instanced objects can also use different shaders through HLSL 10's support for switch statements. What this means is that a single shader can be written that describes multiple different materials, and during rendering, each instanced object can have unique effects applied to it.
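The pattern described here, one shader that picks a texture by index and a material by branching, can be mimicked on the CPU. The Python sketch below is only an analogy for the control flow, not HLSL: the texture names, the material IDs, and the shade_instance function are all hypothetical, and an if/elif chain stands in for the shader's switch statement.

```python
# Stand-in for a texture array: one resource, many slices selected by index.
TEXTURE_ARRAY = ["rock", "grass", "metal"]

def shade_instance(texture_index, material_id):
    """One 'shader' shared by every instance.

    Per-instance data chooses the texture slice and the material branch.
    """
    texture = TEXTURE_ARRAY[texture_index]
    # The if/elif chain plays the role of a switch statement in the shader.
    if material_id == 0:
        return f"diffuse({texture})"
    elif material_id == 1:
        return f"specular({texture})"
    else:
        return f"unlit({texture})"

def draw_instanced(instances):
    """A single 'draw call' that shades every instance with the same shader."""
    return [shade_instance(tex, mat) for tex, mat in instances]

if __name__ == "__main__":
    # Three instances of the same mesh, each with its own texture and material.
    print(draw_instanced([(0, 0), (1, 1), (2, 2)]))
```

All three instances go through the same function in a single call, yet each comes out with its own texture and material.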