New Features In Maxwell
NVIDIA also took the time to incorporate support for some new features in GM204, namely Dynamic Super Resolution (DSR), Multi-Frame Sampled Anti-Aliasing (MFAA), and Voxel Global Illumination (VXGI). NVIDIA has made some enhancements to improve the user experience when using VR devices like the Oculus Rift as well.
First let’s talk about Dynamic Super Resolution, or DSR. DSR essentially gives users the ability to render their games at higher resolutions than their monitor supports, and then down-sample the imagery for improved visual quality. For example, you could render a game at 4K, but output it to a 1080P screen. Rendering the game at a higher resolution and then applying some sophisticated filtering when scaling it down ultimately improves image quality and eliminates or minimizes the speckles and incomplete textures commonly seen at lower resolutions.
The density at which a texture gets sampled at 1080P is lower than when using a higher resolution (obviously). At 1080P, for example, a high resolution texture gets sampled with a relatively coarse grid. With DSR though, because the game renders at a higher resolution like 4K, the sample grid changes and the texture gets sampled at 4K (or whatever higher resolution is being used). As a result, the entire scene benefits, including transparent textures.
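To put a number on that difference in sampling density, here is a quick bit of arithmetic. It assumes "4K" means 3840x2160 and "1080P" means 1920x1080; the function name is ours, purely for illustration.

```python
def pixel_count(width, height):
    """Number of pixels (and therefore shading sample points) in a frame."""
    return width * height

# Assumed resolutions: 4K as 3840x2160, 1080P as 1920x1080.
render_pixels = pixel_count(3840, 2160)
display_pixels = pixel_count(1920, 1080)

# How many rendered pixels feed each pixel that reaches the screen.
samples_per_display_pixel = render_pixels / display_pixels
print(samples_per_display_pixel)  # 4.0
```

In other words, under those assumptions every pixel on the 1080P display is built from four rendered pixels instead of one.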
At first DSR may seem like basic down-sampling or a new implementation of Super-Sampling, but it is not. DSR uses a 13-tap Gaussian filter to clean up and enhance the scaled images, and the game actually renders frames at the higher resolution selected. We should also note that Maxwell has a separate stage to do the filtering; it is not the same scaler used in monitor hacks. Users can also adjust the characteristics of the filter in the NVIDIA driver control panel to alter the sharpness or smoothness.
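For readers who want a feel for what Gaussian-filtered down-sampling means, below is a minimal sketch in Python. It is not NVIDIA's filter: the layout of the 13 taps was not disclosed, so this uses a generic 5x5 separable Gaussian, and the function names, radius, and sigma are all our own assumptions. The sigma parameter plays a role loosely similar to the smoothness control mentioned above.

```python
import math

def gaussian_weights(radius, sigma):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma))
           for i in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def downsample_2x(image, radius=2, sigma=1.0):
    """Halve a grayscale image (list of rows) on each axis using a Gaussian filter.

    Simplification: the kernel is centred on the top-left source pixel of each
    2x2 block, and reads past the border are clamped to the nearest edge pixel.
    """
    height, width = len(image), len(image[0])
    weights = gaussian_weights(radius, sigma)
    out = []
    for oy in range(height // 2):
        row = []
        for ox in range(width // 2):
            acc = 0.0
            for j, wy in enumerate(weights):
                sy = min(max(2 * oy + j - radius, 0), height - 1)
                for i, wx in enumerate(weights):
                    sx = min(max(2 * ox + i - radius, 0), width - 1)
                    acc += wy * wx * image[sy][sx]
            row.append(acc)
        out.append(row)
    return out

# A flat grey 8x8 "frame" should stay the same grey after filtering.
flat = [[0.5] * 8 for _ in range(8)]
small = downsample_2x(flat)
```

A larger sigma spreads the weights across more neighbours and gives a softer result; a smaller one keeps the image sharper at the risk of more aliasing.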
Developers don’t have to do anything to leverage DSR; it is all handled in hardware. And the effects can be quite dramatic. If you look at the grass in the screenshot above, you’ll immediately see the benefits of DSR (on the right).
Dynamic Super Resolution is going to be launching on Maxwell, but will likely come to older GPUs as well. GeForce Experience will give users the ability to use DSR, and set it up on a per-game basis. If a game doesn’t do a good job managing its UI at resolutions like 4K, DSR may not be recommended. And keep in mind, rendering a game at 4K, even if you’ve got a lower resolution display, still impacts performance.
Next up, Multi Frame Sampled AA, or MFAA...
MFAA is a new anti-aliasing mode that will be offered on the GeForce GTX 980 and GTX 970 in a future driver update. In essence, MFAA can provide the same experience as 4X MSAA at roughly the performance cost of 2X MSAA. It works by taking 2 coverage samples per pixel, but effectively doubles the coverage by altering the sample pattern and sampling across multiple frames. Mathematically, 2X MFAA is essentially the same as 4X MSAA, but half of the work is being done per frame.
In motion, 2X MFAA is effectively the same experience as 4X MSAA, but it’s about 30% faster. MFAA uses a different sample pattern per frame (it basically flips the sample points per frame) and then uses a temporal synthesis filter to combine the data from the sampled frames. It’s actually more complex than that because the algorithm considers motion and other aspects of the frame, and it looks across more than 2 frames to set the filter properties, but NVIDIA didn’t dive too much deeper into it.
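The basic idea of flipping sample points between frames can be illustrated with a toy example. This is only a sketch under our own assumptions: the sample offsets are hypothetical (a rotated-grid layout split in two), the scene is a static vertical edge, and a plain average stands in for NVIDIA's temporal synthesis filter, which is more involved than this.

```python
# Hypothetical sample offsets inside a unit pixel. Together the two patterns
# form a 4-sample rotated grid; each frame uses only one of them.
PATTERN_A = [(0.375, 0.125), (0.625, 0.875)]
PATTERN_B = [(0.875, 0.375), (0.125, 0.625)]

def covered(x, y):
    """Toy geometry: a triangle edge covers the left 30% of the pixel."""
    return x < 0.3

def coverage(pattern):
    """Fraction of the pattern's sample points that land inside the geometry."""
    hits = sum(1 for (x, y) in pattern if covered(x, y))
    return hits / len(pattern)

frame_a = coverage(PATTERN_A)                # even frames: 2 samples
frame_b = coverage(PATTERN_B)                # odd frames: the other 2 samples
blended = (frame_a + frame_b) / 2            # naive stand-in for the temporal filter
reference = coverage(PATTERN_A + PATTERN_B)  # all 4 samples in a single frame

print(frame_a, frame_b, blended, reference)  # 0.0 0.5 0.25 0.25
```

For a static scene the blended result matches the 4-sample reference, while each individual frame only does half the sampling work. Note that the two frames disagree with each other (0.0 versus 0.5) before blending, so the quality of the result depends on how well consecutive frames are combined.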
If there’s extreme motion, MFAA loses its effectiveness, but jaggies are hard to see on fast moving objects anyway. With slower moving objects, the effect is much better. As we’ve mentioned, MFAA also offers a better result on transparent textures, whereas MSAA does not, because of the multi-frame samples used. MFAA is also more effective at higher frame rates. At low frame rates, it may cause flickering, however.
Which brings us to Voxel Global Illumination, or VXGI...
To help explain VXGI, we should first outline what a voxel is. A voxel is a volume element. Objects are built up from these little cubes arranged in a grid, like virtual LEGO bricks. The smaller the voxels, the more accurately they can represent the shape of an object. Voxel size is fully analogous to the concept of resolution. Think of voxels as 3D pixels.
VXGI is accelerated in hardware on Maxwell and is intended to be a real-time, practical solution for games running at 40+ frames per second. There is a multi-projection acceleration block in GM204 that improves voxelization and cube map generation performance by up to 6x versus previous-gen solutions, and also accelerates variable resolution shadow maps. The hardware replays geometry to all viewports in one pass and also has hardware support for simple per-viewport processing.
VXGI stores two quantities for each voxel—opacity and emittance. Empty voxels have an opacity value of 0. Voxels fully contained in solid objects will have a value of 1. Voxels that are partially covered will have some value in between. Emittance accounts for the amount of light reflected from the respective walls of the voxel.
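As a rough illustration of the opacity values described above, the sketch below estimates how much of each voxel is filled by a solid sphere. Everything here is our own assumption for demonstration purposes: the function names, the 8x8x8 grid, the sphere, and the brute-force point sampling, which is not how the GPU voxelizes geometry.

```python
def voxel_opacity(ix, iy, iz, grid_size, inside, subsamples=4):
    """Estimate the fraction of voxel (ix, iy, iz) occupied by a solid.

    The scene spans the unit cube; `inside(x, y, z)` returns True for points
    within the solid. Opacity is the share of test points that hit the solid.
    """
    hits = 0
    for a in range(subsamples):
        for b in range(subsamples):
            for c in range(subsamples):
                x = (ix + (a + 0.5) / subsamples) / grid_size
                y = (iy + (b + 0.5) / subsamples) / grid_size
                z = (iz + (c + 0.5) / subsamples) / grid_size
                if inside(x, y, z):
                    hits += 1
    return hits / subsamples ** 3

def in_sphere(x, y, z):
    """Hypothetical solid: a sphere of radius 0.4 centred in the unit cube."""
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 + (z - 0.5) ** 2 <= 0.4 ** 2

empty = voxel_opacity(0, 0, 0, 8, in_sphere)    # corner voxel, outside the sphere
solid = voxel_opacity(4, 4, 4, 8, in_sphere)    # voxel deep inside the sphere
partial = voxel_opacity(7, 4, 4, 8, in_sphere)  # voxel straddling the surface
```

The corner voxel comes out at 0, the interior voxel at 1, and the voxel on the surface lands somewhere in between, matching the three cases described in the text.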
VXGI works in three steps. The first step determines the set of voxels intersected by each primitive. It then computes voxel opacities (the amount of opaque material contained in each voxel) and averages the results to generate a “big picture” view of the objects. If that initial view is calculated at a high resolution, it’s easier to then lower the resolution and reduce the data set. In the slide above, red areas in the voxels are completely opaque, while blue areas are less opaque.
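The step of averaging a high resolution grid down to a coarser one can be sketched as a simple 2x2x2 box average. This is a minimal illustration only; the function name and the plain averaging scheme are our assumptions, and NVIDIA did not detail how its implementation reduces the data.

```python
def downsample_opacity(grid):
    """Halve a cubic opacity grid (grid[z][y][x]) by averaging 2x2x2 blocks."""
    half = len(grid) // 2
    out = [[[0.0] * half for _ in range(half)] for _ in range(half)]
    for z in range(half):
        for y in range(half):
            for x in range(half):
                total = sum(grid[2 * z + dz][2 * y + dy][2 * x + dx]
                            for dz in (0, 1) for dy in (0, 1) for dx in (0, 1))
                out[z][y][x] = total / 8.0
    return out

# A 2x2x2 grid whose bottom layer is solid and top layer is empty.
fine = [
    [[1.0, 1.0], [1.0, 1.0]],  # z = 0: fully opaque voxels
    [[0.0, 0.0], [0.0, 0.0]],  # z = 1: empty voxels
]
coarse = downsample_opacity(fine)
print(coarse)  # [[[0.5]]]
```

The single coarse voxel ends up half opaque, which is the kind of in-between value the lower resolution "big picture" view relies on.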
The second step is light injection. In this step, the amount of light reflected by the materials in the voxels is calculated. Although the “big picture” view represented in the slide is of a relatively low resolution, it does a good job of approximating the amount of light reflected in the scene.
In the third step, VXGI collects bounced light from the environment and arranges it in "cones". Think of the cones as a group of light rays striking a point. And note that a hemisphere of cones can be grouped together and traced to approximate light from any point. The number of cones used for the calculations is configurable, but normally 3 or 4 cones per pixel are traced for acceptable performance. Depending on what kind of quality or performance you’re looking for, the number of cones is scalable.
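To make the cone idea a little more concrete, here is a heavily simplified sketch of marching a cone through a scene and accumulating light front to back, in the general style of voxel cone tracing. It is not NVIDIA's implementation. The function names, step sizes, aperture, and the toy scene are all assumptions, and the scene lookup ignores the cone radius, which a real implementation would likely use to pick a coarser voxel level.

```python
def trace_cone(origin, direction, sample, max_dist=1.0, aperture=0.5, start=0.05):
    """March one cone through the scene; returns (gathered_light, occlusion)."""
    light, alpha, t = 0.0, 0.0, start
    while t < max_dist and alpha < 0.99:
        radius = aperture * t  # the cone widens with distance
        point = tuple(o + d * t for o, d in zip(origin, direction))
        opacity, emittance = sample(point, radius)
        # Front-to-back blending: closer material hides what lies behind it.
        light += (1.0 - alpha) * opacity * emittance
        alpha += (1.0 - alpha) * opacity
        t += max(radius, 0.01)  # take bigger steps as the cone grows
    return light, alpha

def gather(origin, directions, sample):
    """Average the light returned by several cones (e.g. 3 or 4 per pixel)."""
    return sum(trace_cone(origin, d, sample)[0] for d in directions) / len(directions)

def wall_scene(point, radius):
    """Hypothetical scene: a solid wall at z > 0.5 reflecting 0.8 units of light."""
    return (1.0, 0.8) if point[2] > 0.5 else (0.0, 0.0)

toward_wall = trace_cone((0.5, 0.5, 0.0), (0.0, 0.0, 1.0), wall_scene)
away = trace_cone((0.5, 0.5, 0.0), (0.0, 0.0, -1.0), wall_scene)
average = gather((0.5, 0.5, 0.0), [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)], wall_scene)
```

Tracing more cones per pixel samples the surrounding hemisphere more finely at a higher cost, which is the quality versus performance trade-off described above.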
According to NVIDIA, VXGI will be available for Unreal Engine 4 (and other major engines) starting in Q4.
NVIDIA is also working on a number of things to improve the virtual reality experience with Maxwell. NVIDIA is calling the group of features VR Direct and it includes DSR and MFAA (outlined above), software optimizations to lower latency at the OS and GPU level, support for SLI, Auto Asynchronous Warp, and Auto Stereo.
NVIDIA noted that they are working on developing a zero latency adder for SLI. AFR, or alternate frame rendering, which is typically used for multi-GPU configurations, adds a frame (or multiple frames) of latency by its very nature. The new mode—of which few details were given—will eliminate that latency.
The optimizations we mention cut about 10-40ms of latency out from the driver / OS interaction. MFAA reduces the GPU render time, further lowering the latency, and Auto Asynchronous Warp can asynchronously sample input as late as possible from a head tracker. The sum total of all of these things can take the typical 50ms of latency involved with viewing a given frame in a VR headset down to roughly 25ms.