NVIDIA GeForce GTX 1080 Performance Review: Pascal, The New King
New Features: Ansel, SMP, Fast Sync, SLI Update
Ansel
One of the new features NVIDIA introduced at the GeForce GTX 1080 unveiling is something called Ansel. Ansel is an in-game photography tool that gives users the ability to capture some incredible images that wouldn’t be possible with simple screenshots.
Up until now, PC gamers have been limited to rudimentary tools for capturing in-game screenshots. When you've maneuvered your avatar or perspective to precisely the right spot, tapping the Print Screen button (or F12 when using something like Steam's overlay) captures what’s in the frame buffer, which you can then output to your image editor of choice, but that’s about it.
Ansel takes things considerably further. Ansel essentially acts like an in-game 3D VR camera system. When you activate Ansel, the game is paused while you frame the perfect shot. You can pan around the scene in 360 degrees, roll, zoom, and re-position your angle. There are also filters to play with and adjustments that can be tuned, such as brightness, field of view, etc. You can even create and share your own special FX filters.
Once you have the perfect composition, you can blow the image up to ultra-high resolutions of as much as 4.5 gigapixels (61,440 x 34,560). These huge images allow viewers to zoom in on details, with far less blurring and far fewer artifacts. They are created using up to 3,600 stitched tiles captured at full resolution.
Ansel treats game captures as art, which of course they are. Professional screenshots are a thing, as game photographers such as Duncan Harris and Leonardo Sang can attest. Ansel requires a game to incorporate support for the technology, but it is already coming to games like The Witcher III: Wild Hunt, Tom Clancy's The Division, Unreal Tournament, and others.
Simultaneous Multi-Projection
Simultaneous Multi-Projection (SMP) is a new technology being introduced with Pascal that gives the GPU the ability to render more efficiently on modern displays. SMP allows Pascal to project up to 16 different viewports, each with the ability to rotate in order to better match the display configuration. SMP also has the ability to project more than one viewport simultaneously.
SMP is used for multiple new graphics techniques such as Perspective Surround, Lens Matched Shading, Single Pass Stereo, and MultiRes Shading. When used for virtual reality applications, for example, SMP can result in a 50% increase in pixel throughput and a doubling of geometry throughput versus previous-gen cards.
Perspective Surround leverages SMP to render a wider field of view with the correct image perspective across all three monitors, and in a single geometry pass.
SMP is also leveraged in NVIDIA's new VRWorks feature that reduces the geometry work required for VR rendering. With Pascal’s Single Pass Stereo, the GP104 GPU processes the geometry only once, and performs transforms to generate views for both eyes. This reduces the GPU’s geometry workload by half, and ultimately improves performance.
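To make the "process the geometry once, project twice" idea concrete, here is a minimal Python sketch. It is only an illustration, not NVIDIA's implementation: the function names are hypothetical, the expensive geometry work is stood in for by a simple translation into head space, and each eye's view is assumed to differ only by a horizontal offset of half the interpupillary distance (ipd).

```python
def transform_to_head_space(vertex, head_position):
    """Hypothetical stand-in for the expensive per-vertex geometry work."""
    return tuple(v - h for v, h in zip(vertex, head_position))


def single_pass_stereo(vertices, head_position, ipd):
    """Process each vertex once, then derive a left-eye and right-eye position.

    Assumption: the two eyes sit at -ipd/2 and +ipd/2 on the x axis of head
    space, so a point's position relative to each eye differs only in x.
    """
    half = ipd / 2.0
    left_view, right_view = [], []
    for vertex in vertices:
        # Shared geometry work: done once per vertex, not once per eye.
        x, y, z = transform_to_head_space(vertex, head_position)
        # Cheap per-eye step: shift x by the eye's offset from the head center.
        left_view.append((x + half, y, z))
        right_view.append((x - half, y, z))
    return left_view, right_view


left, right = single_pass_stereo([(0.0, 0.0, -2.0)], (0.0, 0.0, 0.0), ipd=0.5)
```

A traditional stereo renderer would run the shared step twice, once per eye; here it runs once and only the small per-eye offset is duplicated, which is where the halved geometry workload comes from.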
Another new feature in VRWorks leverages technology from NVIDIA's iRay physically based rendering tech to accurately produce audio in VR environments, with proper reflections, echo, etc.
Asynchronous Compute
Pascal also brings support for Asynchronous Compute with dynamic load balancing, pixel-level graphics preemption, and instruction-level compute preemption. There are many potential benefits to Asynchronous Compute with modern games.
Because many serial workloads can be broken up and parallelized, certain GPU resources that would normally remain idle for relatively long periods of time while the GPU crunched serial workloads can be more fully utilized.
The end result is that the GPU can execute workloads faster and generally improve performance, both in terms of latency and frame rate. And improving the effective performance of a GPU could enable developers to use additional graphics effects to ultimately enhance image quality, while still hitting a specific performance target.
Lower latency is an obvious plus for VR applications because it can make the user feel more connected to the scene, by minimizing the lag associated with accurate head-tracking. With Asynchronous Compute, for example, Pascal can inject the proper frame during an asynchronous time warp later, while still completing the task before the VR display is refreshed.
New Memory Compression
Pascal also features an enhanced lossless memory compression scheme. Employing memory compression reduces the amount of data written out to memory and the amount of data transferred to the L2 cache and between blocks, like the texture units and frame buffer.
The GPU’s compression pipeline uses a number of different algorithms to determine the most efficient way to compress the data. One of them is delta color compression. With delta color compression, the GPU calculates the differences between pixels in a block and stores the block as a set of reference pixels plus the delta values from the reference. If the deltas are small then only a few bits per pixel are needed. If the overall result of reference and delta values is less than half the uncompressed storage size, then delta color compression succeeds and the data is stored at half size (2:1 compression). That’s how Maxwell worked too.
The GP104 GPU also includes an enhanced delta color compression capability, though. The 2:1 compression method has been enhanced to be effective more often, and a new 4:1 delta color compression mode has been added to cover cases where the per-pixel deltas are very small and can be packed into one quarter of the original space. Finally, a new 8:1 delta color compression mode combines 4:1 constant color compression of 2x2 pixel blocks with 2:1 compression of the deltas between those blocks.
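The mode selection described above can be illustrated with a toy Python model. This is a sketch under loud assumptions, not the GP104's actual algorithm: pixels are single-channel 8-bit values, the block's minimum value serves as the one reference pixel, every delta is stored at the same bit width, and the 8:1 mode is left out. The names packed_bits and pick_mode are made up for the example.

```python
BITS_PER_PIXEL = 8  # assumption: toy single-channel, 8-bit pixels


def packed_bits(block):
    """Bits needed to store the block as one reference value plus per-pixel deltas."""
    reference = min(block)  # assumption: use the smallest value as the reference
    widest_delta = max(pixel - reference for pixel in block)
    bits_per_delta = widest_delta.bit_length()  # 0 bits when every pixel is identical
    return BITS_PER_PIXEL + len(block) * bits_per_delta


def pick_mode(block):
    """Pick the strongest mode whose size budget the packed block fits into."""
    raw_bits = BITS_PER_PIXEL * len(block)
    packed = packed_bits(block)
    if packed * 4 <= raw_bits:  # fits in a quarter of the original space
        return "4:1"
    if packed * 2 <= raw_bits:  # fits in half of the original space
        return "2:1"
    return "uncompressed"       # deltas too large, store the block as-is


flat_sky = [100, 101] * 8            # tiny deltas
soft_gradient = [100, 103, 105, 107] * 4
noisy_foliage = [0, 255] * 8         # large deltas

modes = [pick_mode(b) for b in (flat_sky, soft_gradient, noisy_foliage)]
```

In this model the nearly flat block lands in the 4:1 mode, the gentle gradient in 2:1, and the high-contrast block is stored uncompressed, which mirrors why smooth regions of a frame compress far better than busy ones.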
The end result is that the GP104 is able to significantly reduce the number of bytes that have to be fetched from memory per frame. The example from Project Cars posted above shows how much of a frame can be compressed in Pascal vs. Maxwell.
Fast Sync
Fast Sync is a new V-Sync-related technology designed to address the tearing issues that occur when a game's frame rates far exceed the refresh rate of the monitor.
Fast Sync allows for high frame rates, low latency, and fast response times without tearing, by implementing multiple buffers and flip logic. The GPU is allowed to render frames as fast as it can, similar to having V-Sync disabled, but the best completed frame is sent to the display in time with the display refresh. You end up with only slightly higher latency than having V-Sync completely off, but without any tearing.
Fast Sync is not a replacement for G-SYNC, but rather complements it. G-SYNC is most effective when framerates in games drop below what would normally be considered playable levels. At very high framerates, the benefits of G-SYNC are less noticeable. Using Fast Sync with G-SYNC is essentially the best of both worlds – smoother on-screen animation at lower framerates, and tear-free, low-latency gaming at high framerates.
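The decoupling of rendering from display can be sketched with a small Python simulation. It rests on several assumptions that are not from NVIDIA: frame times are perfectly steady, the "best completed frame" is taken to mean the most recently completed one, and the function name fast_sync_schedule is invented for this example.

```python
def fast_sync_schedule(render_fps, refresh_hz, refreshes):
    """Return the number of the frame scanned out at each display refresh.

    Frame k (counting from 1) is assumed to finish rendering at k / render_fps
    seconds, and refresh t happens at t / refresh_hz seconds. A value of 0
    means no frame had finished rendering yet.
    """
    shown = []
    for tick in range(1, refreshes + 1):
        # Frames finished by this refresh; integer math avoids float rounding.
        latest_completed = (tick * render_fps) // refresh_hz
        # Only whole, finished frames are ever flipped to the display, so
        # there is no tearing; older finished frames are simply dropped.
        shown.append(latest_completed)
    return shown


displayed = fast_sync_schedule(render_fps=150, refresh_hz=60, refreshes=4)
```

With the GPU rendering at 150 frames per second on a 60 Hz display, the simulated display shows frames 2, 5, 7, and 10 over the first four refreshes. The frames in between are rendered but never displayed, which is the trade Fast Sync makes to keep latency low without tearing.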
HDR Support
The GeForce GTX 1080 display pipeline also supports HDR gaming and video encoding / decoding.
The Pascal architecture supports HDR Video (4K@60 10/12b HEVC Decode), HDR Record/Stream (4K@60 10b HEVC Encode), and has HDR Interface Support (DP 1.4). And because of the HEVC 10b encoding built into the GP104 GPU, and the HEVC 10b decoding built into the SHIELD Android TV device, you can stream HDR games and content to an HDR compatible TV.
SLI Updates
NVIDIA is also making a few updates to SLI with Pascal. With the GeForce GTX 1080 (and 1070), the two SLI interfaces on the top of the card are now linked together to improve inter-GPU bandwidth. Doing so allows both SLI interfaces to be used in tandem for increased bandwidth, which has frametime / frame pacing benefits for high-resolution displays or multi-display surround setups with ultra-high effective resolutions.
With this move, NVIDIA is focusing its efforts on dual-card SLI only. New SLI high-bandwidth (HB) bridge connectors that can operate at higher clocks are also being introduced, in various lengths to accommodate different slot configurations. The GeForce GTX 1080 is also compatible with legacy SLI bridges, but bandwidth will be limited to the maximum speed of the bridge being used.
When using a new SLI HB Bridge, the GeForce GTX 1080’s SLI interface runs at 650 MHz, compared to 400 MHz in previous GeForce GPUs using legacy SLI bridges. Some older SLI Bridges will also get a speed boost when used with Pascal GPUs, though. Custom bridges that include LED lighting will now operate at up to 650 MHz when used with the GTX 1080.
Some people have reported that only 2-way SLI is supported with the 1080. However, that’s only partially the case. Two-way SLI is NVIDIA’s focus moving forward, but enthusiasts who want to enable 3-way or 4-way SLI will be able to do so using a utility that unlocks the capabilities. NVIDIA is setting up an “Enthusiast Key” website where users will be able to generate a key to unlock multi-card SLI.
Part of the reason for this move is that performance doesn't always scale linearly with more than two GPUs. And with newer APIs like DirectX 12, multi-GPU support is handled differently -- alternate frame rendering won't always work with many games, for example.