NVIDIA GeForce RTX 30-Series: Under The Hood Of Ampere
GeForce RTX 3090, RTX 3080 And RTX 3070: Here's What Makes These New NVIDIA GPUs So Powerful
NVIDIA held some virtual briefings yesterday to add color to CEO Jensen Huang’s GeForce RTX 30 series event from earlier this week. There weren’t many significant new announcements, but a number of the slides Jensen presented were fleshed out a bit more, with additional details and a few new visuals that paint a more complete picture of Ampere and the GeForce RTX 3090, GeForce RTX 3080, and GeForce RTX 3070.
If you haven’t already read our coverage of the launch event, we strongly suggest checking that out. The overview provided there will lay the foundation for some of the deeper details presented here, and the pictures of the cards and Jensen’s kitchen are cool as heck...
A Closer Look At NVIDIA's Ampere GPUs
During yesterday’s briefing, NVIDIA showed a look at Ampere’s new Streaming Multiprocessor (SM) configuration. The biggest takeaway here is that the new Ampere SM is bigger, beefier and has a new datapath that effectively doubles FP32 compute performance. Architectures before Turing had only one math datapath. Turing added a second, giving it one datapath for floating point and one for integer. With Ampere, the integer path has been augmented with an additional FP32 unit, so FP-heavy workloads have significantly more resources available.
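To get a feel for what the reworked datapaths mean in practice, here is a rough Python sketch of an idealized issue model. It is only an illustration: the 64 lanes per datapath is an assumption that does not come from the briefing, and the function name `cycles_per_sm` and the instruction mixes are made up for the example. Real SMs have schedulers, latencies, and memory stalls that this ignores.

```python
def cycles_per_sm(fp32_ops, int32_ops, arch):
    """Idealized clock cycles for one SM to issue a mix of FP32 and INT32 ops."""
    lanes = 64  # ASSUMPTION: lanes per datapath per clock, not from the briefing
    if arch == "turing":
        # One FP32-only path and one INT32-only path run side by side,
        # so the longer of the two queues sets the pace.
        return max(fp32_ops, int32_ops) / lanes
    if arch == "ampere":
        # Path A is FP32-only; path B takes either INT32 or FP32.
        # All INT32 work must use path B, while FP32 work can be
        # balanced across both paths.
        return max(int32_ops, (fp32_ops + int32_ops) / 2) / lanes
    raise ValueError(f"unknown architecture: {arch}")


if __name__ == "__main__":
    # Hypothetical instruction mixes, chosen only to show the trend.
    for fp32, int32 in [(1280, 0), (1280, 640), (0, 640)]:
        turing = cycles_per_sm(fp32, int32, "turing")
        ampere = cycles_per_sm(fp32, int32, "ampere")
        print(f"FP32={fp32:5d} INT32={int32:4d} speedup={turing / ampere:.2f}x")
```

In this toy model a pure FP32 mix finishes in half the cycles on the Ampere-style layout, a 2:1 FP32-to-integer mix gains about 1.33x, and a pure integer mix gains nothing, which is why the doubling is best read as a peak figure for FP-heavy workloads.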
The new Ampere SM also doubles the L1 bandwidth and cache partition size and adds 33% more L1 capacity. In addition, Ampere’s second-generation RT (ray tracing) cores can process triangle intersections at twice the rate, and its third-gen Tensor cores double math performance for sparse matrices (matrices in which most of the elements are zero).
Doubling Ampere’s triangle intersection rate should have a dramatic effect on performance for ray tracing workloads. What NVIDIA found when analyzing Turing’s performance characteristics is that it often had good Bounding Box intersection rates, but Triangle Intersection rates were holding things back. With Ampere, NVIDIA wanted to be able to process Bounding Box and Triangle intersections at the same time. So, Ampere’s separate Bounding Box and Triangle resources can run in parallel and, as mentioned, Triangle Intersection rates are twice as fast.
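For readers unfamiliar with the two kinds of tests, here is a small software sketch of each. It uses two textbook techniques, the slab test for axis-aligned bounding boxes and the Möller-Trumbore algorithm for triangles. This is not how the RT cores are implemented (NVIDIA did not share that), and the function names are just for illustration. In a typical ray tracer, cheap box tests prune the scene hierarchy and the more expensive triangle tests run only on the geometry that survives.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: True if the ray passes through the axis-aligned box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: distance along the ray to the hit, or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # ignore hits behind the ray origin
```

The triangle test involves noticeably more math per ray than the box test, which fits with NVIDIA's observation that triangle intersections were the limiting factor on Turing.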
A new Triangle Position Interpolation unit has also been added to Ampere to help create more accurate motion blur effects.
NVIDIA Ampere Power And Acoustics
During Jensen’s keynote, he showed a slide explaining Ampere’s 1.9X performance per watt improvement versus Turing. Some additional details on how that was achieved, along with actual thermal and acoustic data, were provided.
With previous-gen architectures, NVIDIA had one common power rail for both the cores and memory system. That meant if the cores wanted to run at high voltage, the memory had to as well. With Ampere though, NVIDIA has split the core and memory power rails into separate domains, so they can operate independently. That will allow for greater efficiency and energy conservation, and ultimately improve power and thermal characteristics of Ampere cards.
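A quick back-of-the-envelope sketch shows why decoupling the rails can help. It uses the classic CMOS dynamic-power approximation, P = C x V^2 x f, in arbitrary units. The voltages below are purely hypothetical placeholders, not figures from NVIDIA, and the helper names are made up for this example.

```python
def dynamic_power(voltage, frequency=1.0, capacitance=1.0):
    """Classic CMOS dynamic-power approximation, P = C * V^2 * f (arbitrary units)."""
    return capacitance * voltage ** 2 * frequency

def total_power(core_v, mem_v, shared_rail):
    """Combined core + memory-system power for a shared or split rail."""
    if shared_rail:
        # On a common rail, both domains run at whichever voltage is higher.
        core_v = mem_v = max(core_v, mem_v)
    return dynamic_power(core_v) + dynamic_power(mem_v)

if __name__ == "__main__":
    # HYPOTHETICAL voltages: the cores want 1.0 V, the memory system only needs 0.8 V.
    shared = total_power(1.0, 0.8, shared_rail=True)
    split = total_power(1.0, 0.8, shared_rail=False)
    print(f"shared rail: {shared:.2f}  split rails: {split:.2f}  "
          f"saving: {100 * (1 - split / shared):.0f}%")
```

Because power scales with the square of voltage in this approximation, even a modest voltage reduction on one domain yields an outsized saving, though the real-world benefit depends on details NVIDIA has not disclosed.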
Speaking of better power and acoustics, NVIDIA provided some additional color there as well. According to NVIDIA's internal testing, at any given noise level, the upcoming GeForce RTX 3080 thermal solution can keep the GPU running about 20°C cooler than the reference design on a GeForce RTX 2080. In addition, at any given temperature, the GeForce RTX 3080 will operate 10dBA quieter.
A similar comparison was made between the beastly GeForce RTX 3090 and the Titan RTX, but the differences were even starker. In similar testing, the GeForce RTX 3090 ran 30°C cooler than the Titan RTX, while being about 20dBA quieter at any given temperature along the curve.
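Decibels are logarithmic, so reductions of 10dBA and 20dBA are bigger than they may sound. The short sketch below converts a decibel reduction into a sound-power ratio using the standard definition, and into a rough perceived-loudness ratio using the common rule of thumb that loudness halves for every 10 dB. The second figure is an approximation of human perception, not a measurement, and the function names are ours.

```python
def sound_power_ratio(delta_db):
    """Factor by which sound power drops for a reduction of delta_db decibels."""
    return 10 ** (delta_db / 10)

def perceived_loudness_ratio(delta_db):
    """Rule of thumb: perceived loudness roughly halves per 10 dB reduction."""
    return 2 ** (delta_db / 10)

if __name__ == "__main__":
    for card, delta in [("GeForce RTX 3080", 10), ("GeForce RTX 3090", 20)]:
        print(f"{card}: {delta} dBA quieter is about "
              f"{sound_power_ratio(delta):.0f}x less sound power, roughly "
              f"{perceived_loudness_ratio(delta):.0f}x quieter to the ear")
```

By that yardstick, the 10dBA claim works out to about a tenth of the sound power and roughly half the perceived loudness, and the 20dBA claim to about a hundredth of the sound power and roughly a quarter of the perceived loudness.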
The improved thermal and acoustic performance of GeForce RTX 30 series cards is enabled by a newly-designed cooling solution. The coolers feature dual axial fans and a split heatsink design that is quieter than current solutions, while offering the ability to dissipate up to 90 more watts of power. The front end of the heatsink array sits directly atop the GPU and memory. A fan above directs air through the heatsink and directly out of the chassis. The heatsink on the back half of the card, however, which is linked to the large vapor chamber via heat-pipes, allows air from the second fan to pass all the way through, where it rises to the top of the chassis and is eventually exhausted.
The more capable coolers on the GeForce RTX 30 series accompany denser PCB designs, with a miniaturized 12-pin power connector on some cards as well. In addition to having unique V-shaped rear edges, the PCBs are packed tighter and are reportedly 50% denser. You can get a good look at the GeForce RTX 3080's unique PCB design above.
NVIDIA Ampere: Boosting Performance In A Few Ways
NVIDIA also showed a demo of the theoretical benefits of RTX IO, which works in conjunction with Microsoft's DirectStorage API. During the demo, handling the level load and decompression took about 4X as long on a PCIe Gen 4 SSD using current methods and used significantly more CPU core resources. The demo was run on a 24-core Threadripper system and the standard load / decompress took over 5 seconds. With RTX IO, that time was cut to just 1.61 seconds. We won’t even talk about the hard drive’s performance here. Ouch – it hurts just to look at the chart.
NVIDIA also made some direct comparisons between Turing and Ampere running ray tracing tests with Wolfenstein: Youngblood at 4K. As the data provided shows, the Ampere GPU is able to churn through frames (or at least the frame provided here for reference) significantly faster than the equivalent Turing-based GPU.
Of course, some framerate comparisons were made as well. First up, let's see what the GeForce RTX 3070 can do...
In the chart above, the GeForce RTX 3070 is being compared to a GeForce RTX 2070 and GTX 1070, at 1440p. In Borderlands 3, using traditional rasterization, the GeForce RTX 3070 breaks the 80 FPS mark, whereas the RTX 2070 lands around 50 FPS, and the GTX 1070 can't quite hit 40 FPS. Turn RTX On with Minecraft and Control and the GeForce GTX 1070's performance craters, but the GeForce RTX 3070 maintains similarly large advantages over the RTX 2070.
The GeForce RTX 3080 performance comparisons used the same games, but the resolution is upped to 4K and the RTX 2080 and GTX 1080 are rolled in for reference. Here we see similar trends, with the GeForce RTX 3080 hitting framerates in the 60 - 80 FPS range, maintaining large leads over the previous-gen cards.
NVIDIA's GeForce RTX 3090 performance comparisons were made versus the Titan RTX, and while more data points were provided (including games, rendering, and compute workloads), they don't offer quite as much clarity in terms of actual framerates. Regardless, NVIDIA's data shows the GeForce RTX 3090 blowing past the Titan RTX across the board. Tensor core performance in particular shows massive gains.
We'll have more information to share as we get closer to the GeForce RTX 30-series' release. For now, these additional details will have to hold you over. Although peak power is higher with NVIDIA's upcoming Ampere-based GeForce RTX 30-series cards, improvements in efficiency, performance, packaging, and cooling paint a compelling picture of what to expect, and we're eager to test things out for ourselves. Thankfully, it won't be much longer now.