NVIDIA GeForce RTX 3090 Review: BFGPU Benchmarks Unleashed
GeForce RTX 3090: NVIDIA's BFGPU Has Arrived And It Slays
Whatever you want to call the GeForce RTX 3090, one thing is for certain. As of this moment, the GeForce RTX 3090 is the single most powerful graphics card money can (almost) buy. It sits at the pinnacle of NVIDIA’s product stack currently, and according to the company, it enables things like smooth 8K gaming and seamless processing of massive content creation workloads, thanks in part to its 24GB of on-board GDDR6X memory.
A graphics card like the GeForce RTX 3090 isn’t for everyone, however. Though its asking price is about $1,000 lower than its previous-gen, Turing-based Titan RTX counterpart, it is still out of reach for most users. And the GeForce RTX 3090’s performance characteristics will likely make its value proposition interesting to only a select group of enthusiasts and creators. We’ll do our best to better explain all of that on the pages ahead. For now, let’s take a look at the specs and inspect this big, beautiful beast...
Though NVIDIA has branded this card the GeForce RTX 3090, it is essentially the replacement for the previous-gen Titan RTX. As such, it isn’t purely a gaming-focused GPU. According to NVIDIA, demand for the various Titans was higher than anticipated. So with this generation, in addition to selling the cards directly, NVIDIA is working with board partners to widen availability, and those partners will be offering GeForce RTX 3090 series cards as well.
Before we dive any deeper into the speeds and feeds though, we need to direct your attention to a few previous articles. We have already covered much of the underlying technology at the heart of the GeForce RTX 3090, so we won’t be doing so again here. If you want some of the backstory, however, we recommend checking out our coverage of NVIDIA’s initial GeForce RTX 30 series announcement, the deeper dive on its new features and the Ampere architecture, and last week’s GeForce RTX 3080 reviews. Once you’ve got that all digested, you’ll understand much of what the GeForce RTX 3090 is all about.
NVIDIA GeForce RTX 3090 Speeds And Feeds
As you can see in the detailed spec breakdown and comparison above, the new GA102-powered GeForce RTX 3090 is amped-up and more capable than the previous-gen Titan RTX in almost every way, except for two. The GeForce RTX 3090 has a lower default boost clock and fewer Tensor cores. The GA102’s newer architecture and additional resources more than compensate for the lower default boost frequency though, and Ampere’s 3rd-generation Tensor cores more than double the throughput of the previous generation, in addition to supporting additional types of math, like BFloat16 (BF16) and TensorFloat-32 (TF32). With regard to pixel and texture fillrate, memory bandwidth, and compute performance, the GeForce RTX 3090 is significantly more powerful than the Turing-based Titan RTX, or anything else for that matter.
The GA102 GPU has a die size of 628.4mm² and is comprised of roughly 28 billion transistors. The chips are also manufactured on a newer, custom Samsung 8nm process (8N) than their previous-gen, Turing-based counterparts, which used a 12nm FinFET process at TSMC. Notice that despite packing nearly 10 billion more transistors, the GA102’s die is about 126mm² smaller, thanks to that more advanced process.
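To put the process change in perspective, here is a quick back-of-the-envelope density check. It is only a sketch that uses the rounded figures quoted above ("roughly 28 billion", "nearly 10 billion more", "about 126mm² smaller"), so the results are approximate and the variable names are ours.

```python
# Rough transistor-density comparison built only from the rounded figures
# quoted in the text; treat the output as an approximation.
GA102_DIE_MM2 = 628.4
GA102_TRANSISTORS_B = 28.0                          # "roughly 28 billion"
TURING_DIE_MM2 = GA102_DIE_MM2 + 126.0              # GA102 is "about 126mm2 smaller"
TURING_TRANSISTORS_B = GA102_TRANSISTORS_B - 10.0   # "nearly 10 billion" fewer


def density_mtr_per_mm2(transistors_billions, die_mm2):
    """Return millions of transistors per square millimetre."""
    return transistors_billions * 1000.0 / die_mm2


ga102_density = density_mtr_per_mm2(GA102_TRANSISTORS_B, GA102_DIE_MM2)
turing_density = density_mtr_per_mm2(TURING_TRANSISTORS_B, TURING_DIE_MM2)
density_ratio = ga102_density / turing_density

print(f"GA102:  {ga102_density:.1f} MTr/mm^2")
print(f"Turing: {turing_density:.1f} MTr/mm^2")
print(f"Ratio:  {density_ratio:.2f}x")
```

With these rounded inputs, GA102 works out to roughly 45 million transistors per mm² versus about 24 for the big Turing die, a little under a 1.9x improvement.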
As we’ve mentioned in our previous GeForce RTX 30 series and Ampere coverage, all of those additional transistors were used to enable new features, like PCIe Gen 4 support, and to enhance Ampere’s performance for virtually all GPU-bound workloads. Pre-Turing, NVIDIA’s GPU architectures had only one data path, for example. A second one was added with Turing, though: one for floating point and one for integer. And with Ampere, that second integer path has been beefed up with an additional FP32 unit, so floating point heavy workloads have much more horsepower at their disposal.
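A toy model helps show why the beefed-up second path matters most for FP32-heavy code. The sketch below is our own simplification, not NVIDIA's scheduler: it assumes each path issues 16 operations per clock (an assumption, not a figure from the text), that Turing's paths are FP32-only and INT32-only, and that Ampere's second path can take either type.

```python
from math import ceil

LANES = 16  # assumed operations per path per clock; illustrative only


def cycles_turing(fp_ops, int_ops, lanes=LANES):
    """One FP32-only path and one INT32-only path running side by side."""
    return max(ceil(fp_ops / lanes), ceil(int_ops / lanes))


def cycles_ampere(fp_ops, int_ops, lanes=LANES):
    """Path A is FP32-only; path B accepts FP32 or INT32.

    All INT32 work must go through path B, while FP32 work can spread
    across both paths, so the run time is bounded by whichever is larger:
    the evenly shared total or the INT32 work alone.
    """
    shared = ceil((fp_ops + int_ops) / (2 * lanes))
    return max(shared, ceil(int_ops / lanes))


# Hypothetical instruction mixes: (FP32 ops, INT32 ops)
for fp_ops, int_ops in [(1600, 0), (1600, 576), (800, 800)]:
    t = cycles_turing(fp_ops, int_ops)
    a = cycles_ampere(fp_ops, int_ops)
    print(f"fp={fp_ops:5d} int={int_ops:4d}  turing={t:4d}  ampere={a:4d}")
```

In this simplified model, a pure FP32 stream finishes in half the cycles on the Ampere-style layout, while an evenly mixed stream sees no gain, which is why the real-world uplift depends on the workload's instruction mix.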
Ampere’s 2nd-generation RT (ray tracing) cores have also been optimized for better performance. The 82 RT cores in the GeForce RTX 3090 (up from 72 in the Titan RTX) offer up to 35.6 TFLOPS of compute performance across multiple precision levels (vs. 16.3 – 32.6 TFLOPS on Turing) and the 3rd-gen Tensor cores offer up to 284 TFLOPS of Int8 performance, versus 261 on the Titan RTX (double those numbers for Int4). We should also point out that the 2nd-gen RT cores offer 2x the triangle intersection rate of Turing and those 3rd-gen Tensor cores double up math performance for sparse matrices, i.e. matrices in which most of the elements are zero.
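For reference, the 35.6 and 16.3 TFLOPS figures line up with the usual peak-FP32 estimate of CUDA cores x 2 operations per clock x boost clock. The sketch below assumes the core counts and default boost clocks from NVIDIA's published spec sheets (10,496 cores at 1,695MHz for the RTX 3090, 4,608 cores at 1,770MHz for the Titan RTX); those inputs are not stated in the text above, so treat them as assumptions.

```python
def peak_fp32_tflops(cuda_cores, boost_mhz):
    """Peak FP32 estimate: each core retires one fused multiply-add
    (counted as 2 floating point operations) per clock."""
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12


# Assumed spec-sheet inputs (not taken from the article text).
rtx_3090_tflops = peak_fp32_tflops(cuda_cores=10496, boost_mhz=1695)
titan_rtx_tflops = peak_fp32_tflops(cuda_cores=4608, boost_mhz=1770)

# Int4 rates are described as double the quoted Int8 rates.
rtx_3090_int4 = 284 * 2
titan_rtx_int4 = 261 * 2

print(f"RTX 3090 peak FP32:  {rtx_3090_tflops:.1f} TFLOPS")
print(f"Titan RTX peak FP32: {titan_rtx_tflops:.1f} TFLOPS")
print(f"Int4: {rtx_3090_int4} vs {titan_rtx_int4}")
```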
The NVIDIA GA102’s SM (Streaming Multiprocessor) configuration has also been completely revamped. Ampere’s new SMs double the L1 bandwidth and cache partition size and add 33% more L1 capacity, for up to 10,496KB on the GeForce RTX 3090.
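The 10,496KB total is easy to sanity-check. This sketch assumes the card has 82 active SMs (inferred from its 82 RT cores, at one RT core per SM) and simply works backward from the numbers quoted above; the per-SM values are derived, not quoted.

```python
SM_COUNT = 82        # assumed: one RT core per SM, and the card has 82 RT cores
TOTAL_L1_KB = 10496  # total L1 quoted for the GeForce RTX 3090

# Per-SM L1 capacity implied by the total.
l1_per_sm_kb = TOTAL_L1_KB / SM_COUNT

# "33% more" means the new capacity is 4/3 of the old one, so the
# previous-gen per-SM capacity would be 3/4 of the new figure.
previous_l1_per_sm_kb = l1_per_sm_kb * 3 / 4

print(f"L1 per SM: {l1_per_sm_kb:.0f}KB")
print(f"Implied previous-gen L1 per SM: {previous_l1_per_sm_kb:.0f}KB")
```

That works out to 128KB of L1 per SM, up from an implied 96KB.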
NVIDIA found that Turing often had good Bounding Box intersection rates, but Triangle Intersection rates were a limiting factor with some workloads, so Ampere got some attention in that regard as well. Ampere can now process Bounding Box and Triangle intersections in parallel to improve efficiency and performance, and thanks to the additional GPU resources available, Triangle Intersection rates are approximately twice as fast now too. A new Triangle Position Interpolation unit has also been added, which will enable more accurate motion blur effects in future RTX-enabled applications.
Bleeding-Edge Memory And Cooling Tech
Like the GeForce RTX 3080, the GeForce RTX 3090 is outfitted with Micron’s latest GDDR6X memory technology (the upcoming GeForce RTX 3070 will use standard GDDR6), which offers much higher bandwidth. GDDR6X uses four-level PAM4 signaling, which encodes two bits per symbol rather than one and so transmits twice as much data per clock. The first wave of flagship Ampere-based GeForces will employ GDDR6X memory with data rates up to 19.5Gbps. On the GeForce RTX 3090 specifically, which features 24GB of on-board memory linked to the GPU via a 384-bit memory interface, that equates to 936GB/s of peak bandwidth, versus 672GB/s on the Titan RTX. It’s also much more bandwidth than the GeForce RTX 3080’s 760GB/s.
The GA102 GPU has a newer 3rd Gen NVLink interface, which includes four x4 links, each providing up to 14.0625GB/sec of bandwidth in each direction, for a total of 56.25GB/sec in each direction or 112.5GB/sec of total aggregate bandwidth between two GPUs. However, the GeForce RTX 3090 is currently the only RTX 30 series card with those links. Two GeForce RTX 3090s can be linked for operation in traditional SLI modes, but 3-Way and 4-Way SLI configurations are not supported. Further, NVIDIA has disclosed that future SLI development will shift to game developers, in lieu of driver-based profiles.
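The bandwidth figures above fall out of simple arithmetic: peak memory bandwidth is the per-pin data rate times the bus width, divided by eight bits per byte, and the NVLink total sums the four links and then doubles to reach the aggregate. In the sketch below, the 19.5Gbps / 384-bit inputs come from the text, while the Titan RTX (14Gbps, 384-bit) and RTX 3080 (19Gbps, 320-bit) inputs are assumptions taken from their spec sheets.

```python
def memory_bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s: per-pin rate x bus width / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


rtx_3090_bw = memory_bandwidth_gbs(19.5, 384)   # inputs quoted in the text
titan_rtx_bw = memory_bandwidth_gbs(14.0, 384)  # assumed spec-sheet inputs
rtx_3080_bw = memory_bandwidth_gbs(19.0, 320)   # assumed spec-sheet inputs

# NVLink: four links at 14.0625GB/s each, doubled for the aggregate figure.
NVLINK_LINKS = 4
NVLINK_PER_LINK_GBS = 14.0625
nvlink_four_links = NVLINK_LINKS * NVLINK_PER_LINK_GBS
nvlink_aggregate = nvlink_four_links * 2

print(f"RTX 3090:  {rtx_3090_bw:.0f}GB/s")
print(f"Titan RTX: {titan_rtx_bw:.0f}GB/s")
print(f"RTX 3080:  {rtx_3080_bw:.0f}GB/s")
print(f"NVLink:    {nvlink_four_links}GB/s across four links, "
      f"{nvlink_aggregate}GB/s aggregate")
```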
The enhancements introduced with Ampere aren’t all about performance, though. NVIDIA also tweaked a few things to improve overall efficiency. For example, with previous-gen architectures, NVIDIA had one power rail for both the GPU cores and memory controller. A single-rail design meant that if one resource wanted to operate at high voltage, the other had to as well. With Ampere, however, NVIDIA split the core and memory power rails into separate feeds, so they can operate independently. Dual power rails should allow for finer-grained control and energy savings, which ultimately means improved power and thermal characteristics.
Speaking of thermals, we have to talk about the GeForce RTX 3090’s cooling solution. The cooler on the GeForce RTX 3090 looks like the 3080’s and has a similar configuration, but it is bigger and more capable. In fact, the RTX 3090 is a triple-slot card, which is a first for an NVIDIA-built design.
The GeForce RTX 3090’s cooler is outfitted with dual axial fans and a split heatsink design that is quieter than previous-gen solutions, while capable of dissipating up to 90 more watts of power. One end of the heatsink is attached to a vapor chamber that’s mounted directly to the GPU and memory. The fan above that section directs air through the heatsink and immediately funnels it out of the chassis through large vents in the case bracket. The heatsink on the back half of the card, which is linked to the front vapor chamber via multiple heat-pipes, allows air from the second fan to pass all the way through, where it rises to the top of the chassis and is eventually exhausted from the system, assuming it’s got decent ventilation.
The passthrough cooler design on the GeForce RTX 3090 works in conjunction with a denser, shorter PCB that sports a miniaturized 12-pin power connector, like the GeForce RTX 3080 Founders Edition. NVIDIA includes an adapter with the cards that converts a pair of traditional 8-pin PCIe connectors to the new mini-12-pin design should you need one, and we're told PSU manufacturers will be offering modular cables with the new connector as well. However, not all of NVIDIA’s board partners have adopted the mini-12-pin connector, and some will stick with full-sized 8-pin connectors on their cards.
Like the RTX 3080, the GeForce RTX 3090 has triple full-sized DisplayPorts (1.4a) and a single HDMI output. The USB-C connector on high-end Turing cards, which was meant to be used with VR headsets, wasn't being used often so NVIDIA nixed it with the RTX 30 series. We should point out that the HDMI port conforms to the 2.1 standard, which enables 4K120P with G-Sync on some of the latest OLED TVs and displays, as well as 8K resolution with a single cable.
The GA102 GPU is equipped with the same 7th Gen NVENC encoding engine as Turing, but has a newer 5th Gen NVDEC engine. The new 5th Gen decoder supports hardware-accelerated decoding of the MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and brand-new AV1 codecs. And there’s a lot more to mention, like RTX IO, NVIDIA Reflex latency reduction technology, the Omniverse Machinima AI-assisted mixer app, and the NVIDIA Broadcast audio and video enhancement plug-in for streamers and creators. We covered all of that stuff in our RTX 3080 reviews and Ampere-architecture piece. Pop over to this URL if you want those deets.
Now let’s get to some numbers...