NVIDIA GeForce RTX 3090 Review: BFGPU Benchmarks Unleashed

Name: GeForce RTX 3090
Brand: GeForce

by Marco Chiappetta — Thursday, September 24, 2020, 09:01 AM EDT

What happens when you take the burly GPU powering the GeForce RTX 3080, enable a couple thousand more cores, widen the memory interface, more than double the memory capacity, boost the texturing capabilities, and outfit the beast with a quiet, oversized, high-performance cooler? Well, we’ll tell you. You end up with the GeForce RTX 3090, or the “BFGPU” as NVIDIA’s CEO Jensen Huang called it during its official unveiling. If you’re unfamiliar with the Doom BFG reference there, we’ll let you look it up. Rest assured, the BF doesn’t stand for Best Friend.

Whatever you want to call the GeForce RTX 3090, one thing is for certain. As of this moment, the GeForce RTX 3090 is the single most powerful graphics card money can (almost) buy. It sits at the pinnacle of NVIDIA’s product stack currently, and according to the company, it enables things like smooth 8K gaming and seamless processing of massive content creation workloads, thanks in part to its 24GB of on-board GDDR6X memory.

A graphics card like the GeForce RTX 3090 isn’t for everyone, however. Though its asking price is about $1,000 lower than its previous-gen, Turing-based Titan RTX counterpart, it is still out of reach for most users. And the GeForce RTX 3090’s performance characteristics will likely make its value proposition interesting to only a select group of enthusiasts and creators. We’ll do our best to better explain all of that on the pages ahead. For now, let’s take a look at the specs and inspect this big, beautiful beast...

NVIDIA GeForce RTX 3090

Specifications & Features


Though NVIDIA has branded this card the GeForce RTX 3090, it is essentially the replacement for the previous-gen Titan RTX. As such, it isn’t purely a gaming-focused GPU. According to NVIDIA, demand for the various Titans was higher than anticipated, so with this generation, in addition to selling cards directly, NVIDIA is working with board partners to widen availability. Those partners will be offering GeForce RTX 3090 series cards as well.

Before we dive any deeper into the speeds and feeds though, we need to direct your attention to a few previous articles. We have already covered much of the underlying technology at the heart of the GeForce RTX 3090, so we won’t be doing so again here. If you want some of the backstory, however, we recommend checking out our coverage of NVIDIA’s initial GeForce RTX 30 series announcement, the deeper dive on its new features and the Ampere architecture, and last week’s GeForce RTX 3080 reviews. Once you’ve got that all digested, you’ll understand much of what the GeForce RTX 3090 is all about.

NVIDIA GeForce RTX 3090 Speeds And Feeds

As you can see in the detailed spec breakdown and comparison above, the new GA102-powered GeForce RTX 3090 is amped-up and more capable than the previous-gen Titan RTX in almost every way, with two exceptions: the GeForce RTX 3090 has a lower default boost clock and fewer Tensor cores. The GA102’s newer architecture and additional resources more than compensate for the lower default boost frequency though, and Ampere’s 3rd-generation Tensor cores more than double the throughput of the previous generation, in addition to supporting additional types of math, like BFloat16 (BF16) and TensorFloat-32 (TF32). With regard to pixel and texture fillrate, memory bandwidth, and compute performance, the GeForce RTX 3090 is significantly more powerful than the Turing-based Titan RTX, or anything else for that matter.

The GA102 GPU has a die size of 628.4mm² and comprises roughly 28 billion transistors. The chips are also manufactured on a newer, custom Samsung 8nm process (8N), whereas their previous-gen, Turing-based counterparts used a 12nm FinFET process at TSMC. Notice that despite packing nearly 10 billion more transistors, the GA102’s die is about 126mm² smaller, thanks to that more advanced process.
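To put that die-size comparison in perspective, here is a rough back-of-the-envelope sketch of transistor density. It uses only the rounded figures quoted above ("roughly 28 billion", "nearly 10 billion more", "about 126mm² smaller"), so the results are approximations and not official numbers. The function and variable names are ours.

```python
# Rounded figures taken from the paragraph above; exact counts will differ slightly.
GA102_TRANSISTORS_B = 28.0                          # "roughly 28 billion"
GA102_DIE_MM2 = 628.4
TU102_TRANSISTORS_B = GA102_TRANSISTORS_B - 10.0    # "nearly 10 billion" fewer (approximation)
TU102_DIE_MM2 = GA102_DIE_MM2 + 126.0               # "about 126mm²" larger (approximation)


def density_mtr_per_mm2(transistors_billions, die_mm2):
    """Return transistor density in millions of transistors per square millimeter."""
    return transistors_billions * 1000.0 / die_mm2


ga102_density = density_mtr_per_mm2(GA102_TRANSISTORS_B, GA102_DIE_MM2)
tu102_density = density_mtr_per_mm2(TU102_TRANSISTORS_B, TU102_DIE_MM2)

print(f"GA102: ~{ga102_density:.1f} MTr/mm²")
print(f"TU102: ~{tu102_density:.1f} MTr/mm²")
print(f"Density ratio: ~{ga102_density / tu102_density:.2f}x")
```

With these rounded inputs, the newer process packs somewhat less than twice as many transistors into each square millimeter.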

As we’ve mentioned in our previous GeForce RTX 30 series and Ampere coverage, all of those additional transistors were used to enable new features, like PCIe Gen 4 support, and enhance Ampere’s performance for virtually all GPU-bound workloads. Pre-Turing, NVIDIA’s GPU architectures had only one data path, for example. A second one was added with Turing, though -- one for floating point and one for integer. And with Ampere, that second integer path has been beefed up with an additional FP32 unit, so floating point heavy workloads have much more horsepower at their disposal.

Ampere’s 2nd-generation RT (ray tracing) cores have also been optimized for better performance. The 82 RT cores in the GeForce RTX 3090 (up from 72 in the Titan RTX) offer up to 35.6 TFLOPS of compute performance across multiple precision levels (vs. 16.3 – 32.6 TFLOPS on Turing) and the 3rd-gen Tensor cores offer up to 284 TOPS of Int8 performance, versus 261 on the Titan RTX (double those numbers for Int4). We should also point out that the 2nd-gen RT cores offer 2x the triangle intersection rate of Turing and those 3rd-gen Tensor cores double up math performance for sparse matrices, i.e. matrices in which most of the elements are zero.
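For reference, the 35.6 and 16.3 TFLOPS figures line up with the usual peak-FP32 estimate of FP32 lanes × 2 operations per fused multiply-add × boost clock. The sketch below is our own arithmetic, not NVIDIA's. The per-SM lane counts (128 on Ampere, 64 on Turing) and the reference boost clocks (1,695MHz and 1,770MHz) are assumptions drawn from NVIDIA's published specifications and do not appear in the text above. The SM counts assume one RT core per SM.

```python
def peak_fp32_tflops(sm_count, fp32_lanes_per_sm, boost_ghz):
    """Peak FP32 throughput in TFLOPS, counting a fused multiply-add as 2 operations."""
    flops_per_clock = sm_count * fp32_lanes_per_sm * 2
    return flops_per_clock * boost_ghz / 1000.0  # GFLOPS -> TFLOPS


# Assumed values (not stated in the article text): lanes per SM and boost clocks.
rtx_3090_tflops = peak_fp32_tflops(sm_count=82, fp32_lanes_per_sm=128, boost_ghz=1.695)
titan_rtx_tflops = peak_fp32_tflops(sm_count=72, fp32_lanes_per_sm=64, boost_ghz=1.770)

print(f"GeForce RTX 3090: {rtx_3090_tflops:.1f} TFLOPS")
print(f"Titan RTX:        {titan_rtx_tflops:.1f} TFLOPS")
```

Real-world clocks vary with temperature and power limits, so these are theoretical peaks only.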

The NVIDIA GA102’s SM (Streaming Multiprocessor) configuration has also been completely revamped. Ampere’s new SMs double the L1 bandwidth and cache partition size and add 33% more L1 capacity, for up to 10,496KB on the GeForce RTX 3090.
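The 10,496KB total can be reproduced with simple arithmetic. This sketch assumes 96KB of L1/shared memory per Turing SM and 128KB per Ampere SM, which are figures from NVIDIA's architecture documentation and not from the text above. The 82-SM count assumes one RT core per SM.

```python
TURING_L1_KB_PER_SM = 96   # assumption: per-SM L1/shared memory on Turing
RTX_3090_SM_COUNT = 82     # assumption: one SM per RT core

# "33% more L1 capacity" read as a one-third increase over Turing
ampere_l1_kb_per_sm = TURING_L1_KB_PER_SM * 4 // 3
total_l1_kb = ampere_l1_kb_per_sm * RTX_3090_SM_COUNT

print(f"Per-SM L1 on Ampere: {ampere_l1_kb_per_sm}KB")
print(f"Total L1 on the GeForce RTX 3090: {total_l1_kb:,}KB")
```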

NVIDIA found that Turing often had good Bounding Box intersection rates, but Triangle Intersection rates were a limiting factor with some workloads, so Ampere got some attention in that regard as well. Ampere can now process Bounding Box and Triangle intersection tests in parallel to improve efficiency and performance, and thanks to the additional GPU resources available, Triangle Intersection rates are approximately twice as fast now too. A new Triangle Position Interpolation unit has also been added, which will enable more accurate motion blur effects in future RTX-enabled applications.
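If the two kinds of intersection tests are unfamiliar, the sketch below shows what each one computes in plain software. These are standard textbook algorithms (the slab test for axis-aligned bounding boxes and Möller–Trumbore for triangles), not a description of how NVIDIA's RT cores implement them in hardware. All function names are ours.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: True if the ray passes through the axis-aligned box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes; it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore: return the hit distance along the ray, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det     # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None


ray_origin, ray_dir = (0.25, 0.25, -1.0), (0.0, 0.0, 1.0)
triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(ray_hits_box(ray_origin, ray_dir, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)))
print(ray_hits_triangle(ray_origin, ray_dir, *triangle))
```

A ray tracer typically runs the cheap box test many times while walking its acceleration structure and the more expensive triangle test only on the few candidates that remain, which is why the two rates can bottleneck different workloads.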

Bleeding-Edge Memory And Cooling Tech

Like the GeForce RTX 3080, the GeForce RTX 3090 is outfitted with Micron’s latest GDDR6X memory technology (the upcoming GeForce RTX 3070 will use standard GDDR6), which offers much higher bandwidth. GDDR6X leverages four-level PAM4 signaling, which transmits two bits per symbol and so moves twice as much data per clock as conventional two-level signaling. The first wave of flagship Ampere-based GeForces will employ GDDR6X memory with data rates up to 19.5Gbps. On the GeForce RTX 3090 specifically, which features 24GB of on-board memory, linked to the GPU via a 384-bit memory interface, that equates to 936GB/s of peak bandwidth, versus 672GB/s on the Titan RTX. It’s also much more bandwidth than the GeForce RTX 3080’s 760GB/s.
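Those bandwidth numbers fall straight out of per-pin data rate × bus width ÷ 8 bits per byte. Here is a quick sketch of the arithmetic. The GeForce RTX 3090's figures come from the paragraph above, while the Titan RTX (14Gbps on a 384-bit bus) and GeForce RTX 3080 (19Gbps on a 320-bit bus) inputs are assumptions taken from NVIDIA's published specs.

```python
def peak_bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s from per-pin data rate (Gbps) and bus width (bits)."""
    return data_rate_gbps * bus_width_bits / 8


cards = {
    "GeForce RTX 3090": (19.5, 384),  # from the text above
    "Titan RTX": (14.0, 384),         # assumption: published spec, not in the text
    "GeForce RTX 3080": (19.0, 320),  # assumption: published spec, not in the text
}

bandwidth = {name: peak_bandwidth_gbs(rate, width) for name, (rate, width) in cards.items()}
for name, gbs in bandwidth.items():
    print(f"{name}: {gbs:.0f}GB/s")
```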

The GA102 GPU has a newer 3rd Gen NVLink interface, which includes four x4 links, each providing up to 14.0625GB/sec of bandwidth in each direction, for a total of 56.25GB/sec in each direction or 112.5GB/sec of total aggregate bandwidth between two GPUs. However, the GeForce RTX 3090 is currently the only RTX 30 series card with those links. Two GeForce RTX 3090s can be linked for operation in traditional SLI modes, but 3-Way and 4-Way SLI configurations are not supported. Further, NVIDIA has disclosed that future SLI development will shift to game developers, in lieu of driver-based profiles.
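The NVLink totals are easy to sanity-check. This small sketch treats the 14.0625GB/sec per-link figure as a one-direction rate, which is the reading under which the totals quoted above add up. The variable names are ours.

```python
LINKS = 4
PER_LINK_GBS_ONE_WAY = 14.0625  # assumption: per-link figure is per direction

one_way_total = LINKS * PER_LINK_GBS_ONE_WAY   # all four links, one direction
aggregate_total = one_way_total * 2            # both directions combined

print(f"Per direction: {one_way_total}GB/sec")
print(f"Aggregate:     {aggregate_total}GB/sec")
```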

The enhancements introduced with Ampere aren’t all about performance, though. NVIDIA also tweaked a few things to improve overall efficiency. For example, with previous-gen architectures, NVIDIA had one power rail for both the GPU cores and memory controller. A single-rail design meant that if one resource wanted to operate at high voltage, the other had to as well. With Ampere, however, NVIDIA bifurcated the core and memory power rails into separate feeds, so they can operate independently. Dual power rails should allow for finer-grained control and energy savings, which ultimately means improved power and thermal characteristics.

Speaking of thermals, we have to talk about the GeForce RTX 3090’s cooling solution. The cooler on the GeForce RTX 3090 looks like, and has a similar configuration to, the 3080’s, but it is bigger and more capable. In fact, the RTX 3090 is a triple-slot card, which is a first for an NVIDIA-built design.

The GeForce RTX 3090’s cooler is outfitted with dual axial fans and a split heatsink design that is quieter than previous-gen solutions, while capable of dissipating up to 90 more watts of power. One end of the heatsink is attached to a vapor chamber that’s mounted directly to the GPU and memory. The fan above that section directs air through the heatsink and immediately funnels it out of the chassis through large vents in the case bracket. The heatsink on the back half of the card, which is linked to the front vapor chamber via multiple heat-pipes, allows air from the second fan to pass all the way through, where it rises to the top of the chassis and is eventually exhausted from the system, assuming it’s got decent ventilation.

Who's The Titan Now?

The passthrough cooler design on the GeForce RTX 3090 works in conjunction with a denser, shorter PCB that sports a miniaturized 12-pin power connector, like the GeForce RTX 3080 Founders Edition. NVIDIA includes an adapter with the cards that converts a pair of traditional 8-pin PCIe connectors to the new mini-12-pin design should you need one, and we're told PSU manufacturers will be offering modular cables with the new connector as well. However, not all of NVIDIA’s board partners have adopted the mini-12-pin connector; some will stick with full-sized 8-pin connectors on their cards.

Like the RTX 3080, the GeForce RTX 3090 has triple full-sized DisplayPorts (1.4a) and a single HDMI output. The USB-C connector on high-end Turing cards, which was meant to be used with VR headsets, wasn't being used often so NVIDIA nixed it with the RTX 30 series. We should point out that the HDMI port conforms to the 2.1 standard, which enables 4K120P with G-Sync on some of the latest OLED TVs and displays, as well as 8K resolution with a single cable.
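To see why HDMI 2.1 matters for those modes, here is a rough estimate of the raw pixel data each one needs. This sketch counts active pixels only at 8 bits per color channel and ignores blanking intervals and link-encoding overhead, so real requirements are higher. The 18Gbps (HDMI 2.0) and 48Gbps (HDMI 2.1) maximum link rates are from the HDMI specifications, not from the text above, and the helper name is ours.

```python
HDMI_2_0_MAX_GBPS = 18.0   # assumption: maximum link rate per the HDMI 2.0 spec
HDMI_2_1_MAX_GBPS = 48.0   # assumption: maximum link rate per the HDMI 2.1 spec


def raw_video_gbps(width, height, refresh_hz, bits_per_pixel=24):
    """Uncompressed active-pixel data rate in Gbps (no blanking, no encoding overhead)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


uhd_120 = raw_video_gbps(3840, 2160, 120)
uhd8k_60 = raw_video_gbps(7680, 4320, 60)

print(f"4K120: ~{uhd_120:.2f}Gbps raw (HDMI 2.0 tops out at {HDMI_2_0_MAX_GBPS:.0f}Gbps)")
print(f"8K60:  ~{uhd8k_60:.2f}Gbps raw (HDMI 2.1 tops out at {HDMI_2_1_MAX_GBPS:.0f}Gbps)")
```

Even before overhead, 4K120 at full color exceeds what an HDMI 2.0 link can carry, and 8K60 sits right at the edge of HDMI 2.1's link rate, so in practice 8K output over a single cable generally leans on compression or chroma subsampling.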

The GA102 GPU is equipped with the same 7th Gen NVENC encoding engine as Turing, but has a newer 5th Gen NVDEC engine. The new 5th Gen decoder supports hardware-accelerated decoding of the MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and brand-new AV1 codecs. And there’s a lot more to mention, like RTX IO, NVIDIA Reflex latency reduction technology, the Omniverse Machinima AI-assisted mixer app, and the NVIDIA Broadcast audio and video enhancement plug-in for streamers and creators. We covered all of that stuff in our RTX 3080 reviews and Ampere-architecture piece. Pop over to this URL if you want those deets.

Now let’s get to some numbers...

