NVIDIA GeForce RTX 3090 Review: BFGPU Benchmarks Unleashed
GeForce RTX 3090: NVIDIA's BFGPU Has Arrived And It Slays
What happens when you take the burly GPU powering the GeForce RTX 3080, enable a couple thousand more cores, widen the memory interface, more than double the memory capacity, boost the texturing capabilities, and outfit the beast with a quiet, oversized, high-performance cooler? Well, we’ll tell you. You end up with the GeForce RTX 3090, or the “BFGPU” as NVIDIA’s CEO Jensen Huang called it during its official unveiling. If you’re unfamiliar with the Doom BFG reference there, we’ll let you look it up. Rest assured, the BF doesn’t stand for Best Friend.
Whatever you want to call the GeForce RTX 3090, one thing is for certain. As of this moment, the GeForce RTX 3090 is the single most powerful graphics card money can (almost) buy. It sits at the pinnacle of NVIDIA’s product stack currently, and according to the company, it enables things like smooth 8K gaming and seamless processing of massive content creation workloads, thanks in part to its 24GB of on-board GDDR6X memory.
A graphics card like the GeForce RTX 3090 isn’t for everyone, however. Though its asking price is about $1,000 lower than its previous-gen, Turing-based Titan RTX counterpart, it is still out of reach for most users. And the GeForce RTX 3090’s performance characteristics will likely make its value proposition interesting to only a select group of enthusiasts and creators. We’ll do our best to explain all of that on the pages ahead. For now, let’s take a look at the specs and inspect this big, beautiful beast...
The GeForce RTX 3090 is essentially the replacement for the previous-gen Titan RTX. As such, it isn’t purely a gaming-focused GPU. According to NVIDIA, demand for the various Titans was higher than anticipated. So with this generation, in addition to selling cards directly, NVIDIA is working with board partners to widen availability, and those partners will be offering GeForce RTX 3090 series cards as well.
Before we dive any deeper into the speeds and feeds though, we need to direct your attention to a few previous articles. We have already covered much of the underlying technology at the heart of the GeForce RTX 3090, so we won’t be doing so again here. If you want some of the backstory, however, we recommend checking out our coverage of NVIDIA’s initial GeForce RTX 30 series announcement, the deeper dive on its new features and the Ampere architecture, and last week’s GeForce RTX 3080 reviews. Once you’ve got that all digested, you’ll understand much of what the GeForce RTX 3090 is all about.
NVIDIA GeForce RTX 3090 Speed And Feeds
As you can see in the detailed spec breakdown and comparison above, the new GA102-powered GeForce RTX 3090 is amped-up and more capable than the previous-gen Titan RTX in almost every way, except for two. The GeForce RTX 3090 has a lower default boost clock and fewer Tensor cores. The GA102’s newer architecture and additional resources more than compensate for the lower default boost frequency though, and Ampere’s 3rd-generation Tensor cores more than double the throughput of the previous generation, in addition to supporting additional types of math, like BFloat16 (BF16) and TensorFloat-32 (TF32). With regard to pixel and texture fillrate, memory bandwidth, and compute performance, the GeForce RTX 3090 is significantly more powerful than the Turing-based Titan RTX, or anything else for that matter.
The GA102 GPU has a die size of 628.4mm² and is comprised of roughly 28 billion transistors. The chips are also manufactured on a newer, custom Samsung 8nm process (8N) than their previous-gen, Turing-based counterparts, which used a 12nm FinFET process at TSMC. Notice that despite packing nearly 10 billion more transistors, the GA102’s die is about 126mm² smaller, thanks to that more advanced process.
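If you want to put a rough number on that process advantage, the figures quoted above can be turned into transistor densities. This is only a back-of-the-envelope sketch: the GA102 values are this article's rounded numbers, the Turing die size is derived from the "about 126mm² smaller" statement, and the 18.6 billion transistor count for the Turing chip is an assumption that is not stated in this review.

```python
# Rough transistor-density comparison using the rounded figures quoted above.
GA102_TRANSISTORS = 28e9               # "roughly 28 billion" (from the text)
GA102_DIE_MM2 = 628.4                  # from the text
TURING_TRANSISTORS = 18.6e9            # assumption: not stated in this review
TURING_DIE_MM2 = GA102_DIE_MM2 + 126   # derived: GA102 is "about 126mm2 smaller"


def density_mtr_per_mm2(transistors, die_mm2):
    """Return millions of transistors per square millimetre."""
    return transistors / die_mm2 / 1e6


ga102_density = density_mtr_per_mm2(GA102_TRANSISTORS, GA102_DIE_MM2)
turing_density = density_mtr_per_mm2(TURING_TRANSISTORS, TURING_DIE_MM2)
density_ratio = ga102_density / turing_density

print(f"GA102:  {ga102_density:.1f} MTr/mm^2")
print(f"Turing: {turing_density:.1f} MTr/mm^2")
print(f"Density ratio: {density_ratio:.2f}x")
```

With these inputs the newer chip works out to somewhere around 1.8 times the transistor density, which is how it can be both bigger in transistor count and smaller in area.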
As we’ve mentioned in our previous GeForce RTX 30 series and Ampere coverage, all of those additional transistors were used to enable new features, like PCIe Gen 4 support, and enhance Ampere’s performance for virtually all GPU-bound workloads. Pre-Turing, for example, NVIDIA’s GPU architectures had only one data path. Turing added a second, giving the SM one path for floating point and one for integer. And with Ampere, that second integer path has been beefed up with an additional FP32 unit, so floating point heavy workloads have much more horsepower at their disposal.
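To get a feel for what that datapath change means in practice, here is a deliberately simplified issue model. It is a sketch, not a description of NVIDIA's actual scheduler: the 64-lane figure per datapath, the function names, and the example instruction mixes are all assumptions chosen for illustration. The idea is that a Turing-style SM has one FP32-only path and one INT32-only path, while an Ampere-style SM has one FP32-only path plus a second path that can run either FP32 or INT32.

```python
# Simplified per-SM issue model. Lane counts and workloads are illustrative
# assumptions, not measured or vendor-confirmed behavior.
LANES_PER_PATH = 64  # assumed lanes per datapath, per SM, per clock


def turing_cycles(fp_ops, int_ops):
    """Turing-style SM: one FP32-only path and one INT32-only path."""
    return max(fp_ops, int_ops) / LANES_PER_PATH


def ampere_cycles(fp_ops, int_ops):
    """Ampere-style SM: one FP32-only path plus one FP32-or-INT32 path."""
    # Integer work can only use the shared path; FP work can use both.
    int_bound = int_ops / LANES_PER_PATH
    total_bound = (fp_ops + int_ops) / (2 * LANES_PER_PATH)
    return max(int_bound, total_bound)


def speedup(fp_ops, int_ops):
    """Ratio of Turing-style cycles to Ampere-style cycles for one workload."""
    return turing_cycles(fp_ops, int_ops) / ampere_cycles(fp_ops, int_ops)


for fp_ops, int_ops in [(1000, 0), (1000, 360), (1000, 1000)]:
    print(f"{fp_ops} FP / {int_ops} INT -> {speedup(fp_ops, int_ops):.2f}x")
```

In this toy model a pure floating point workload sees the full 2x, an evenly mixed workload sees no gain at all, and anything in between lands somewhere in the middle. That is the main reason the doubled FP32 count on the spec sheet doesn't translate into doubled frame rates.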
The NVIDIA GA102’s SM (Streaming Multiprocessor) configuration has also been completely revamped. Ampere’s new SMs double the L1 bandwidth and cache partition size and add 33% more L1 capacity, for up to 10,496KB on the GeForce RTX 3090.
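The L1 figures above are easy to sanity-check. The short sketch below assumes 82 SMs on the GeForce RTX 3090, 128KB of L1 per Ampere SM, and 96KB per Turing SM; none of those three values is stated in this article, so treat them as assumptions that happen to reproduce the quoted totals.

```python
# Sanity check of the quoted L1 totals. All three constants are assumptions
# that are not stated in the article itself.
SM_COUNT = 82                # assumed SM count for the GeForce RTX 3090
AMPERE_L1_PER_SM_KB = 128    # assumed L1 capacity per Ampere SM
TURING_L1_PER_SM_KB = 96     # assumed L1 capacity per Turing SM

total_l1_kb = SM_COUNT * AMPERE_L1_PER_SM_KB
capacity_gain = AMPERE_L1_PER_SM_KB / TURING_L1_PER_SM_KB - 1

print(f"Total L1: {total_l1_kb:,}KB")           # Total L1: 10,496KB
print(f"Per-SM L1 gain: {capacity_gain:.0%}")   # Per-SM L1 gain: 33%
```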
NVIDIA found that Turing often had good Bounding Box intersection rates, but Triangle Intersection rates were a limiting factor with some workloads, so Ampere’s RT cores got some attention in that regard as well. Ampere can now process Bounding Box and Triangle intersections in parallel to improve efficiency and performance, and thanks to the additional GPU resources available, Triangle Intersection rates are approximately twice as fast now too. A new Triangle Position Interpolation unit has also been added, which will enable more accurate motion blur effects in future RTX-enabled applications.
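If the two intersection types are unfamiliar, here is what they compute, written out in plain software. To be clear, this is a textbook sketch (a slab test for boxes and the Möller-Trumbore test for triangles), not a description of how NVIDIA's hardware implements them, and the function names and example scene are made up for illustration.

```python
# Software sketch of the two ray intersection tests. Illustrative only; this
# is not how the dedicated ray tracing hardware is implemented.

def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: does the ray hit an axis-aligned bounding box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if o < lo or o > hi:  # parallel to this slab and outside it
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: return the hit distance t, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:            # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det       # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None


# Hypothetical scene: one triangle in the z=0 plane and a box around it.
tri = ((-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (0.0, 1.0, 0.0))
box = ((-1.0, -1.0, -0.5), (1.0, 1.0, 0.5))
ray_origin, ray_dir = (0.0, 0.0, -1.0), (0.0, 0.0, 1.0)

if ray_hits_box(ray_origin, ray_dir, *box):      # cheap test first
    print(ray_hits_triangle(ray_origin, ray_dir, *tri))
```

A ray tracer walks a tree of bounding boxes with the cheap box test and only runs the more expensive triangle test on the geometry inside boxes the ray actually hits, so a slow triangle stage can stall everything behind it.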
Bleeding-Edge Memory And Cooling Tech
Like the GeForce RTX 3080, the GeForce RTX 3090 is outfitted with Micron’s latest GDDR6X memory technology (the upcoming GeForce RTX 3070 will use standard GDDR6), which offers much higher bandwidth. GDDR6X uses four-level PAM4 signaling, which encodes two bits per symbol instead of one and so transmits twice as much data per clock. The first wave of flagship Ampere-based GeForces will employ GDDR6X memory with data rates up to 19.5Gbps. On the GeForce RTX 3090 specifically, which features 24GB of on-board memory linked to the GPU via a 384-bit memory interface, that equates to 936GB/s of peak bandwidth, versus 672GB/s on the Titan RTX. It’s also much more bandwidth than the GeForce RTX 3080’s 760GB/s.
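Those bandwidth numbers fall straight out of the per-pin data rate and the bus width. Here's a quick sketch of the arithmetic. The GeForce RTX 3090 inputs come from the paragraph above; the Titan RTX and GeForce RTX 3080 data rates and bus widths are assumptions that aren't stated in this article, picked because they reproduce the quoted totals.

```python
import math


def peak_bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s from per-pin data rate and bus width."""
    return data_rate_gbps * bus_width_bits / 8  # 8 bits per byte


def bits_per_symbol(signal_levels):
    """Bits carried by one symbol with the given number of signal levels."""
    return math.log2(signal_levels)


rtx_3090 = peak_bandwidth_gbs(19.5, 384)   # inputs from the text
titan_rtx = peak_bandwidth_gbs(14.0, 384)  # assumed: 14Gbps on a 384-bit bus
rtx_3080 = peak_bandwidth_gbs(19.0, 320)   # assumed: 19Gbps on a 320-bit bus

print(f"GeForce RTX 3090: {rtx_3090:.0f}GB/s")   # GeForce RTX 3090: 936GB/s
print(f"Titan RTX:        {titan_rtx:.0f}GB/s")  # Titan RTX:        672GB/s
print(f"GeForce RTX 3080: {rtx_3080:.0f}GB/s")   # GeForce RTX 3080: 760GB/s

# Four signal levels (PAM4) versus two: 2.0 bits per symbol versus 1.0.
print(bits_per_symbol(4), bits_per_symbol(2))
```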
The enhancements introduced with Ampere aren’t all about performance, though. NVIDIA also tweaked a few things to improve overall efficiency. For example, with previous-gen architectures, NVIDIA had one power rail for both the GPU cores and memory controller. A single-rail design meant that if one resource wanted to operate at high voltage, the other had to as well. With Ampere, however, NVIDIA split the core and memory power rails into separate feeds, so they can operate independently. Dual power rails should allow for finer-grained control and energy savings, which ultimately means improved power and thermal characteristics.
The GeForce RTX 3090’s cooler is outfitted with dual axial fans and a split heatsink design that is quieter than previous-gen solutions, while capable of dissipating up to 90 more watts of power. One end of the heatsink is attached to a vapor chamber that’s mounted directly to the GPU and memory. The fan above that section directs air through the heatsink and immediately funnels it out of the chassis through large vents in the case bracket. The heatsink on the back half of the card, which is linked to the front vapor chamber via multiple heat-pipes, allows air from the second fan to pass all the way through, where it rises to the top of the chassis and is eventually exhausted from the system, assuming it’s got decent ventilation.
Now let’s get to some numbers...