NVIDIA RTX Blackwell In-Depth: Exploring The Heart Of GeForce RTX 50

by Marco Chiappetta — Wednesday, January 15, 2025, 11:30 AM EDT

Page 1:
The NVIDIA RTX Blackwell Graphics Architecture With DLSS 4 And Neural Rendering
- Page 2: RTX Blackwell Creator Features, Expected Performance And Key Takeaways

After months of rumors, speculation and leaks, NVIDIA CEO Jensen Huang officially unveiled the GeForce RTX 50 series, based on the RTX Blackwell graphics architecture, in a mega-keynote address at the Michelob Ultra Arena in Las Vegas at CES 2025. For regular readers, the initial announcements weren’t a surprise, but some of the inner workings and new features in RTX Blackwell that Jensen revealed left many in attendance (and viewing online) slack-jawed, for multiple reasons.

The RTX Blackwell graphics architecture at the foundation of the GeForce RTX 50 series takes everything from the company's previous gen Ada architecture – which powers the RTX 40 series – and cranks it up to 11. RTX Blackwell also introduces the concept of neural rendering, which will likely mark a paradigm shift in the way PC games are rendered in the future.

Needless to say, there’s a lot to get to. But before we dive in, we need to point you to some previous coverage. You can watch NVIDIA's CES Keynote here for details straight from the horse's mouth, and check out the cards and the GeForce RTX 5090's tiny PCB too. We're going to save the deep dive on the actual hardware for our upcoming reviews, and focus on the actual RTX Blackwell architecture and what it enables here...

We’re going to dive deeper on the pages ahead, but this quick summary comparing the last few GeForce generations lays much of the groundwork. The GeForce RTX 50 series features updated shader cores with support for neural shaders, in addition to 4th gen RT (ray tracing) cores and 5th gen Tensor cores, which add support for FP4. DLSS 4 debuts with the RTX 50 series, a new AI Management Processor, or AMP, is integrated into RTX Blackwell GPUs, and the media engine has been beefed up with additional, more capable encoders and decoders. The RTX 50 series also features a native PCIe gen 5 interface and support for DisplayPort 2.1b (up to UHBR20), and the GPUs are fed by the latest high speed GDDR7 memory.

Just about every aspect of the RTX 50 series is upgraded over previous generations, which results in significant performance uplifts in virtually every type of workload, from generative AI, to media transcoding, rasterization, and everything in between.

NVIDIA RTX Blackwell Architecture Overview

NVIDIA made changes and additions to all of the various cores and IP employed in the GeForce RTX 50 series. The shader cores, RT cores, and Tensor cores all gain new features and capabilities.

Starting with the shader cores, NVIDIA enabled full FP32 and INT32 support in all of the shader cores in RTX Blackwell. In the previous generation Ada graphics architecture, half of the shaders in each SM supported FP32 / INT32, while the other half only supported FP32. This effectively doubles the peak INT32 throughput. Shader Execution Reordering, or SER, throughput in the SM has also doubled in Blackwell.
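
To put a number on that doubling, here is a minimal sketch that counts peak per-clock operations in a single SM. The figure of 128 shader lanes per SM is our assumption (it is not stated above), and the model ignores scheduling, memory, and clock speed entirely.

```python
# Toy model of peak per-clock shader throughput in one SM.
# ASSUMPTION: 128 shader lanes per SM; the article does not give this number.
LANES_PER_SM = 128

def peak_ops_per_clock(fp32_only_lanes: int, fp32_int32_lanes: int) -> dict:
    """Return the peak FP32 and INT32 operations one SM can issue per clock."""
    return {
        "fp32": fp32_only_lanes + fp32_int32_lanes,  # every lane can do FP32
        "int32": fp32_int32_lanes,                   # only dual-purpose lanes do INT32
    }

# Ada: half of the lanes are FP32-only, half handle FP32 or INT32.
ada = peak_ops_per_clock(LANES_PER_SM // 2, LANES_PER_SM // 2)
# Blackwell: every lane handles FP32 or INT32.
blackwell = peak_ops_per_clock(0, LANES_PER_SM)

int32_uplift = blackwell["int32"] / ada["int32"]
print("Ada:", ada)
print("Blackwell:", blackwell)
print("Peak INT32 uplift:", int32_uplift)
```

Note that each lane still executes one operation per clock, so the FP32 and INT32 peaks cannot both be reached at the same time. The gain shows up in integer-heavy shader code.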

Neural shaders also debut in the RTX 50 series. Previously, it wasn’t possible to access the tensor cores through a compute shader in a graphics API. For the RTX 50 series, however, NVIDIA has worked with Microsoft to create something called Cooperative Vectors within DirectX, which unlocks the tensor cores and gives developers the ability to run AI models on the tensor cores, in game. More on what that enables shortly.

RTX Blackwell’s RT cores have also been significantly redesigned. They feature similar Box Intersection and Opacity Micromap engines to the previous-gen Ada architecture, but the Triangle Intersection engine present in Blackwell has been upgraded to what NVIDIA calls a Triangle Cluster Intersection engine, and Triangle Cluster Compression and support for Linear Swept Spheres have been added. The new capabilities in the RT cores enable what NVIDIA calls Mega Geometry.

The Tensor cores in RTX Blackwell gain support for FP4, over and above the FP8 and FP16 support in Ada. Support for FP4 effectively doubles the throughput over FP8, while simultaneously reducing memory requirements for a particular model. FP4 also technically results in reduced precision versus FP8 or FP16, but by optimally quantizing models for the data type and architecture, the impact in many consumer use cases should be negligible.
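
To make the precision tradeoff concrete, the following is a rough sketch of quantizing a handful of weights to a 4-bit float. It assumes the common E2M1 layout (1 sign, 2 exponent, 1 mantissa bit) and a single shared scale per block; the exact format and scaling scheme NVIDIA uses are not described above, and the function names are ours.

```python
# Magnitudes representable by a 4-bit E2M1 float.
# ASSUMPTION: E2M1 layout; the article does not name the FP4 encoding.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(values):
    """Quantize a block of floats to FP4 using one shared scale factor."""
    peak = max(abs(v) for v in values)
    scale = peak / FP4_MAGNITUDES[-1] if peak else 1.0
    quantized = []
    for v in values:
        # Snap the scaled magnitude to the nearest representable FP4 value.
        magnitude = min(FP4_MAGNITUDES, key=lambda m: abs(abs(v) / scale - m))
        quantized.append(magnitude if v >= 0 else -magnitude)
    return scale, quantized

def dequantize_fp4(scale, quantized):
    """Recover approximate floats from FP4 codes and the shared scale."""
    return [scale * q for q in quantized]

def weight_bytes(num_weights, bits_per_weight):
    """Storage needed for the weights alone, ignoring scale factors."""
    return num_weights * bits_per_weight // 8

weights = [0.02, -0.75, 0.31, 1.2, -0.05, 0.6]
scale, codes = quantize_fp4(weights)
restored = dequantize_fp4(scale, codes)
worst_error = max(abs(a - b) for a, b in zip(weights, restored))

print("FP4 codes:", codes)
print("Worst-case error:", worst_error)
print("1M weights, FP8 bytes:", weight_bytes(1_000_000, 8))
print("1M weights, FP4 bytes:", weight_bytes(1_000_000, 4))
```

The storage halves relative to FP8, while the rounding error depends on how well the scale fits the data, which is why quantizing a model carefully for the data type matters.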

To help the GeForce RTX 50 series better manage the varied AI workloads that will be run on the GPUs alongside traditional game engine code, NVIDIA has also incorporated an AI Management Processor, or AMP, into the design. AMP is a programmable processor that sits at the front of the GPU that can interact closely with all of the different cores. AMP evaluates rendering criteria to optimally dispatch and schedule AI and graphics workloads across the various cores.

To keep the GPUs fed with data, NVIDIA is using the latest GDDR7 memory on the GeForce RTX 50 series. GDDR7 offers double the data rate of GDDR6 (not 6X), with significantly better efficiency. That translates to higher memory bandwidth, and reduced energy consumption. The top end GeForce RTX 5090 will offer up to 1.8TB/s of peak memory bandwidth, which is about 80% higher than the RTX 4090.
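
For readers who want to check those bandwidth figures, peak memory bandwidth is simply bus width times per-pin data rate. The bus widths and data rates below are assumptions taken from the publicly listed specifications for each card, not from the text above.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: pins times Gbit/s per pin, over 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

# ASSUMPTION: 512-bit bus with 28 Gbps GDDR7 for the RTX 5090.
rtx_5090 = peak_bandwidth_gb_s(512, 28.0)
# ASSUMPTION: 384-bit bus with 21 Gbps GDDR6X for the RTX 4090.
rtx_4090 = peak_bandwidth_gb_s(384, 21.0)

uplift_percent = (rtx_5090 / rtx_4090 - 1) * 100
print(f"RTX 5090: {rtx_5090:.0f} GB/s")
print(f"RTX 4090: {rtx_4090:.0f} GB/s")
print(f"Uplift: {uplift_percent:.0f}%")
```

Under those assumptions the result lands at roughly 1.8TB/s and a little under 80%, with the gain coming from both the faster memory and the wider bus.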

The display and media engines in the RTX 50 series have also been upgraded versus Ada. The RTX 50 series supports DisplayPort 2.1b at UHBR20, which runs at 20Gbps per lane (80Gbps across all four lanes), to enable high refresh rates at ultra-high resolutions, with HDR visuals. The RTX 50 series also features high-performance, hardware based Flip Metering, which shifts the frame pacing logic to the display engine, to allow the GPU to precisely manage display timing and accurately pace frames at high framerates and when multi frame generation is used.
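
As a back-of-the-envelope check on what UHBR20 allows, the sketch below compares the link's payload rate against uncompressed video data rates. It assumes four lanes at 20Gbps each with 128b/132b channel coding, and it ignores blanking intervals and other protocol overhead, so real limits are somewhat tighter. The function names are ours.

```python
LANES = 4
UHBR20_GBPS_PER_LANE = 20.0
ENCODING_EFFICIENCY = 128 / 132  # 128b/132b channel coding

def link_payload_gbps() -> float:
    """Approximate usable link rate in Gbit/s after channel coding."""
    return LANES * UHBR20_GBPS_PER_LANE * ENCODING_EFFICIENCY

def video_gbps(width: int, height: int, refresh_hz: int, bits_per_pixel: int) -> float:
    """Uncompressed active-pixel data rate in Gbit/s (blanking ignored)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9

def fits_uncompressed(width, height, refresh_hz, bits_per_pixel) -> bool:
    """True if the mode fits on the link without Display Stream Compression."""
    return video_gbps(width, height, refresh_hz, bits_per_pixel) <= link_payload_gbps()

print(f"Link payload: {link_payload_gbps():.1f} Gbps")
# 30 bits per pixel corresponds to 10-bit-per-channel HDR output.
print("4K 240Hz HDR fits:", fits_uncompressed(3840, 2160, 240, 30))
print("8K 120Hz HDR fits:", fits_uncompressed(7680, 4320, 120, 30))
```

In this simplified model, 4K at 240Hz with 10-bit color fits without compression, while 8K at 120Hz would still need Display Stream Compression.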

RTX 50 Series Max-Q Power Management

NVIDIA’s goal with the RTX 50 series was to extract the maximum amount of performance possible within a given platform’s power budget. Like previous gen chips, when parts or all of the GPU are idle, they quickly enter into deeper power states, or shut off altogether. But NVIDIA enhanced Blackwell’s capabilities in this regard too.

RTX 50 series GPUs now support clock, power and rail gating, and NVIDIA also improved the ability to dynamically adjust frequencies and voltages.

The efficiency optimization starts with clock gating, but if entire engines go idle, the logic and SRAMs can take advantage of progressively deeper power states, until they enter the deepest sleep states or are shut down altogether. A second power rail for the GPUs has also been added, and each rail can be gated as necessary. NVIDIA claims that rail gating specifically helps battery life on the mobile variants.

None of that is really new, but the RTX 50 series can enter deeper power states and exit them more quickly than Ada. In fact, NVIDIA claims Blackwell reduces the time to enter a deep sleep by a factor of 10. The new GPUs also offer accelerated frequency switching, and responsiveness has reportedly been improved by a factor of 1000. With the RTX 50 series, frequencies can be adjusted within a single frame based on the workload, which helps extract maximum efficiency within the SMs.

Introducing RTX Neural Rendering

As we mentioned earlier, RTX Blackwell also introduces the concept of neural rendering and what the company calls the “NVIDIA RTX Kit”. The NVIDIA RTX Kit is basically an umbrella term for RTX Neural Shaders, Hair and Skin, RTX Mega Geometry, DLSS 4, Reflex 2, and RTX Remix – which is just about to hit its one-year anniversary.

Note as you’re digesting many of these new technologies, that they are being enabled for developers now, but will arrive in actual games at various points in the future, except for DLSS 4, which will be enabled on day 0 in roughly 75 titles.

RTX Neural Materials allows game developers and artists to create higher quality, more lifelike materials that require fewer memory resources. RTX Neural Materials takes the shader code and the collection of layers (textures, etc.) for a model, builds them out, and then uses AI to compress them at up to a 7:1 compression ratio, which is significantly better than traditional block compression methods. Material processing is also up to 5x faster than previous generations. In one example shown by NVIDIA, standard materials required about 47MB of memory, but with RTX Neural Materials the memory requirement was brought down to 16MB, with better visual fidelity.

Next up is RTX Neural Radiance Cache. RTX Neural Radiance Cache is essentially an AI-based approach to accurately rendering indirect light in a scene. It works by training a model on the GPU at run time, and then caching the lighting in the scene geospatially. While you play, the small neural networks are training on the game data. What this does is allow just one lookup into the cache to handle many light bounces. It effectively traces the ray path at a shorter distance and lets the AI infer the rest. RTX Neural Radiance Cache is currently available in the RTX Global Illumination SDK, and will be available in Portal with RTX, and is coming to RTX Remix in a few months.
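
The general idea of "train at run time, then answer many bounces with one lookup" can be sketched in a few lines. This is a toy under loud assumptions: a per-cell running average stands in for the small neural networks, the scene and the `trace_long_path` function are hypothetical, and none of it reflects NVIDIA's actual implementation.

```python
import random

class RadianceCache:
    """Spatial cache of indirect light.

    ASSUMPTION: a running average per grid cell replaces the neural network.
    """

    def __init__(self, cell_size=0.5, learning_rate=0.1):
        self.cell_size = cell_size
        self.learning_rate = learning_rate
        self.cells = {}

    def _key(self, position):
        # Quantize a 3D position to the grid cell that contains it.
        return tuple(int(c // self.cell_size) for c in position)

    def train(self, position, radiance):
        """Nudge the cached value toward a freshly traced sample."""
        key = self._key(position)
        old = self.cells.get(key, radiance)
        self.cells[key] = old + self.learning_rate * (radiance - old)

    def query(self, position):
        """One lookup that stands in for the remaining light bounces."""
        return self.cells.get(self._key(position), 0.0)

def true_radiance(position):
    # Hypothetical scene: indirect light falls off along the x axis.
    return 1.0 / (1.0 + abs(position[0]))

def trace_long_path(position, rng):
    # Stand-in for an expensive multi-bounce trace: correct on average, but noisy.
    return true_radiance(position) + rng.uniform(-0.1, 0.1)

rng = random.Random(0)
cache = RadianceCache()

# "While you play": a small budget of long paths keeps training the cache.
for _ in range(2000):
    p = (rng.uniform(0.0, 4.0), rng.uniform(0.0, 4.0), 0.0)
    cache.train(p, trace_long_path(p, rng))

# At render time: trace a short path, then look up the rest.
hit_point = (1.2, 3.0, 0.0)
estimate = cache.query(hit_point)
print("Cached estimate:", estimate)
print("Reference value:", true_radiance(hit_point))
```

The cached estimate is smoother than any single noisy trace, which is the appeal: the expensive long paths are amortized across frames instead of being paid per pixel.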

NVIDIA also introduced RTX Neural Faces, which are high-quality, generative AI faces. RTX Neural Faces takes a standard 3D face and replaces it in real time with a more photo-real AI generated face. The demo shown was of significantly higher quality than the traditionally rendered face.

Also coming with RTX Blackwell is accelerated Ray Traced Strand Based Hair. Ray tracing hair is computationally heavy, due to all of the geometry / triangles required for each strand. Blackwell, however, can ray trace hair using linear swept spheres. Only two spheres are required per line segment of hair, which is more concise (up to 3x less data) in how it is stored in the BVH (bounding volume hierarchy).
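
To illustrate why swept spheres are a compact way to describe hair, the sketch below models a segment as two end spheres with linearly interpolated position and radius, and counts the floats needed to store a strand. The triangulated tube used for comparison is a hypothetical layout of our own choosing, so the ratio it prints should not be read as NVIDIA's figure.

```python
from dataclasses import dataclass

@dataclass
class Sphere:
    center: tuple
    radius: float

def swept_sphere_at(a: Sphere, b: Sphere, t: float) -> Sphere:
    """Sphere at parameter t in [0, 1] along the segment from a to b."""
    center = tuple(pa + t * (pb - pa) for pa, pb in zip(a.center, b.center))
    return Sphere(center, a.radius + t * (b.radius - a.radius))

def strand_floats_as_spheres(segments: int) -> int:
    # Adjacent segments share an end sphere: x, y, z and radius for each.
    return (segments + 1) * 4

def strand_floats_as_tube(segments: int, sides: int) -> int:
    # HYPOTHETICAL triangulated tube: one ring of `sides` xyz vertices per
    # segment endpoint. Index data is ignored.
    return (segments + 1) * sides * 3

root = Sphere((0.0, 0.0, 0.0), 0.04)
tip = Sphere((0.0, 1.0, 0.0), 0.01)
middle = swept_sphere_at(root, tip, 0.5)  # the strand tapers toward the tip

print("Middle of segment:", middle)
print("Floats, 16 segments as spheres:", strand_floats_as_spheres(16))
print("Floats, 16 segments as 6-sided tube:", strand_floats_as_tube(16, 6))
```

The thickness and taper of the strand come for free from the radii, rather than from extra triangles.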

Which brings us to RTX Mega Geometry. RTX Mega Geometry accelerates BVH updates for cluster-based systems like Unreal Engine 5’s Nanite. This gives developers the ability to use much higher resolution meshes within ray traced scenes and eliminates the need for proxy meshes, which don’t capture nearly the amount of geometry.

RTX Blackwell Introduces DLSS 4 And Multi Frame Generation

DLSS is also getting a complete overhaul with the introduction of the RTX 50 series. DLSS 4 introduces multi frame generation to further boost framerates and moves to a transformer AI model to improve image quality and visual fidelity.

Transformer models can be trained on much larger data sets and infer much more data from the training. Transformers require approximately 4x as much compute as previous DLSS models, but ultimately allow NVIDIA (and end users) to better tweak and modify the tradeoffs between smoothness, framerate and image quality that come with using a technology like DLSS. Previous generations of DLSS used convolutional neural networks (CNN), which produced good results, but could sometimes result in flickering, shimmering and other artifacts, like ghosting. Many artifacts visible with previous versions of DLSS are fixed with this new transformer model approach.

The “smarter” transformer model also enhances super resolution, by being able to recover and deduce additional details, and ultimately producing a better upscaled image.

DLSS 4, when using multi frame generation, will be running five AI models per frame – the SR and Ray Reconstruction model, the Frame-Gen model, and smaller models comparing pairs of rendered frames. The end result is that 15 out of every 16 pixels you see when multi frame gen is enabled are generated by AI.
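
The 15-out-of-16 figure falls out of simple arithmetic if we assume super resolution renders at half resolution on each axis (a quarter of the pixels) and multi frame generation produces three AI frames for every rendered one. Both settings are our assumptions, since the modes are not spelled out above.

```python
from fractions import Fraction

def rendered_pixel_fraction(upscale_per_axis: int, frames_per_rendered: int) -> Fraction:
    """Fraction of displayed pixels that the GPU rendered traditionally."""
    pixels = Fraction(1, upscale_per_axis ** 2)  # share of pixels in a rendered frame
    frames = Fraction(1, frames_per_rendered)    # share of frames that are rendered
    return pixels * frames

# ASSUMPTION: 2x upscaling per axis and 4x multi frame generation
# (one rendered frame followed by three generated frames).
rendered = rendered_pixel_fraction(upscale_per_axis=2, frames_per_rendered=4)
generated = 1 - rendered

print("Rendered:", rendered)
print("AI generated:", generated)
```

Less aggressive upscaling or fewer generated frames shifts the balance back toward rendered pixels; for example, 2x frame generation under the same upscaling assumption gives 7 out of 8.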

Note that only multi frame generation is exclusive to the RTX 50 series. All other DLSS features will be enhanced by the use of the transformer model, so previous-gen RTX cards will reap the benefits as well, with the features they support.

We should also mention that NVIDIA is introducing new DLSS override functionality within the NVIDIA app as well. DLSS overrides will give users the ability to enable DLSS multi frame generation in 75 DLSS FG titles, try out the latest transformer models in DLSS SR titles, and override DLAA and Ultra Performance modes for DLSS SR titles (even in games that don’t provide a UI toggle for manipulating these features).

NVIDIA Reflex 2 To Further Reduce Latency With Frame Warping

One of the issues with frame generation is the potential impact on game responsiveness, because user input doesn't affect the generated frames. To further enhance responsiveness in games, NVIDIA is also introducing Reflex 2. Reflex 2 takes all of the goodness of Reflex and incorporates support for frame warp, which updates a rendered frame based on the latest input just before it is displayed. Frame warping, however, may reveal holes in the data – for example, what’s behind a pillar or wall. To address that issue, NVIDIA is also introducing predictive inpainting to fill in those parts of the scene. This won't be a win for every game, but it is a good option for games where someone wants optimal latency. With Reflex 2, NVIDIA is claiming up to 75% faster responsiveness. Reflex 2 is coming to The Finals and Valorant soon, and will come to additional titles at later dates.
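
A one-dimensional toy shows why warping opens up holes and what filling them involves. This is only a sketch under heavy assumptions: a single row of pixels is shifted sideways to mimic a late camera turn, and the hole filling simply copies the nearest valid pixel, which is far cruder than the predictive approach described above.

```python
HOLE = None  # marks a pixel with no data after warping

def warp_scanline(pixels, shift):
    """Shift a row of pixels by `shift` columns to reflect newer camera input."""
    width = len(pixels)
    warped = [HOLE] * width
    for x, value in enumerate(pixels):
        target = x + shift
        if 0 <= target < width:
            warped[target] = value
    return warped

def inpaint(warped):
    """Fill holes from the nearest valid pixel (simple stand-in for inpainting)."""
    valid = [i for i, v in enumerate(warped) if v is not HOLE]
    filled = list(warped)
    for i, v in enumerate(warped):
        if v is HOLE and valid:
            nearest = min(valid, key=lambda j: abs(j - i))
            filled[i] = warped[nearest]
    return filled

frame = [10, 20, 30, 40, 50]
warped = warp_scanline(frame, 2)  # newly exposed columns have no data
final = inpaint(warped)

print("Rendered:", frame)
print("Warped:  ", warped)
print("Filled:  ", final)
```

The two leftmost columns were never rendered from the new viewpoint, so something has to be invented for them, and the quality of that guess is what determines whether the warp is visible.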

So, what does all of this AI and technology do for framerates and latency? Well, it results in significantly reduced latency and up to 8x higher framerates versus native rendering.

Let's look at additional features and projected performance of NVIDIA's GeForce RTX 50 Series, next...

