A Close Look At AMD's New And Improved RDNA 3 Architecture

RDNA 3 Architecture: No Transistor Left Behind





In RDNA 3, one WGP still comprises two compute units, but one of the major changes is that each compute unit now holds twice as many ALUs and vector units as before. Still one WGP, still two CUs, but each CU now has twice the resources. That's how AMD can move from 40 WGPs on Navi 21 to just 48 WGPs on Navi 31 and still claim that FP32 compute performance has increased by 2.7x. The actual shader count hasn't increased (beyond the 20% bump in WGPs), but each shader has twice as many functional units. There is twice as much L0 cache, too, now up to 32KB.
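To see how a 20% bump in WGPs can line up with a roughly 2.7x claim, here is a back-of-the-envelope sketch in Python. The per-CU lane counts (64 for Navi 21, 128 for Navi 31) and the clock speeds are assumptions chosen for illustration, not figures from this article, and the function name is made up.

```python
def peak_fp32_tflops(wgps, lanes_per_cu, clock_ghz, cus_per_wgp=2, flops_per_fma=2):
    """Peak FP32 rate in TFLOPS, assuming every lane retires one FMA per clock."""
    lanes = wgps * cus_per_wgp * lanes_per_cu
    # lanes * FLOPs per FMA * GHz gives GFLOPS; divide by 1000 for TFLOPS.
    return lanes * flops_per_fma * clock_ghz / 1000.0


# Assumed inputs: lane counts and clocks are illustrative round numbers.
navi21 = peak_fp32_tflops(wgps=40, lanes_per_cu=64, clock_ghz=2.25)
navi31 = peak_fp32_tflops(wgps=48, lanes_per_cu=128, clock_ghz=2.5)

print(f"Navi 21: {navi21:.2f} TFLOPS")
print(f"Navi 31: {navi31:.2f} TFLOPS")
print(f"Ratio:   {navi31 / navi21:.2f}x")
```

Under these assumptions the ratio breaks down as 1.2x from the extra WGPs, 2x from the doubled per-CU resources, and the remainder from clock speed. It is a theoretical peak, so real workloads will only approach it when the doubled units can be kept busy.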





Each of these vector units now has additional parts as well. For starters, the doubled SIMD32s can handle 64-bit multi-precision operations, including single-clock Wave64 FMAs. They've also been upgraded with support for the popular bfloat16 data type, which improves compute performance in AI workloads. Speaking of AI workloads, there's a brand-new AI matrix accelerator that AMD says offers a 2.7x boost in matrix math operations. Curiously, AMD says that this matrix accelerator isn't intended for use in general compute workloads, but strictly in games.
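For readers unfamiliar with bfloat16, the format keeps the sign bit and all eight exponent bits of a 32-bit float but only the top seven mantissa bits. The Python sketch below illustrates the idea by truncating a float32 to its upper 16 bits; the helper names are hypothetical, and truncation is a simplification (hardware commonly rounds instead), so treat this as an illustration of the data type rather than of how RDNA 3 implements it.

```python
import struct


def to_bfloat16_bits(value):
    """Return the 16-bit bfloat16 pattern for value, by truncating a float32."""
    float32_bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return float32_bits >> 16  # drop the low 16 mantissa bits


def from_bfloat16_bits(bits):
    """Expand a 16-bit bfloat16 pattern back into a Python float."""
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]


approx_pi = from_bfloat16_bits(to_bfloat16_bits(3.14159))
print(hex(to_bfloat16_bits(3.14159)), approx_pi)
```

The round trip keeps the full float32 range but only about two to three decimal digits of precision, a trade-off that AI workloads generally tolerate well.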





That likely refers to AMD's upcoming FSR 3 technology. Revealed at the same time as the Radeon RX 7900 series cards, the third whole-number iteration of FidelityFX Super Resolution is a frame-generation technology similar (at least in concept) to NVIDIA's DLSS 3. AMD has shared scant details about FSR 3, but we can reasonably assume that it will take advantage of RDNA 3's new matrix math unit in some way.





So what didn't get duplicated in RDNA 3? Well, there's still only one ray accelerator per compute unit. Don't fret, though; ray-tracing performance has been drastically improved, in part by the aforementioned upgrades to the compute units, particularly the lower-level cache bumps. AMD emphasized a point that NVIDIA has also made: ray tracing is fundamentally a compute workload like any other. You can increase performance there by piling on compute, but the faster way to improve RT performance is through efficiency optimizations.





The World's First Chiplet GPUs





But how do you build a chiplet GPU? If you split the processor core, you end up with the same sort of situation we saw with multi-GPU technologies like AMD's Crossfire and competitor NVIDIA's SLI, technologies that are all but abandoned these days due to their difficulties. AMD took a smarter tack: just break off the components that don't benefit from scaling down to denser process nodes.





The Navi 31 GPU comprises a single large die, known as the GCD, and six smaller chips known as MCDs. The GCD is the GPU as we conventionally think of it, simply without memory interfaces or last-level cache. Those parts reside on the MCDs, and the GCD is connected to the MCDs using ultra-high-bandwidth links known as Infinity Fanout Links.





AMD talked at length about the difficulties it faced when creating the first chiplet GPUs. Doing chiplets on a CPU was relatively easy; even many-core CPUs are highly serial in comparison to GPUs, so you don't need to manage dozens of links running at extremely high bandwidth. GPU shader engines require absolutely massive amounts of connectivity, so the company had to create a custom high-throughput link enabling 5.3 TB/sec communication between the GCD and its MCDs. It accomplishes this while drawing less than 5% of the total board power, too.
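To put those link figures in perspective, the short Python sketch below divides the quoted aggregate bandwidth across the six MCDs and turns the "less than 5%" claim into a wattage ceiling. The even split across MCDs is an assumption, the function name is made up, and the 300 W board power is a placeholder input rather than a spec from this article.

```python
TOTAL_LINK_TBPS = 5.3  # aggregate GCD-to-MCD bandwidth quoted by AMD
MCD_COUNT = 6          # number of MCDs on Navi 31

# Assumes the aggregate bandwidth is shared evenly across all MCDs.
per_mcd_tbps = TOTAL_LINK_TBPS / MCD_COUNT


def link_power_ceiling_watts(board_power_watts, share=0.05):
    """Upper bound on link power if the links draw under `share` of board power."""
    return board_power_watts * share


print(f"Per-MCD bandwidth: about {per_mcd_tbps:.2f} TB/s")
# 300 W is a placeholder board power, not a figure from the article.
print(f"Link power ceiling at 300 W: {link_power_ceiling_watts(300):.1f} W")
```

Even split six ways, each link would carry well over 800 GB/s, which helps explain why AMD needed a custom interconnect.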





If it's so difficult, why bother? The company points out that etching silicon on newer nodes is rapidly getting more expensive, while some parts of chips (like memory interfaces and caches) don't really gain anything from being on the latest process. By moving those parts onto separate dies built on an older, cheaper process, AMD can reserve the expensive leading-edge silicon for the logic that actually benefits from it.
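The economics can be sketched with a toy cost model in Python. Everything here is an assumption for illustration: the wafer prices, defect density, and die areas are made-up placeholders, the yield formula is a simple exponential (Poisson) approximation, and packaging costs are ignored entirely. It is not AMD's analysis, only a way to see why smaller dies on a cheaper node can come out ahead.

```python
import math

WAFER_AREA_MM2 = math.pi * 150.0 ** 2  # area of a 300 mm wafer, edge loss ignored


def cost_per_good_die(area_mm2, wafer_cost, defects_per_mm2):
    """Wafer cost spread across the dies expected to work (Poisson yield model)."""
    dies_per_wafer = WAFER_AREA_MM2 / area_mm2
    yield_fraction = math.exp(-area_mm2 * defects_per_mm2)
    return wafer_cost / (dies_per_wafer * yield_fraction)


# Placeholder inputs, chosen only to illustrate the trend.
NEW_NODE = {"wafer_cost": 15000.0, "defects_per_mm2": 0.001}
OLD_NODE = {"wafer_cost": 9000.0, "defects_per_mm2": 0.001}

# One big die on the new node versus a compute die plus six small dies on the old node.
monolithic = cost_per_good_die(500.0, **NEW_NODE)
chiplet = cost_per_good_die(300.0, **NEW_NODE) + 6 * cost_per_good_die(40.0, **OLD_NODE)

print(f"Monolithic: ${monolithic:.0f}  Chiplet: ${chiplet:.0f}")
```

With these placeholder numbers the chiplet layout is cheaper even though its total silicon area is slightly larger, because small dies yield better and most of the cache and memory-interface area sits on the less expensive wafers. A real comparison would also have to account for the added packaging and interconnect cost.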





AMD's RDNA 3: The First Step For The Future Of Radeon





This article has focused on Navi 31, the GPU behind the Radeon RX 7900 series, but there's probably a whole family of Radeon GPUs on the way based on this architecture. AMD hasn't commented on them yet, though, so we'll just talk about what the company has announced: the Radeon RX 7900 XT and Radeon RX 7900 XTX.





AMD Radeon RX 7900 XTX







AMD Radeon RX 7900 XT





