AMD Unveils Navi RDNA Architecture: Under The Hood Of Radeon RX 5700
With RDNA, AMD wanted to create a fundamental building block, the CU, that could scale from a few watts all the way up to a few hundred watts, to address the needs of everything from low-power mobile devices to powerful desktops, workstations, and cloud gaming.
The Navi-based GPU at the heart of the upcoming Radeon RX 5700 series is manufactured on TSMC’s 7nm process node, and features GDDR6 memory along with PCI Express 4.0 interface support. The GPU also features new Radeon Media and Radeon Display engines, to better address the needs of streamers and content creators and usher in an array of new display technologies. Asus also has the world’s first display stream compression monitor coming down the pipe that leverages the technology.
The new CU design in Navi features a multi-level cache hierarchy and a streamlined graphics pipeline designed to not only improve performance per clock, but also push clock frequencies higher. The new CU offers double the instruction rate of GCN, and features twice the number of scalar units and double the number of schedulers. It also introduces single-cycle issue Wave32 execution on SIMD32 and the concept of "resource pooling." With resource pooling, two CUs can coordinate and function as a Work Group processor. For example, 64 threads can be grouped as two Wave32 instructions and executed in a single clock. The flexibility of the CUs and the ability to pool resources improve single-thread performance, GPU utilization, and efficiency.
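To put the Wave32 point in numbers, here is a minimal sketch of the issue-cycle arithmetic. It assumes that GCN issues a 64-wide wavefront across a 16-lane SIMD, which is how GCN is commonly described but is not stated above, and the function name is ours for illustration only.

```python
import math

def issue_cycles(wave_size: int, simd_width: int) -> int:
    """Cycles needed to issue one instruction for every thread in a wavefront."""
    return math.ceil(wave_size / simd_width)

# Assumption: GCN-style Wave64 on a 16-lane SIMD.
gcn_cycles = issue_cycles(64, 16)

# RDNA as described in the article: Wave32 on a 32-lane SIMD.
rdna_cycles = issue_cycles(32, 32)

# 64 threads split into two Wave32 groups, each on its own SIMD32,
# issued side by side, so the slower of the two sets the total.
two_wave32_cycles = max(issue_cycles(32, 32), issue_cycles(32, 32))

print(gcn_cycles, rdna_cycles, two_wave32_cycles)  # 4 1 1
```

Under those assumptions, the same 64 threads that would occupy four issue cycles in the GCN-style case fit into a single clock when handled as two Wave32 groups.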
Navi features a new cache hierarchy as well. AMD added a new L1 cache and doubled the load bandwidth from the L0 cache to the ALUs. The new cache hierarchy reduces cache latency at each level and improves effective bandwidth as well.
The DCC (Delta Color Compression) algorithm in Navi has been improved and has also been made available to a broader part of the cache subsystem. With Navi, shaders can now read and write compressed color data. The new display unit can also read compressed data in the frame buffer, without decompressing it first. The end result is higher effective bandwidth throughout the GPU.
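AMD did not detail how DCC works internally, so the following is only a toy illustration of the general idea behind delta encoding, not AMD's algorithm. The function names and the sample pixel row are made up for the example.

```python
def delta_encode(values):
    """Keep the first value, then store each neighbour-to-neighbour difference."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original values with a running sum over the deltas."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out

# Hypothetical row of one 8-bit color channel with smooth shading.
row = [200, 201, 201, 203, 202, 202]
deltas = delta_encode(row)
print(deltas)  # [200, 1, 0, 2, -1, 0]

# The round trip is lossless.
restored = delta_decode(deltas)
print(restored == row)  # True
```

Neighbouring pixels tend to have similar colors, so the differences are small numbers that need fewer bits than the raw values. That is where the effective bandwidth savings of this family of techniques come from.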
The entire graphics pipeline in Navi has been improved to enhance efficiency for better performance-per-clock, but the GPU also features more effective clock gating for better overall power efficiency. The levels of logic around the GPU have been reduced to achieve higher frequencies.
Normalized versus GCN, RDNA delivers more than 50% better performance per watt and 25% better overall performance. More than half of that improvement comes from architecture optimizations, according to AMD; the GPU also gets a boost from its 7nm process and frequency improvements.
The Navi 10 GPU at the heart of the Radeon RX 5700 series features 40 RDNA Compute Units, comprising 80 Scalar Processors, 2560 Stream Processors, and 160 64-bit Bilinear Filter Units. The GPU features 4MB of L2 cache, 512KB of L1, and double the V$L0 load bandwidth, with support for DCC (Delta Color Compression) throughout the chip. The streamlined graphics engine has a new Geometry Engine, 64 Pixel Units, and 4 Async Compute Engines.
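Dividing those chip-wide totals by the CU count gives the per-CU makeup. The short sketch below does only that arithmetic; the dictionary keys and the helper name are ours, and it assumes the resources are spread evenly across all 40 CUs.

```python
# Chip-wide totals quoted for Navi 10.
NAVI10 = {
    "compute_units": 40,
    "scalar_processors": 80,
    "stream_processors": 2560,
    "bilinear_filter_units": 160,
}

def per_cu(spec):
    """Divide each chip-wide total by the CU count (assumes an even split)."""
    cus = spec["compute_units"]
    return {
        name: count // cus
        for name, count in spec.items()
        if name != "compute_units"
    }

per_cu_counts = per_cu(NAVI10)
print(per_cu_counts)
# {'scalar_processors': 2, 'stream_processors': 64, 'bilinear_filter_units': 4}
```

That works out to 2 scalar processors, 64 stream processors, and 4 bilinear filter units per CU.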
According to AMD, the 40 CU design of Navi offers about 14% better performance than Vega64, at 23% lower power, and it does so with a much smaller die to boot. The Navi 10 GPU is comprised of about 10.3B transistors and measures only 251mm² versus 495mm² for the 14nm Vega.
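Those claims can be combined into a few ratios. The sketch below uses only the figures quoted above; the variable names are ours, and the results are simple arithmetic on AMD's numbers rather than measurements.

```python
# AMD's quoted figures, relative to Vega64.
relative_performance = 1.14   # about 14% faster
relative_power = 1.00 - 0.23  # 23% lower power

# Performance per watt relative to Vega64.
perf_per_watt_gain = relative_performance / relative_power

# Die area comparison in mm^2.
navi10_area = 251
vega_area = 495
area_ratio = navi10_area / vega_area

# Navi 10 transistor density in millions of transistors per mm^2.
navi10_transistors_millions = 10_300
density = navi10_transistors_millions / navi10_area

print(round(perf_per_watt_gain, 2))  # 1.48
print(round(area_ratio, 3))          # 0.507
print(round(density, 1))             # 41.0
```

Taken at face value, the numbers imply roughly 48% better performance per watt than Vega64 from a die about half the size, at around 41 million transistors per square millimeter.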
Note, however, that the full Navi 10 configuration is used only in the Radeon RX 5700 XT; the Radeon RX 5700 that was also announced has a total of only 36 CUs enabled in the GPU. As for their specific speeds and feeds, the Radeon RX 5700 XT has a base GPU clock of 1605MHz, while the max boost clock is rated for 1905MHz. Slotting in between those goal posts is a Game Clock of 1755MHz. AMD expects the real-world, actual clocks while gaming in a well-ventilated system to fall somewhere in between the Game and Boost clocks. AMD is pairing the Radeon RX 5700 XT with 8GB of 14Gbps GDDR6 memory connected through a 256-bit bus, which offers up 448GB/sec of bandwidth. When all is said and done, AMD claims that the card is capable of 9.75 TFLOPS of compute with a total board power of 225W. The Radeon RX 5700 has the same 8GB of GDDR6 memory on board. Base, Game, and Boost clocks are listed at 1465MHz, 1625MHz, and 1725MHz respectively, and it will top out at 7.95 TFLOPS of FP32 compute.
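The bandwidth and TFLOPS figures can be reproduced from the other specs. This is a back-of-the-envelope sketch with two assumptions that are ours: each stream processor retires one fused multiply-add (2 FLOPs) per clock, and the Radeon RX 5700 keeps the same 64 stream processors per CU implied by the full chip's 2560 across 40 CUs. The function names are illustrative.

```python
def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Per-pin data rate times bus width, converted from bits to bytes."""
    return data_rate_gbps * bus_width_bits / 8

def fp32_tflops(stream_processors: int, clock_mhz: float) -> float:
    """Peak FP32 rate, assuming 2 FLOPs (one FMA) per stream processor per clock."""
    return stream_processors * 2 * clock_mhz / 1e6

SP_PER_CU = 64  # assumption: 2560 stream processors / 40 CUs

bandwidth = memory_bandwidth_gb_s(14, 256)
xt_tflops = fp32_tflops(40 * SP_PER_CU, 1905)   # RX 5700 XT at boost clock
rx5700_tflops = fp32_tflops(36 * SP_PER_CU, 1725)  # RX 5700 at boost clock

print(bandwidth)                # 448.0
print(round(xt_tflops, 2))      # 9.75
print(round(rx5700_tflops, 2))  # 7.95
```

Both TFLOPS figures line up with AMD's claims only at the max boost clocks, so sustained compute at the Game Clock would come in lower.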
AMD has also redesigned the cooler, shroud, and power delivery circuitry on the Radeon RX 5700 XT. The reference card will feature an aluminum shroud and backplate, with an angled air-slot that reportedly helps optimize airflow when the card is idling or under a light load, to help keep the card quiet. There is a hefty vapor chamber with a dense array of heatsink fins that sits directly atop the GPU and RAM and mates with the GPU through a graphite-based thermal interface. There is also a 7-phase, all-digital power circuit on the board to ensure ample power to the GPU. The cards feature 8-pin and 6-pin PCIe power feeds, which should provide ample headroom for overclocking. Between the PCIe slot and the supplemental feeds, 300W will be available to the Radeon RX 5700 series cards (150 + 75 + 75). We're also told that the current power/fan profile caps the cards at a max of 43dB, though we haven't been able to verify that on our own just yet.
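The power budget is easy to tally. The sketch below assigns the 150 + 75 + 75 figures to the 8-pin connector, the 6-pin connector, and the PCIe slot respectively, which matches the usual PCIe limits, and compares the total against the 225W board power AMD quotes for the Radeon RX 5700 XT. The names are ours.

```python
# Maximum power per source in watts.
POWER_SOURCES_W = {
    "8-pin connector": 150,
    "6-pin connector": 75,
    "PCIe slot": 75,
}

XT_BOARD_POWER_W = 225  # AMD's quoted total board power for the RX 5700 XT

available_w = sum(POWER_SOURCES_W.values())
headroom_w = available_w - XT_BOARD_POWER_W

print(available_w)  # 300
print(headroom_w)   # 75
```

On paper, that leaves 75W of delivery headroom above the XT's rated board power, which is where the overclocking margin comes from.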
AMD's new Radeon Encoding Engine offers support for new HDR/WCG HEVC encoding, 8K encoding (HEVC and VP), with about a 40% speed increase over the previous gen.
Unfortunately, AMD didn’t have much to say about ray tracing. The company doesn’t believe now is the time for ray tracing on consumer GPUs, though it expects to have some form of hardware acceleration in a next-generation GPU. AMD believes the cloud may be utilized for ray tracing, so all sorts of devices can benefit from the technology, but if you were hoping AMD would help push the current ecosystem forward, that doesn’t look to be the case at this point.
There is an array of additional features and more specifics about the Radeon RX 5700 series and AMD’s software that we'll cover next...