AMD Radeon R9 Fury X Review: Fiji And HBM Put To The Test

by Marco Chiappetta — Tuesday, June 30, 2015, 08:00 AM EDT

Page 3:
HBM (High Bandwidth Memory) Explained

The Fiji GPU's standout feature is its High Bandwidth Memory, or HBM. Over the past few months, AMD has publicly disclosed details regarding the revolutionary High Bandwidth Memory interface that's employed on the Fiji GPU, and potentially future APUs as well. Although we've covered this before, it's worth going over again here.

High Bandwidth Memory is designed to address a number of limitations of current GDDR5 memory implementations. With many of today’s graphics cards, a relatively large number of GDDR5 chips are necessary to offer sufficient capacity and bandwidth. The need to use a relatively large number of chips means a significant amount of PCB real estate is consumed by the memory and its associated voltage regulators. In addition, routing the traces required for a wide memory interface on a high-end graphics card significantly increases the complexity of the design. The power requirements for an off-chip interface are typically higher than an integrated solution as well.

Historically, shrinking and integrating functions onto the same piece of silicon has been used to address some of these issues. However, on-chip integration is not ideal for DRAM, because DRAM is neither size-effective nor cost-effective to build on a logic-optimized manufacturing process like those used for GPUs, CPUs, or SoCs.

In an effort to increase bandwidth, reduce real estate and power requirements, and bring the DRAM as close as possible to the logic die (i.e. the GPU), AMD partnered up with a number of companies to help define a new specification and design a new type of memory chip with low power consumption and an ultra-wide bus width. AMD originally worked with Hynix to develop the HBM standard and memory, which was eventually adopted by JEDEC (document number JESD235) in October 2013. AMD also worked with ASE, Amkor, and UMC to develop a key component of the technology, called the interposer. The interposer allows DRAM to be brought into close proximity with the GPU and simplifies communication and clocking, even at very wide bus widths.

HBM DRAM chips are stacked vertically, and “through-silicon vias” (TSVs) and “μbumps” are used to connect one DRAM chip to the next, and then to a logic die, and ultimately the interposer. TSVs and μbumps are also used to connect the SoC/GPU to the interposer and the entire assembly is connected onto the same package substrate. The end result is a single package on which the GPU/SoC and High Bandwidth Memory both reside.

Having everything on a single package has obvious real-estate benefits. 1GB of GDDR5 memory (made up of four 256MB chips) requires roughly 672mm². HBM is not only smaller than typical GDDR5 memory, but because it’s vertically stacked, that same 1GB requires only about 35mm². According to AMD, 9900mm² of PCB footprint is required for an AMD Radeon R9 290X GPU and its associated memory, whereas less than 4900mm² is required for an upcoming HBM-based ASIC.
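To put those footprint numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It only uses the AMD-supplied figures quoted above; the variable and function names are ours, chosen for illustration, and the HBM-based ASIC figure is treated as an upper bound since AMD says "less than" 4900mm².

```python
# Footprint figures as quoted in the article (per AMD), in square millimeters.
GDDR5_AREA_PER_GB_MM2 = 672     # 1GB built from four 256MB GDDR5 chips
HBM_AREA_PER_GB_MM2 = 35        # 1GB as a single vertically stacked HBM device
R9_290X_FOOTPRINT_MM2 = 9900    # GPU plus its associated memory on the PCB
HBM_ASIC_FOOTPRINT_MM2 = 4900   # upper bound: AMD says "less than" this


def reduction_factor(before_mm2, after_mm2):
    """Return how many times smaller the 'after' area is than 'before'."""
    return before_mm2 / after_mm2


# Area needed for 1GB of memory: GDDR5 versus HBM.
memory_ratio = reduction_factor(GDDR5_AREA_PER_GB_MM2, HBM_AREA_PER_GB_MM2)

# GPU-plus-memory footprint: R9 290X versus the HBM-based ASIC.
# Because the HBM figure is an upper bound, this ratio is a lower bound.
board_ratio = reduction_factor(R9_290X_FOOTPRINT_MM2, HBM_ASIC_FOOTPRINT_MM2)

print(f"Memory area per GB: {memory_ratio:.1f}x smaller with HBM")
print(f"GPU + memory footprint: at least {board_ratio:.1f}x smaller")
```

In other words, the memory itself shrinks by roughly a factor of 19 per gigabyte, while the combined GPU-and-memory footprint is cut by about half.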

High Bandwidth Memory differs from GDDR5 in a few other ways as well. For example, the bus on an HBM stack is 1024 bits wide, versus 32 bits on a GDDR5 chip. As a result, High Bandwidth Memory can, and likely needs to, be clocked much lower. Even at much lower clocks though, that wider memory bus and vertical stacking result in much more bandwidth: more than 100GB/s for an HBM stack versus 28GB/s for a GDDR5 chip. HBM also runs at a significantly lower voltage, which equates to lower power consumption. All told, HBM offers much more bandwidth than traditional GDDR5 at roughly 50% less power.

The implementation of HBM coming on the Fury X uses a 4-channel design that's limited to 4GB of memory. AMD has said, however, that future APIs like DX12 do much to better utilize graphics memory, and the company is tweaking its drivers to make more efficient use of graphics memory as well. The 4GB limitation will certainly rear its head at ultra-high resolutions when additional pixel processing is employed, but AMD thinks those situations won't matter much because the games most likely won't be playable at those settings anyway.
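The per-stack and per-chip bandwidth figures quoted above follow from simple arithmetic: peak bandwidth is the bus width multiplied by the per-pin data rate. The short Python sketch below shows the calculation. Note that the per-pin data rates (1Gbps for first-generation HBM and 7Gbps for fast GDDR5) are our assumptions, not figures given on this page; they were chosen because they are typical for these memory types and reproduce the numbers cited.

```python
def peak_bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


# Assumed per-pin data rates (not stated in the article).
HBM_GBPS_PER_PIN = 1.0     # first-generation HBM, low clock
GDDR5_GBPS_PER_PIN = 7.0   # fast GDDR5, much higher clock

# Bus widths as stated in the article.
HBM_STACK_BUS_BITS = 1024
GDDR5_CHIP_BUS_BITS = 32

hbm_stack = peak_bandwidth_gb_s(HBM_STACK_BUS_BITS, HBM_GBPS_PER_PIN)
gddr5_chip = peak_bandwidth_gb_s(GDDR5_CHIP_BUS_BITS, GDDR5_GBPS_PER_PIN)

print(f"HBM stack:  {hbm_stack:.0f} GB/s")
print(f"GDDR5 chip: {gddr5_chip:.0f} GB/s")
```

Under those assumptions, an HBM stack moves data at one-seventh the per-pin rate of GDDR5, yet its bus is 32 times wider, which is why it still comes out well ahead.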

To summarize, High Bandwidth Memory will offer performance well beyond current GDDR5 (and DDR4), with increased power efficiency (roughly 3x the performance per watt of GDDR5) and in much smaller form factors. It's really HBM and its inherent ability to bring the memory onto the same package as the GPU that allowed AMD to make the Fury X so physically small. It's also what accounts for much of the power-efficiency improvement we'll talk about a bit later.