AMD Radeon RX 480 Review: Polaris Hitting The Sweet Spot
The Polaris GPU Explained
AMD's Polaris 10 GPU comprises approximately 5.7 billion transistors and has a die size of 232 mm2. The GPUs will be manufactured on Samsung's or GlobalFoundries' 14nm FinFET process, which packs significantly more transistors per square millimeter than the TSMC 28nm process used for previous-generation Radeons. The result is smaller die sizes, along with lower operating voltages for reduced power consumption.
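The density claim is easy to sanity-check with back-of-the-envelope math. The Polaris 10 figures below come from the text; the 28nm Hawaii (Radeon R9 290/390) numbers are commonly cited public specs, included here only as an assumed comparison point, not from this article.

```python
# Rough transistor-density math for the process-shrink claim.
polaris10_transistors = 5.7e9   # ~5.7 billion (from the text)
polaris10_die_mm2 = 232.0       # 232 mm^2 (from the text)

hawaii_transistors = 6.2e9      # assumed: ~6.2 billion (28nm Hawaii)
hawaii_die_mm2 = 438.0          # assumed: ~438 mm^2

polaris_density = polaris10_transistors / polaris10_die_mm2  # ~24.6M / mm^2
hawaii_density = hawaii_transistors / hawaii_die_mm2         # ~14.2M / mm^2

print(f"Polaris 10: {polaris_density / 1e6:.1f}M transistors/mm^2")
print(f"Hawaii:     {hawaii_density / 1e6:.1f}M transistors/mm^2")
print(f"Density gain: {polaris_density / hawaii_density:.2f}x")
```

If those Hawaii figures hold, the 14nm FinFET shrink works out to roughly 1.7x the transistor density of the 28nm part.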
In addition to the new process technology, Polaris incorporates a number of other enhancements. The architecture offers more powerful geometry processing, increased buffer sizes, more capable delta color compression, tweaked memory controllers, asynchronous compute with prioritization, specialized temporal scheduling, and support for AMD TrueAudio Next, among other features.
The enhanced geometry engines are significantly more capable than previous-generation offerings. They include a new Primitive Discard Accelerator that culls triangles with zero area (or that cover no sample points) early in the pipeline to improve efficiency. There is also a new index cache that stores small instanced geometry to reduce the amount of data flowing through the pipeline.
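To make the culling idea concrete, here is a conceptual sketch of the kind of test such a unit performs; this is an illustration in software, not AMD's actual hardware logic. A triangle whose signed area is zero (i.e., its vertices are collinear) can never cover a sample point, so it can be discarded before it consumes rasterization resources.

```python
def is_degenerate(v0, v1, v2, eps=1e-12):
    """Return True if the 2D triangle (v0, v1, v2) has (near-)zero area."""
    # Twice the signed area, via the 2D cross product of two edge vectors.
    area2 = (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v2[0] - v0[0]) * (v1[1] - v0[1])
    return abs(area2) < eps

# Collinear vertices -> zero area -> a candidate for early discard.
print(is_degenerate((0, 0), (1, 1), (2, 2)))  # True
# A real triangle survives the cull test.
print(is_degenerate((0, 0), (1, 0), (0, 1)))  # False
```

Tessellation-heavy scenes generate large numbers of such tiny or degenerate triangles, which is why discarding them early in the pipeline pays off.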
Another key Polaris feature is the 4th-generation version of AMD's GCN (Graphics Core Next) architecture, which brings a number of improvements and new features. The aforementioned Primitive Discard Accelerator is baked in, along with an improved hardware scheduler, better instruction prefetch, larger buffers, an optimized L2 cache, and increased shader efficiency.
Polaris also includes enhanced lossless delta color compression technology, which improves upon the compression tech introduced with AMD's Tonga GPU; 2:1, 4:1, and 8:1 compression ratios are now supported. In addition, an updated memory controller and PHY support GDDR5 memory speeds of up to 8Gbps.
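The bandwidth implication of that 8Gbps figure is simple arithmetic. The article does not state a memory interface width, so the 256-bit bus below is an assumption for illustration only.

```python
# Peak GDDR5 bandwidth at the 8Gbps data rate the Polaris PHY supports.
data_rate_gbps = 8      # per-pin data rate, Gb/s (from the text)
bus_width_bits = 256    # ASSUMED memory interface width, not from the article

peak_bandwidth_gbs = data_rate_gbps * bus_width_bits / 8  # bits -> bytes
print(f"Peak bandwidth: {peak_bandwidth_gbs:.0f} GB/s")  # 256 GB/s
```

Effective bandwidth runs higher still, since delta color compression reduces the amount of data that actually crosses the memory bus.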
Polaris updates Asynchronous Compute as well. The command processor features a new quality-of-service (QoS) technique called the Quick Response Queue, which allows developers to designate a compute task queue as high-priority. High-priority and regular-priority tasks can co-exist and share the GPU's execution resources, but the Async Compute Engines dispatch workgroups from high-priority tasks ahead of normal ones. This prioritization ensures that high-priority tasks receive more resources and complete first, without the command processor having to context-switch out lower-priority tasks.
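The dispatch behavior can be sketched as a simple priority queue; this is a software analogy for the scheduling policy described above, not AMD's actual scheduler, and the task names are hypothetical.

```python
import heapq

# Lower number = dispatched first. Both priorities share execution resources,
# but high-priority workgroups are always picked ahead of normal-priority ones.
HIGH, NORMAL = 0, 1

def dispatch_order(tasks):
    """tasks: list of (priority, arrival_index, name); returns names in dispatch order."""
    heap = list(tasks)
    heapq.heapify(heap)
    return [name for _, _, name in (heapq.heappop(heap) for _ in range(len(heap)))]

tasks = [
    (NORMAL, 0, "shadow_pass"),     # hypothetical normal-priority compute work
    (HIGH,   1, "async_timewarp"),  # hypothetical latency-sensitive task (e.g. VR)
    (NORMAL, 2, "post_fx"),
]
print(dispatch_order(tasks))  # ['async_timewarp', 'shadow_pass', 'post_fx']
```

Note that the normal-priority tasks are never evicted; they simply yield dispatch slots to the high-priority work, which is the point of the QoS approach over full context switching.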