AMD Vega GPU Architecture Details Revealed
AMD Vega GPU Architecture Details (Cont.)
The new Geometry Pipeline offers improved load balancing too, thanks in part to a new work distributor at the front end of the pipeline that more efficiently distributes work to the other elements in the pipeline. Traditionally, this stage in the pipeline would handle vertex processing, and then ultimately send work to the geometry shader. But with Vega, there is a new stage in the pipeline, called a Primitive Shader, that has compute shader-like access to data. With the right information about a scene, the new Primitive Shader can eliminate more obscured primitives than previous-gen architectures, to improve efficiency, minimize memory use, and maximize available bandwidth.
Many games create scenes with a myriad of polygons that end up being obscured from view in the final render. AMD showed a particular scene from Deus Ex that was composed of approximately 220M polygons, but in the end only 2M of them were actually visible. Traditional Z-Cull and Hierarchical Z-Buffer occlusion culling technologies do a good job of minimizing work on polygons that ultimately won’t be seen on-screen, but the primitive shader can reportedly improve things even further, when the right data is available to assess the visible polys in a scene.
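To put those figures in perspective, and to illustrate the kind of test that can discard a primitive before it ever reaches the rasterizer, here is a minimal Python sketch. It is not AMD's primitive shader. The function names `visible_fraction` and `is_culled` are hypothetical, and the test shown is plain back-face and zero-area culling in screen space, assuming counter-clockwise winding marks a front-facing triangle.

```python
def visible_fraction(total_polys, visible_polys):
    """Share of submitted polygons that survive to the final image."""
    return visible_polys / total_polys


def signed_area(tri):
    """Twice the signed screen-space area of a 2D triangle.

    Positive for counter-clockwise winding, negative for clockwise.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)


def is_culled(tri):
    """Assumed convention: CCW is front-facing.

    Back-facing (negative area) and degenerate (zero area) triangles are culled.
    """
    return signed_area(tri) <= 0


triangles = [
    [(0, 0), (4, 0), (0, 4)],  # counter-clockwise, front-facing: kept
    [(0, 0), (0, 4), (4, 0)],  # clockwise, back-facing: culled
    [(1, 1), (2, 2), (3, 3)],  # collinear, zero area: culled
]
survivors = [t for t in triangles if not is_culled(t)]

print(f"{visible_fraction(220e6, 2e6):.2%} of the polygons were visible")
print(f"{len(survivors)} of {len(triangles)} toy triangles survive culling")
```

By AMD's numbers, fewer than one in a hundred polygons in that scene contributed to the final image, which is why discarding hidden geometry early matters so much.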
There is also a New Compute Unit design in Vega. Vega’s NCU can handle 512 8-bit operations, 256 16-bit ops, or 128 32-bit ops per clock; the Double Precision rate is configurable. The NCU in Vega is optimized for higher clock speeds and also higher IPC. It features a larger instruction buffer versus previous-gen architectures and can do more operations per clock. Though actual clock speeds weren’t discussed, AMD is claiming improvements in both IPC and frequency with Vega.
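The quoted per-clock rates follow a simple pattern: each halving of operand width doubles the number of operations. A quick Python sketch of that arithmetic follows, using the article's 128 32-bit ops per clock as the baseline. The helper names `ops_per_clock` and `peak_ops_per_second` are hypothetical, and the NCU count and clock speed in the example are placeholders rather than AMD specifications, since clocks were not disclosed.

```python
BASE_BITS = 32
BASE_OPS_PER_CLOCK = 128  # 32-bit ops per clock per NCU, as quoted by AMD


def ops_per_clock(bits):
    """Per-NCU ops per clock for 8-, 16-, or 32-bit operands."""
    if bits not in (8, 16, 32):
        raise ValueError("only 8-, 16-, and 32-bit rates were quoted")
    return BASE_OPS_PER_CLOCK * (BASE_BITS // bits)


def peak_ops_per_second(bits, num_ncus, clock_hz):
    """Theoretical peak rate; num_ncus and clock_hz are caller-supplied placeholders."""
    return ops_per_clock(bits) * num_ncus * clock_hz


for bits in (32, 16, 8):
    print(f"{bits:>2}-bit: {ops_per_clock(bits)} ops/clock per NCU")

# Placeholder numbers purely for illustration (not disclosed by AMD).
example = peak_ops_per_second(16, num_ncus=10, clock_hz=1_000_000_000)
print(f"10 NCUs at 1 GHz, 16-bit: {example / 1e12:.2f} trillion ops/s")
```

Once real NCU counts and clock speeds are known, the same arithmetic gives the theoretical peak for each precision.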
To complement the NCU, Vega features a next-generation Pixel Engine as well. One of AMD’s goals with the new Pixel Engine is to further improve memory efficiency. The Pixel Engine features a new Draw Stream Binning Rasterizer, which reportedly improves performance and saves power. Smart primitive rasterization enabled by an on-chip bin cache allows the new Pixel Engine to shade once, by culling pixels not visible in the final scene. In effect, it is a form of cache-aware scheduling of the GPU's work that can also operate out of order. Legacy architectures typically have non-coherent pixel and texture memory access, which means when they do things like render to texture and then perform shader operations, they have to access memory multiple times. This shouldn't happen as often on Vega. With Vega, the render back ends are also now clients of the L2 cache, which can improve performance in applications that use deferred shading.
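The "shade once" idea can be illustrated with a toy comparison in Python. This is only a sketch under stated assumptions, not a model of AMD's Draw Stream Binning Rasterizer: primitives are opaque axis-aligned rectangles with non-negative coordinates, a smaller `z` means closer to the camera, and every name here (`immediate`, `binned`, `tiles_touched`, `covers`) is hypothetical. The immediate path shades every fragment that passes the depth test in submission order, while the binned path first sorts primitives into screen tiles, resolves visibility per pixel, and only then shades.

```python
def covers(prim, x, y):
    """True if pixel (x, y) lies inside the rectangle (half-open bounds)."""
    x0, y0, x1, y1 = prim["bounds"]
    return x0 <= x < x1 and y0 <= y < y1


def tiles_touched(prim, tile):
    """Yield the (tx, ty) keys of every screen tile the rectangle overlaps."""
    x0, y0, x1, y1 = prim["bounds"]
    for ty in range(y0 // tile, (y1 - 1) // tile + 1):
        for tx in range(x0 // tile, (x1 - 1) // tile + 1):
            yield tx, ty


def immediate(prims):
    """Shade in submission order; overdrawn pixels get shaded more than once."""
    depth, color, shaded = {}, {}, 0
    for prim in prims:
        x0, y0, x1, y1 = prim["bounds"]
        for y in range(y0, y1):
            for x in range(x0, x1):
                if prim["z"] < depth.get((x, y), float("inf")):
                    depth[(x, y)] = prim["z"]
                    color[(x, y)] = prim["name"]  # stands in for a pixel shader
                    shaded += 1
    return color, shaded


def binned(prims, tile=4):
    """Bin primitives per tile, pick the front-most per pixel, shade once."""
    bins = {}
    for prim in prims:
        for key in tiles_touched(prim, tile):
            bins.setdefault(key, []).append(prim)
    color, shaded = {}, 0
    for (tx, ty), batch in bins.items():
        for y in range(ty * tile, (ty + 1) * tile):
            for x in range(tx * tile, (tx + 1) * tile):
                hits = [p for p in batch if covers(p, x, y)]
                if hits:
                    winner = min(hits, key=lambda p: p["z"])
                    color[(x, y)] = winner["name"]
                    shaded += 1
    return color, shaded


# Three overlapping rectangles submitted back to front (worst case for overdraw).
prims = [
    {"name": "far", "z": 0.9, "bounds": (0, 0, 8, 8)},
    {"name": "mid", "z": 0.5, "bounds": (2, 2, 8, 8)},
    {"name": "near", "z": 0.1, "bounds": (4, 4, 8, 8)},
]
image_immediate, count_immediate = immediate(prims)
image_binned, count_binned = binned(prims)
print("same image:", image_immediate == image_binned)
print("fragments shaded:", count_immediate, "immediate vs", count_binned, "binned")
```

Both paths produce the same image, but with this back-to-front submission order the immediate path shades 116 fragments while the binned path shades only the 64 covered pixels. Keeping each tile's work in a small on-chip cache is also what would let this kind of approach reduce trips to external memory.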
AMD will surely disclose more information regarding its Vega GPU architecture as we get closer to launch, but our early taste of the technology has left us optimistic. Vega’s performance targets need to be lofty to match NVIDIA’s high-end, Pascal-based products and do battle with whatever Team Green has coming down the pipeline in 2017, but it appears that AMD is arming itself nicely with Vega and has a forward-looking new GPU on the way that addresses gaming and pro-graphics workloads.