Vivante: Challenging the Status Quo In Mobile GPUs

GPU Architecture

Vivante has taken a different approach to core design from most of the other companies that play in this space. All modern GPUs are explicitly designed to be modular and scalable, from smartphone hardware to workstation implementations. Typically what that means is that a company like Nvidia or AMD defines a single compute unit that can be duplicated throughout the GPU design. For the Radeon GCN architecture, for example, a compute unit is a group of 64 stream processors (four 16-wide SIMD units) together with its own scalar unit, texture units, and local memory.

Vivante's GPUs are modular as well, but with a much finer level of granularity.

Each of the three shaded blocks (3-D Pipeline, Vector Graphics Pipeline, 2-D Pipeline) can be segmented or stacked into various configurations. A GPU core, in other words, could contain more ultra-threaded shaders, or additional vector graphics engines, up to 32 cores in total. Since the number of graphics front ends can vary depending on how many shader cores are hooked to each graphics core, the counts themselves can get rather confusing. The GC1000 graphics processor we'll be discussing today can be built in two configurations -- 2 (VEC-4), or 8 (VEC-1). The first configuration uses two GPU front-ends with 4 shader cores per GPU block, while the second has eight GPU front-ends with a single shader core in each. Different core configurations can be fine-tuned for maximum efficiency depending on workload.
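To make the counting less confusing, here is a minimal sketch of the two GC1000 layouts described above. The class and field names are invented for illustration and are not Vivante terminology; the only figures used are the front-end and shader-core counts quoted in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical description of one GC1000 build option."""
    front_ends: int             # number of GPU front-ends
    shaders_per_front_end: int  # shader cores hooked to each front-end

    @property
    def total_shader_cores(self) -> int:
        # Total shader cores is simply front-ends times cores per front-end.
        return self.front_ends * self.shaders_per_front_end

    @property
    def label(self) -> str:
        # Mirrors the "2 (VEC-4)" / "8 (VEC-1)" notation used in the text.
        return f"{self.front_ends} (VEC-{self.shaders_per_front_end})"


# The two configurations the article gives for the GC1000.
GC1000_CONFIGS = [CoreConfig(2, 4), CoreConfig(8, 1)]

for cfg in GC1000_CONFIGS:
    print(cfg.label, "->", cfg.total_shader_cores, "shader cores")
```

Both layouts come out to the same number of shader cores; what changes is how many front-ends feed them.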

This kind of fine-grained approach is fundamentally different from what we've seen from other manufacturers, who tend to balance pixel, shader, and other resources in one of two ways. With simple architectures based on older GPU technology, like Tegra 4 and its predecessors, you partition the hardware in advance into a fixed number of pixel and vertex shaders and hope you get your balance right. Unified GPUs, like the mobile flavor of Kepler that'll debut next year, can be programmed for multiple tasks and allocate their resources accordingly. Vivante GPUs use a unified shader architecture and they're more granular -- which means manufacturers can eat their cake and have it too when it comes to allocating GPU resources.

Each shader core contains 16 registers that can be ganged together depending on workload. A shader can perform up to five double-precision operations per cycle per shader unit quantum. There are up to 16 shader cores per GPU core, and up to four GPU cores in a single implementation, though no one has built a Vivante core anywhere near that large at this point.
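As a rough back-of-the-envelope sketch, the ceilings quoted above can be multiplied out. This assumes that a "shader unit" here means one shader core and that every core hits its peak rate in the same cycle; both are assumptions for illustration, not figures Vivante has published in this form.

```python
# Upper limits quoted in the paragraph above.
MAX_SHADER_CORES_PER_GPU_CORE = 16
MAX_GPU_CORES = 4
MAX_OPS_PER_CYCLE_PER_SHADER = 5  # assumption: "shader unit" == shader core


def max_shader_cores(per_gpu_core: int = MAX_SHADER_CORES_PER_GPU_CORE,
                     gpu_cores: int = MAX_GPU_CORES) -> int:
    """Largest shader-core count the quoted limits allow."""
    return per_gpu_core * gpu_cores


def peak_ops_per_cycle(shader_cores: int,
                       ops_per_shader: int = MAX_OPS_PER_CYCLE_PER_SHADER) -> int:
    """Theoretical peak, assuming every shader core is fully busy."""
    return shader_cores * ops_per_shader


cores = max_shader_cores()
print(cores, "shader cores,", peak_ops_per_cycle(cores), "ops per cycle at most")
```

These are theoretical maxima only; as noted above, no shipping design comes close to this size.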

One of the advantages of this tiny, modular architecture is that you can clock the cores like gangbusters. According to Vivante, the 28nm high-performance silicon variant of the architecture can clock up to 1GHz at full speed but fall back to 1/64th of that in power-saving mode, or roughly 16MHz (1,000MHz divided by 64 is about 15.6MHz).
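The power-saving figure is simple division, sketched below. The constant and function names are made up for this example; only the 1GHz ceiling and the 1/64 ratio come from Vivante's claim as reported here.

```python
FULL_SPEED_HZ = 1_000_000_000  # 1GHz, the quoted 28nm high-performance ceiling
POWER_SAVE_DIVISOR = 64        # power-saving mode runs at 1/64th of full speed


def power_save_clock_hz(full_speed_hz: float = FULL_SPEED_HZ,
                        divisor: int = POWER_SAVE_DIVISOR) -> float:
    """Clock rate in power-saving mode for a given full-speed clock."""
    return full_speed_hz / divisor


low_hz = power_save_clock_hz()
print(f"{low_hz / 1e6:.3f} MHz")  # 15.625 MHz, which rounds to "roughly 16MHz"
```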