Vivante: Challenging the Status Quo In Mobile GPUs
GPU Architecture
Vivante has taken a different approach to core design from most of the other companies that play in this space. All modern GPUs are explicitly designed to be modular and scalable, from smartphone hardware to workstation implementations. Typically what that means is that a company like Nvidia or AMD defines a single compute unit that can be duplicated throughout the GPU design. For the Radeon GCN architecture, for example, a compute unit is a group of 64 stream processors with its own texture mapping units (TMUs), while the asynchronous compute engines and render outputs (ROPs) sit outside the compute units and are scaled separately.
Vivante's GPUs are modular as well, but with a much finer level of granularity.
This kind of fine-grained approach is fundamentally different from what we've seen from other manufacturers, who tend to balance pixel, shader, and other resources in one of two ways. Simpler architectures based on older GPU technology, like Tegra 4 and its predecessors, are partitioned in advance into a fixed number of pixel and vertex shaders, and the designer has to hope the balance is right. Unified GPUs, like the mobile flavor of Kepler that'll debut next year, can be programmed for multiple tasks and allocate their resources accordingly. Vivante GPUs use a unified shader architecture and are also more granular, which means manufacturers can eat their cake and have it too when it comes to allocating GPU resources.
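To make the fixed-versus-unified tradeoff concrete, here is a toy model in Python. It is only a sketch: the function names, the unit counts, and the idea of measuring work in abstract "items" that each take one cycle are all assumptions for illustration, not a description of how Tegra, Kepler, or Vivante hardware actually schedules work.

```python
from math import ceil

def fixed_cycles(vertex_work, pixel_work, vertex_units, pixel_units):
    """Cycles needed when units are hard-partitioned by type (toy model).

    Each pool can only run its own kind of work, so the frame finishes
    when the slower pool finishes.
    """
    return max(ceil(vertex_work / vertex_units),
               ceil(pixel_work / pixel_units))

def unified_cycles(vertex_work, pixel_work, total_units):
    """Cycles needed when every unit can run either kind of work (toy model)."""
    return ceil((vertex_work + pixel_work) / total_units)

# Hypothetical pixel-heavy workload on eight units in total.
print(fixed_cycles(8, 56, vertex_units=4, pixel_units=4))  # 14
print(unified_cycles(8, 56, total_units=8))                # 8
```

In this made-up pixel-heavy case, the fixed 4/4 split leaves the vertex units idle for most of the frame, while the unified pool keeps all eight units busy.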
Each shader core contains 16 registers that can be ganged together depending on workload. A shader can perform up to five double-precision operations per cycle per shader unit quantum. There are up to 16 shader cores per GPU core, and up to four GPU cores in a single implementation, though no one has built a Vivante core anywhere near that large at this point.
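The upper bounds quoted above multiply out as follows. This is a back-of-the-envelope sketch: the constant names are invented here, and it assumes that "per shader unit quantum" can be read as "per shader core", which the source does not spell out.

```python
# Figures quoted in the article (upper bounds, not a shipping configuration).
SHADER_CORES_PER_GPU_CORE = 16
MAX_GPU_CORES = 4
OPS_PER_CYCLE_PER_SHADER = 5  # assumption: one "quantum" == one shader core

def max_shader_cores():
    """Largest shader-core count the architecture allows on paper."""
    return SHADER_CORES_PER_GPU_CORE * MAX_GPU_CORES

def peak_ops_per_cycle():
    """Theoretical peak operations per clock for the largest configuration."""
    return max_shader_cores() * OPS_PER_CYCLE_PER_SHADER

print(max_shader_cores())    # 64
print(peak_ops_per_cycle())  # 320
```

Under that reading, a maxed-out design would top out at 64 shader cores and 320 operations per clock, a theoretical ceiling rather than a measured figure.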
One of the advantages of this tiny, modular architecture is that you can clock the cores like gangbusters. According to Vivante, the 28nm high performance silicon variant of the Vivante architecture can clock up to 1GHz at full speed, and can fall back to 1/64th of that in power saving mode, or roughly 16MHz.
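The power-saving figure is just the full clock divided by 64. A minimal sketch of that arithmetic, with a hypothetical helper name and the divisor taken from Vivante's quoted 1/64th ratio:

```python
def power_save_clock_hz(full_clock_hz, divisor=64):
    """Clock in power saving mode, given the full-speed clock and a divisor."""
    return full_clock_hz / divisor

full = 1_000_000_000  # 1GHz, the quoted 28nm high performance figure
print(power_save_clock_hz(full) / 1e6)  # 15.625 (MHz)
```

That works out to 15.625MHz, which the article rounds to 16MHz.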