Nvidia Unveils New Details On 192-Core Mobile Kepler Part

Nvidia is showing off a bit more of the capabilities of its upcoming Kepler-based mobile GPU at SIGGRAPH this week, and the next-generation chip should be a quantum leap over what Nvidia is shipping currently. Then again, that's scarcely difficult -- Tegra 4 is based on 2005-era graphics hardware with some L2 cache and a few feature enhancements here and there. It doesn't support CUDA, DirectX 11, OpenCL, or OpenGL ES 3.0, and while it packs far more pixel shader and vertex pipelines than Tegra 3 offered, it was disappointing to those of us who hoped to see a Kepler-based design this generation.

Tegra 5 supposedly fixes all this. The next-generation core will offer cutting-edge support for all your favorite APIs, including standard OpenGL and the mobile-focused OpenGL ES. It packs 192 Kepler cores into a GPU implementation -- then packs that GPU into a 2W power envelope.

That's impressive. It's not clear how that compares to existing GPU implementations; power data for just the GPU inside a mobile part is hard to measure and will depend on process technology, clock speed, and implementation. It's possible that Nvidia, like Intel, chose to go with a wider core at a relatively low clock speed because that offered greater power savings compared to a high-clock, narrow core.

Two watts, while impressive, may be a sign that this core is going to have trouble in smartphones. AnandTech's comparison of GPU power consumption in 3D game tests early this year showed average power consumption of 0.713W for Intel's Atom and a single-core SGX540, 1.7W for Tegra 3 (Microsoft Surface), and 2.84W for Google's Nexus 10. Tegra 3 wasn't known for strong smartphone penetration, and the 2W figure may be an indication that this version of Tegra 5 will focus primarily on tablets as well.

Questionable Metrics:

The new chip looks great -- it's Nvidia's first truly modern graphics architecture for mobile devices. It's going to be a huge step up in compute capability and flexibility from Tegra 4. Qualcomm, ARM, and Imagination all have strong GPU tech, but Nvidia has the most modern architecture stack and the most robust implementation in the PC space. Tegra 5 should bring those capabilities to mobile parts. Just to make it clear -- I'm genuinely looking forward to this core.

Slides like this, however, misrepresent the fundamental performance jump the product is likely to deliver.

This slide is a straight FLOPS comparison; it implies that a single Kepler SMX hits about 400GFLOPS. That's true if the chip is clocked at ~1GHz. That's very high for a 28nm chip on a low-power process, but we know Kepler can hit that kind of clock speed on the desktop -- though that kind of max frequency might be a tablet-only capability.
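The arithmetic behind that ~1GHz figure can be sketched in a few lines. This is a back-of-the-envelope estimate, not Nvidia's own math: it assumes each of the 192 cores retires one fused multiply-add per clock, counted as two floating-point operations, and the function names are purely illustrative.

```python
# Rough peak-throughput estimate for a single 192-core Kepler SMX.
# Assumption: one fused multiply-add (2 FLOPs) per core per clock.

def peak_gflops(cores: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak single-precision throughput in GFLOPS."""
    return cores * flops_per_clock * clock_ghz


def clock_needed_ghz(target_gflops: float, cores: int, flops_per_clock: int = 2) -> float:
    """Clock speed (GHz) required to reach a target peak GFLOPS figure."""
    return target_gflops / (cores * flops_per_clock)


smx_at_1ghz = peak_gflops(cores=192, clock_ghz=1.0)              # 384 GFLOPS
clock_for_400 = clock_needed_ghz(target_gflops=400, cores=192)   # a bit over 1 GHz

print(f"192 cores at 1.0 GHz: {smx_at_1ghz:.0f} GFLOPS")
print(f"Clock needed for 400 GFLOPS: {clock_for_400:.2f} GHz")
```

Under those assumptions, "about 400GFLOPS" only works out if the part runs at or slightly above 1GHz, which is why the clock speed matters so much to the slide's claim.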

The other problem with a straight FLOPS comparison is that even when accurate, it distorts the overall picture. The new mobile Kepler might outperform an 8800 GTX mathematically, but the memory bus (likely 2x32-bit) isn't going to provide anything like the bandwidth that NV's 384-bit bus did back in 2006.
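To put rough numbers on the bandwidth gap, here is a minimal sketch. The 8800 GTX figures (384-bit bus, GDDR3 at an effective 1.8 GT/s) are its published specs; the mobile side is hypothetical, assuming a 2x32-bit interface running LPDDR3 at 1.6 GT/s, since Nvidia hasn't disclosed the memory configuration.

```python
# Peak memory bandwidth = bus width in bytes * transfers per second.

def bandwidth_gb_s(bus_width_bits: int, transfer_rate_gt_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return (bus_width_bits / 8) * transfer_rate_gt_s


# GeForce 8800 GTX: 384-bit bus, GDDR3 at 1.8 GT/s effective.
desktop_2006 = bandwidth_gb_s(bus_width_bits=384, transfer_rate_gt_s=1.8)

# Hypothetical mobile Kepler: 2x32-bit bus, assumed LPDDR3 at 1.6 GT/s.
mobile_guess = bandwidth_gb_s(bus_width_bits=2 * 32, transfer_rate_gt_s=1.6)

print(f"8800 GTX:       {desktop_2006:.1f} GB/s")
print(f"Mobile (guess): {mobile_guess:.1f} GB/s")
print(f"Ratio:          {desktop_2006 / mobile_guess:.2f}x")
```

Even if the real memory turns out somewhat faster than the assumed figure, the desktop card keeps a several-fold bandwidth advantage that a FLOPS chart does nothing to reveal.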

Even if a 5x gain in GFLOPS doesn't translate into a 5x performance gain over the iPad 4 (and in real life, it typically doesn't), Kepler looks like it could redefine what mobile gaming systems are capable of. The new chip will be independently power-gated, and NV claims it can scale the design down to well below 2W to fit various usage scenarios. Current projected availability is first half 2014, with devices shipping in 12-14 months.