GPU Tech: NVIDIA Talks Fermi, Unveils Nexus
Date: Oct 02, 2009
Author: Joel Hruska
Fermi 101
If you've followed the early announcements concerning Fermi, NVIDIA's next-generation GPU architecture, you should already be aware that the new GPU core is both an evolution of the existing GT200 architecture and a significant new design in its own right. NVIDIA made it clear early on that it wasn't going to be talking about GeForce products at the conference this year, and has instead discussed Fermi as a Tesla successor and future high-end engine primed to drive the GPGPU industry.

So that's 16 times 32...carry the four...

While it carries many of the same features as the GT200 series, Fermi is distinctly its own animal. NVIDIA's Fermi whitepaper describes the new architecture as follows: "G80 was our initial vision of what a unified graphics and computing parallel processor should look like. GT200 extended the performance and functionality of G80. With Fermi, we have taken all we have learned from the two prior processors and all the applications that were written for them, and employed a completely new approach to design to create the world’s first computational GPU."

"Computational GPU" is short-hand for "a whole lot of number crunching". Where NVIDIA's G80 packed 128 cores and the GT200 raised the bar to 240, a full-scale Fermi implementation will pack 512 processor cores, ECC memory protection, and up to 8x the double-precision floating point throughput of its predecessor. Peak number-crunching power has increased all the way around; Fermi can execute 64-bit FP code at 50 percent of the speed of 32-bit FP code, compared to 12.5 percent of 32-bit speed in earlier product iterations.
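
As a rough sanity check on that "up to 8x" figure, the short Python sketch below multiplies each chip's core count by its 64-bit-to-32-bit speed ratio, using only the numbers quoted above. It is a back-of-the-envelope estimate, not NVIDIA's own math: the function name is ours, the result is in relative per-clock units, and it assumes each core retires one 32-bit FP operation per clock. Clock speeds are left out entirely, since NVIDIA hasn't announced them.

```python
# Relative per-clock double-precision (FP64) throughput, from the article's figures.
# Assumption: one FP32 operation per core per clock is the baseline unit.
# Clock speeds are ignored because they have not been announced.

def fp64_ops_per_clock(cores: int, fp64_rate: float) -> float:
    """Core count times the fraction of FP32 speed at which FP64 code runs."""
    return cores * fp64_rate

gt200 = fp64_ops_per_clock(cores=240, fp64_rate=0.125)  # 12.5 percent of FP32 speed
fermi = fp64_ops_per_clock(cores=512, fp64_rate=0.5)    # 50 percent of FP32 speed
ratio = fermi / gt200

print(f"GT200: {gt200:.0f}, Fermi: {fermi:.0f}, ratio: {ratio:.1f}x")
# prints "GT200: 30, Fermi: 256, ratio: 8.5x"
```

On a per-clock basis the two quoted ratios work out to roughly 8.5x, which lines up with the "up to 8x" claim; the real-world gap will depend on the clock speeds of shipping parts.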

Each SM (streaming multiprocessor) in Fermi (there are 16 total) has access to 64K of configurable L1 cache; the entire chip shares a 768K L2 cache. In aggregate, that's about 1.8MB of cache, significantly more than the GT200 architecture, which offered 16K of managed memory per SM.
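
The aggregate numbers are easy to verify. The Python sketch below simply adds up the figures quoted above; the constant names are ours, and the 32-cores-per-SM value is taken from the "16 times 32" caption rather than from a spec sheet.

```python
# Fermi totals, computed from the per-SM figures quoted in the article.
SM_COUNT = 16        # streaming multiprocessors on a full-scale Fermi
CORES_PER_SM = 32    # from the "16 times 32" caption
L1_PER_SM_KB = 64    # configurable L1 cache per SM
L2_SHARED_KB = 768   # L2 cache shared by the whole chip

total_cores = SM_COUNT * CORES_PER_SM
total_cache_kb = SM_COUNT * L1_PER_SM_KB + L2_SHARED_KB
total_cache_mb = total_cache_kb / 1024

print(f"{total_cores} cores, {total_cache_kb}K cache ({total_cache_mb:.2f}MB)")
# prints "512 cores, 1792K cache (1.75MB)"
```

That 1,792K total is the 1.75MB that rounds to the "about 1.8MB" figure.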
Fermi, Continued
Other features of Fermi include support for C++ (current-generation CUDA products only support C), and, of course, the already oft-repeated fact that this core is some three billion transistors in size. NVIDIA has publicly tried to downplay the importance of this, claiming that analysts have always expressed concerns over the size of the company's chips, but there's no arguing that three billion transistors is a lot. Typically speaking, the more transistors in a product, the greater the chance something will go wrong when fabbing it; NVIDIA is taking something of a risk in building Fermi on a monolithic core instead of aiming for a mid-range, mid-size core and dual-GPU configurations à la AMD.

Fermi's block-level diagram. The increased amount of configurable/L1 cache per SM and the 768K of unified L2 are obvious improvements over GT200, but NVIDIA has made changes to boost core execution efficiency all the way around.

Dig into NVIDIA's whitepapers on Fermi, and you may end up thinking that the company designed a compute engine that happens to be capable of handling graphics rather than the other way around. Many of Fermi's changes should translate across GPU computation and gaming; there's no inherent reason why both sides can't benefit from certain improvements. Certain features, like support for 64-bit addressing, however, are rather obviously aimed at the scientific computing market rather than the needs of the game industry.

For the moment, NVIDIA is talking about Fermi strictly as a scientific computing part. Non-Tesla versions will come, of course, but they aren't the company's focus today. As for when those announcements will become reality, that's anyone's guess. Jen-Hsun refused to comment on when we might see Fermi cores ship beyond pointing to a Q4 2009/Q1 2010 timeframe. Fermi's evolution is a demonstration of how divergent AMD's and NVIDIA's roadmaps have become. While AMD is staying focused on the consumer and workstation space, NVIDIA is adamant in its belief that scientific computing and major data set crunching (as well as consumer app acceleration) are the waves of the future. On paper, Fermi appears to be a strong competitor, but if it takes NVIDIA nine more months to push GeForce cards out the door, it could find itself matched against an even newer series of Radeon cards, rather than the 5800 products currently on the market.

When we discussed NVIDIA's Tegra platform, we noted that the company's lack of a CPU design would undoubtedly impact its own Tegra product development. With Fermi, NVIDIA has built an architecture with some features similar to what you might expect to find on a massively parallel processor. In order to help developers take full advantage of it, NVIDIA has developed its own heterogeneous programming environment.
NVIDIA's Nexus, Conclusion

Earlier this year, we speculated that we'd see future versions of Tegra include CUDA support. NVIDIA didn't have a timeframe to share with us yet, but the company did confirm that CUDA capability would be built into future versions of Tegra, and that the ability to run certain types of applications extremely efficiently on the GPU is part of its long-term competitive strategy.



One of the major projects NVIDIA has revealed at the conference thus far is Nexus, a massively parallel development environment that plugs into Microsoft's Visual Studio. Nexus, according to NVIDIA, will allow programmers to simultaneously develop for heterogeneous environments. Developers will be able to use Nexus to write code intended for execution on the GPU or CPU simultaneously; it also includes debugger and profiler capabilities to identify which code runs best on which execution resources.

According to NVIDIA, Nexus is capable of hardware-level debugging of CUDA C, HLSL, and DirectCompute code (the original G80 did not include a hardware-level debugger; this feature is only available on G84 cards and above). When profiling program execution, it's possible to view GPU and CPU events simultaneously, or drill down into a specific area. If you listen to NVIDIA, the company is quite excited about Nexus, and touts it as a major boon to developers who have long wanted such a programming interface.

"NVIDIA Nexus is going to improve programmer productivity immediately," said Tarek El Dokor at Edge 3 Technologies. "An integrated GPU and CPU development solution is something Edge 3 has needed for a long time. The fact that it’s integrated into the Visual Studio development environment drastically reduces the learning curve."

Tegra doesn't hook into NVIDIA's plans for Fermi at the moment, but the more efficient an architecture the company can build, the less it needs to rely on the strength of a more conventional CPU, x86-compatible or not. We're potentially at least two-to-three generations away from a point where NVIDIA might attempt to combine conventional processing with GPU capabilities on a single die, but if Intel and AMD can do it from the CPU side, NVIDIA could possibly pull an equivalent trick starting in the opposite corner.

NVIDIA has gone out of its way to showcase some of the cooler and more interesting software and hardware developments it has coming down the pipeline. The company had a handful of Fermi cards either in use or on display, but there's very little chance that we'll see the card in 2009; a Q1/Q2 2010 launch is far more likely. NVIDIA also wasn't willing to discuss launch speeds, whether or not desktop cards would be full versions of the architecture with all cores enabled, or how the card would perform against then-current products from ATI.

If it leaves the gate in top condition, Fermi will offer developers a far more advanced and capable platform than either G80 or GT200 ever did.

Content Property of HotHardware.com