Fermi's block-level diagram. The larger pool of configurable shared memory/L1 cache per SM and the 768KB of unified L2 cache are obvious improvements over GT200, but NVIDIA has also made changes to boost core execution efficiency across the board.
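To put the caption's cache figures in perspective, the sketch below tallies Fermi's on-chip memory for the two per-SM configurations NVIDIA described: each SM has 64KB that can be split as either 48KB shared memory with 16KB L1, or 16KB shared memory with 48KB L1. It assumes the full 16-SM GF100 configuration as announced. The mode names (`prefer_shared`, `prefer_l1`) and the `chip_totals` helper are hypothetical labels for illustration, not an NVIDIA API.

```python
# Hypothetical tally of Fermi (GF100) on-chip memory, assuming the
# announced full configuration of 16 SMs with 64 KB configurable per SM.
SM_COUNT = 16        # assumption: full GF100 chip as announced
PER_SM_KB = 64       # configurable shared memory + L1 per SM
L2_KB = 768          # unified L2 shared by all SMs

# Per-SM split options: (shared memory KB, L1 cache KB).
SPLITS = {
    "prefer_shared": (48, 16),
    "prefer_l1": (16, 48),
}


def chip_totals(mode: str) -> dict:
    """Return chip-wide shared, L1, and L2 capacity in KB for a split mode."""
    shared, l1 = SPLITS[mode]
    if shared + l1 != PER_SM_KB:
        raise ValueError("split must add up to the per-SM pool")
    return {
        "shared_kb": shared * SM_COUNT,
        "l1_kb": l1 * SM_COUNT,
        "l2_kb": L2_KB,
    }


if __name__ == "__main__":
    for mode in SPLITS:
        print(mode, chip_totals(mode))
```

Either way, the chip carries 1MB of configurable SM memory plus the 768KB L2; the split only changes how much of that 1MB behaves as programmer-managed shared memory versus hardware-managed cache.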
For the moment, NVIDIA is talking about Fermi strictly as a scientific computing part. Non-Tesla versions will come, of course, but they aren't the company's focus today. As for when those announcements will become reality, that's anyone's guess: Jen-Hsun declined to say when Fermi cores would ship beyond pointing to a Q4 2009/Q1 2010 timeframe. Fermi's evolution demonstrates how far AMD's and NVIDIA's roadmaps have diverged. While AMD remains focused on the consumer and workstation space, NVIDIA is adamant that scientific computing and large data set crunching (as well as consumer application acceleration) are the waves of the future. On paper, Fermi appears to be a strong competitor, but if it takes NVIDIA nine more months to push GeForce cards out the door, the company could find itself matched against an even newer series of Radeon cards rather than the 5800 products currently on the market.
When we discussed NVIDIA's Tegra platform, we noted that the company's lack of a CPU design would undoubtedly affect its own Tegra product development. With Fermi, NVIDIA has built an architecture with some features similar to those you might expect to find on a massively parallel processor. To help developers take full advantage of the processor, NVIDIA has developed its own heterogeneous programming environment.