
NVIDIA GF100 Architecture and Feature Preview


Back in late September of last year, NVIDIA disclosed some information regarding its next-generation GPU architecture, codenamed "Fermi". At the time, actual product names and detailed specifications were not disclosed, nor was performance in 3D games, but high-level information about the architecture, its strong focus on compute performance, and its broader compatibility with computational applications was discussed.

We covered much of the early information regarding Fermi in this article. Just to recap some of the more pertinent details found there, the GPU codenamed Fermi will feature over 3 billion transistors and be produced using TSMC's 40nm process. If you remember, AMD's RV870, which is used in the ATI Radeon HD 5870, is comprised of roughly 2.15 billion transistors and is also manufactured at 40nm. Fermi will be outfitted with more than double the number of cores of the current GT200, 512 in total. It will also offer 8x the peak double-precision compute performance of its predecessor, and Fermi will be the first GPU architecture to support ECC. ECC support will allow Fermi to compensate for soft error rate (SER) issues and also potentially allow it to scale to higher densities, mitigating the issue in larger designs. The GPU will also be able to execute C++ code.
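To put those last two points in more concrete terms, the snippet below is a minimal sketch, not taken from NVIDIA's materials, of the sort of double-precision CUDA C++ kernel that Fermi-class (compute capability 2.0) hardware is designed to run natively. The kernel name, launch sizes, and build flags are illustrative assumptions.

    // Hypothetical example: double-precision a*x + y on the GPU.
    __global__ void daxpy(double a, const double* x, double* y, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];  // a candidate for Fermi's fused multiply-add
    }

    // Host-side launch (d_x and d_y are device pointers allocated elsewhere):
    //   daxpy<<<(n + 255) / 256, 256>>>(2.0, d_x, d_y, n);
    // Built with something along the lines of: nvcc -arch=sm_20 daxpy.cu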



NVIDIA's Jen-Hsun Huang holds GF100's closest sibling, a Fermi-based Tesla card

During the GPU Technology Conference that took place in San Jose, NVIDIA's CEO Jen-Hsun Huang showed off the first Fermi-based, Tesla-branded prototype boards and talked at length about the compute performance of the architecture. Game performance wasn't a focus of Huang's speech, however, which led some to speculate that NVIDIA was forgetting about gamers with this generation of GPUs. That obviously is not the case; Fermi is going to be a powerful GPU after all. The simple fact of the matter is, NVIDIA is late with its next-gen GPU architecture, and the company chose a different venue--the Consumer Electronics Show--to discuss Fermi's gaming-oriented features.


GF100 High-Level Block Diagram


For desktop-oriented parts, Fermi-based GPUs will from here on be referred to as GF100. As we've mentioned in previous articles, GF100 represents a significant architectural change from previous generations. Initial information focused mostly on the compute side, but today we can finally discuss some of the more consumer-centric details that gamers will be most interested in.

At the Consumer Electronics Show, NVIDIA showed off a number of GF100 configurations, including single-card, 2-way, and 3-way SLI setups in demo systems. Those demos, however, used pre-production boards that were not indicative of retail product. Due to this fact, and also because the company is obviously still working feverishly on the product, NVIDIA chose NOT to disclose many specific features or the speeds and feeds of GF100. Instead, we have more architectural details and information regarding some new image quality (IQ) modes and geometry-related enhancements.

In the block diagram above, the first major changes made to GF100 become evident. In each GPC (Graphics Processing Cluster)--there are four in the diagram--newly designed Raster and PolyMorph Engines are present. We'll give some more detail on these GPU segments a little later, but having these engines present in each GPC essentially allows each one to function as a full GPU. The design was implemented to allow for better geometry performance scalability through a parallel implementation of geometry processing units. According to NVIDIA, the end result is an 8X improvement in geometry performance over the GT200. Segmenting the GPU in this way also allows for multiple levels of scalability, either at the GPC level or at the individual SM level.

Each GF100 GPU features 512 CUDA cores, 16 geometry units, 4 raster units, 64 texture units, 48 ROPs, and a 384-bit GDDR5 memory interface. If you're keeping count, the GT200 features 240 CUDA cores, 32 ROPs, and 80 texture units. The geometry and raster units, as they are implemented in GF100, are not present in the GT200 GPU. The GT200 also features a wider 512-bit memory interface, but the need for such a wide interface is somewhat negated in GF100 because the GPU uses GDDR5 memory, which effectively offers double the bandwidth of GDDR3, clock for clock.
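As a rough, back-of-the-envelope illustration (using a hypothetical 1000MHz memory clock for both, not announced speeds), peak bandwidth is the bus width in bytes multiplied by the effective per-pin data rate. Treating GDDR5 as moving twice as many bits per pin per clock as GDDR3, as described above:

    512-bit GDDR3: (512 / 8) bytes x 2 x 1000MHz = 128GB/s
    384-bit GDDR5: (384 / 8) bytes x 4 x 1000MHz = 192GB/s

So, at equal memory clocks, the narrower GDDR5 interface would still come out well ahead; actual figures will of course depend on the final clocks NVIDIA ships.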

If we drill down a little deeper, each SM in each GPC is comprised of 32 CUDA cores, 64KB of on-chip memory configurable as either 48KB of shared memory with 16KB of L1 cache or 16KB of shared memory with 48KB of L1 cache (3x the shared memory of GT200, which has no L1 cache at all), 4 texture units, and 1 PolyMorph Engine. In addition to the actual units, we should point out that improvements have also been made over the previous generation in 32-bit integer operation performance, and there is full IEEE 754-2008 fused multiply-add (FMA) support. The increase in cache size and the addition of the L1 cache were designed to keep as much data on the GPU die as possible, without having to access off-chip memory.
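On the compute side, that shared memory/L1 split is exposed as a per-kernel preference in the CUDA runtime. The sketch below is a minimal, hypothetical example (the kernel itself is just a placeholder) of how a developer might request the larger-L1 configuration on Fermi-class hardware:

    #include <cuda_runtime.h>

    __global__ void my_kernel(float* data)
    {
        // placeholder kernel body
    }

    int main()
    {
        // Prefer the 16KB shared / 48KB L1 split for this kernel...
        cudaFuncSetCacheConfig(my_kernel, cudaFuncCachePreferL1);

        // ...or the 48KB shared / 16KB L1 split instead:
        // cudaFuncSetCacheConfig(my_kernel, cudaFuncCachePreferShared);
        return 0;
    }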

The L1 cache is used for register spilling, stack ops, and global loads and stores, while the L2 cache is for vertex, SM, texture, and ROP data. According to NVIDIA, the GF100's cache structure offers many benefits over GT200 in gaming applications, including faster texture filtering and more efficient processing of physics and ray tracing, in addition to greater texture coverage and generally better overall compute performance.

The PolyMorph and Raster Engines in the GPU perform very different tasks, but together they result in greater parallelism in the GPU. The PolyMorph Engines are used for world space processing, while the Raster Engines handle screen space processing. There is one PolyMorph Engine per SM, for a total of 16. They allow work to be distributed across the chip, but there is also intelligent logic in place designed to keep the data in order. Communication happens between the units to ensure the data arrives in DRAM in the correct order, and all of the data is kept on die thanks to the chip's cache structure. Synchronization is handled at the thread scheduling level. The four independent Raster Engines serve the geometry shaders running in each GPC, and the cache architecture is used to pass data from stage to stage in the pipeline. We're also told that the GF100 offers 10x faster context switching than the GT200, which further enhances performance when compute and graphics modes are both being utilized.


Ahh, thanks. The hardware, though, is not unique to Nvidia, unlike PhysX, which runs off CUDA-enabled GeForce GPUs. The hair and water tessellation is possible with any DirectX 11-compatible card. I guess that's the point I was making.

About the hair vids, I was thinking the 25 FPS was very low as well, but this kind of detail won't be present in games, not yet anyway. Even something 10% of that would be a huge improvement in realism in games.


Nice post 3vi1, I really like the OpenGL video. And on your "gibbersome" gaming comment, yeah, that kind of detail would be flat out bad in a game. That's also another reason I was pointing out the hair thing; can you imagine playing or making a character as detailed as that in a multiplayer game? That would be sick, but I am sure we'll see it within the next year or two. The next one I am waiting for is SWTOR (Star Wars: The Old Republic), which has finally been given a release date of spring 2011. If anyone likes MMO-type games, check it out: http://www.swtor.com/


I am waiting for SWtOR too; I am guessing it gets pushed into the summer to early fall of 2011. I think the end product will be worth the wait though.


As for the video card thing, Nvidia has put this off too long, I think. As far as I remember, it has actually been over a year since they released anything that was not a re-brand (shrunk maybe, but the same architecture except for memory changes) and was truly 100% new. Before this it was about every 8-9 months for both companies, and each release would be at least a major update. They have released the 295, but it's just dual GPU; the hardware is not super new (maybe tweaked). Either way, ATI has likely known considerably more than any of us about what Nvidia intended to do, probably for over a year, so they should be prepared. So I would think we would see something major from them, not that the 5800-and-up line is not already major as far as current hardware goes. However, if you had been in a major back and forth with a competitor, and they were slowed down for any reason, and you knew the basic timeline for the slowdown as well as what they intended to put out next, what would you do?

 


As for SWtOR, I imagine it will be late spring or early summer (I say June or the first 2 weeks of July). As I had posted on the forum, I appreciate their strategy with the release, as anything else released in the MMO world, even WOW, usually takes some time to get straightened out with all foretold features active and running well. I personally am not really a WOW player; I tried it for my free month and had like 3 level-20 characters. I started in the second week of EQ1, have been on every beta from the original release until Dragons (about 10 straight), and was a GM on 3 servers. The graphics are just too low quality for me, not to mention I played for a very, very long time.

For some reason they refer to players like me as hardcore, at least in MMOs. I also played the Vanguard beta and kept playing for quite some time afterward; graphically it is better than anything, including Conan (which I also beta'd; I have beta tested almost every major successful MMO release since EQ1, even WOW, but it would never work at first, and neither would AoC for that matter; DnD Online was just a joke; the Warhammer beta was kind of cool, but after a couple weeks following release I was bored; Heroes was cool, though I did not really like Villains much; well, enough listing of betas, trust me, I have probably beta'd at least 30 MMOs all together, some not major releases). Either way, VG has pretty much had its death warrant signed for now, as they have not gone past the initial release other than some added quests for higher levels (but not enough) and stability work, which was done after 3 months. (Believe it or not, VG on release was very demanding graphically and basically would not play right with anything under a 1900-series ATI card or a high-end Nvidia card, kind of like Crysis; you also needed as much RAM as you could get, a high-class CPU, and a fast HD just for general play at low settings.) Basically no new abilities and a +5 level cap; it's a joke.

Either way, enough of my wisdom on MMOs, except to state that I believe, and hope it is true, that SWtOR will be complete when finished, which is actually very rare, and stable on release. Then they can worry about expansions in the normal 1 to 2 year time span, and knowing BioWare I think it will be close to yearly, which is more in line with something like WOW, where the company (Blizzard; oh yeah, I beta'd Diablo and Diablo 2 as well, my first betas) actually wants their game to be a success and pushes it.

The thing with Sony Online makes no sense to me, really, as a successful game that is regularly updated, content- and technology-wise, is nothing but a bank in itself. Your customer base buys the original, usually within a week to a month of release, pays you monthly access fees, and buys every upgrade you put out for it. Money-wise they are the most successful because of the fat loot they make after release, which is non-stop and recurring.


I know many people are eagerly waiting for SWtOR. Also, if you're an Elder Scrolls fan, it's been leaked that they're developing an MMO as well. After the success of WOW, MMORPGs really seem to be all the rage with large game developers. Personally, I'm more of a single-player RPG guy. I like my games to have a definite beginning and end to them. Baldur's Gate was my first and I've been hooked ever since.

Back to Nvidia: yeah, I agree they may have waited too long. If ATI's supply problems resolve later this month, many more 5xxx series cards will be sold in the upcoming weeks, especially with a moderate price drop from the increased supply. Nvidia recognizes this, and that's why you saw the push to release the lower-mid-range GT300 series mobile cards. They stood to lose out on the mobile market and may have even tried to undercut ATI.

Again, if Fermi is the game changer that Nvidia has been hyping it up to be, then it won't matter in the end. Right now, though, I get the feeling that Nvidia bit off more than it could chew.


You are right, that hair demo is pretty impressive too.


I wonder how much of that is going to translate into actual gaming. Remember, that hair demo close-up was pulling 25 FPS. While we may not see hair that detailed in games anytime soon, anything one-tenth as good would look awesome.


@Anakhoresis

I agree with everything you said about gaming.

As for double-precision floating point, this feature could be a big deal for GPU computation projects that use CUDA, like GPUID. It could save projects wasted time double-checking calculations, as the GPU may catch a lot of the miscalculations itself.


ATI also has double-precision floating-point support in their 5800 series lineup.

Interestingly, the 5700 series doesn't, but if you go back two years, the 4770 had it. I'm guessing this was a cost-cutting measure for ATI.

 

@Quinid Thanks for answering.
