NVIDIA GF100 Architecture and Feature Preview - HotHardware


Back in late September of last year, NVIDIA disclosed some information regarding its next generation GPU architecture, codenamed "Fermi". At the time, actual product names and detailed specifications were not disclosed, nor was performance in 3D games, but high-level information about the architecture, its strong focus on compute performance, and broader compatibility with computational applications were discussed.

We covered much of the early information regarding Fermi in this article. Just to recap some of the more pertinent details found there, the GPU codenamed Fermi will feature over 3 billion transistors and be produced using TSMC's 40nm process. If you remember, AMD's RV870, which is used in the ATI Radeon HD 5870, consists of roughly 2.15 billion transistors and is also manufactured at 40nm. Fermi will be outfitted with more than double the number of cores of the current GT200, 512 in total. It will also offer 8x the peak double-precision compute performance of its predecessor, and Fermi will be the first GPU architecture to support ECC. ECC support will allow Fermi to compensate for soft error rate (SER) issues and also potentially allow it to scale to higher densities, mitigating the issue in larger designs. The GPU will also be able to execute C++ code.
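To illustrate how ECC can compensate for a soft error, here is a minimal sketch of a textbook Hamming(7,4) code, which stores three parity bits alongside four data bits and can repair any single flipped bit. This is an assumption chosen purely for illustration: NVIDIA has not described Fermi's actual ECC scheme here, and the function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data_bits, error_position); error_position is 0 if none found."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # syndrome is the 1-based bad bit
    if error_position:
        c[error_position - 1] ^= 1  # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]], error_position


# Simulate a soft error: one stored bit flips, and the code repairs it.
stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1
recovered, position = hamming74_correct(stored)
print(recovered, position)
```

The cost is extra storage and logic for the parity bits, which is the trade-off a memory designer accepts in exchange for tolerating occasional bit flips.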



NVIDIA's Jen-Hsun Huang holds GF100's closest sibling, a Fermi-based Tesla card

During the GPU Technology Conference that took place in San Jose, NVIDIA's CEO Jen-Hsun Huang showed off the first Fermi-based, Tesla-branded prototype boards and talked at length about the compute performance of the architecture. Game performance wasn't a focus of Huang's speech, which led some to speculate that NVIDIA was forgetting about gamers with this generation of GPUs. That obviously is not the case; Fermi is going to be a powerful GPU after all. The simple fact of the matter is that NVIDIA is late with its next-gen GPU architecture, and the company chose a different venue, the Consumer Electronics Show, to discuss Fermi's gaming-oriented features.


GF100 High-Level Block Diagram


For desktop-oriented parts, Fermi-based GPUs will from here on be referred to as GF100. As we've mentioned in previous articles, GF100 is a significant architectural change from previous GPU architectures. Initial information focused mostly on the compute side, but today we can finally discuss some of the more consumer-centric details that gamers will be most interested in.

At the Consumer Electronics Show, NVIDIA showed off a number of GF100 configurations, including single-card, 2-way SLI, and 3-way SLI setups in demo systems. Those demos, however, used pre-production boards that were not indicative of retail product. Because of this, and also because the company is obviously still working feverishly on the product, NVIDIA chose NOT to disclose many specific features or speeds and feeds of GF100. Instead, we have more architectural details and information regarding some new IQ modes and geometry-related enhancements.

In the block diagram above, the first major changes made to GF100 become evident. In each GPC (there are four in the diagram), newly designed Raster and PolyMorph Engines are present. We'll give some more detail on these GPU segments a little later, but having these engines present in each GPC essentially allows each one to function as a full GPU. The design was implemented to allow for better geometry performance scalability through a parallel implementation of geometry processing units. According to NVIDIA, the end result is an 8X improvement in geometry performance over the GT200. Segmenting the GPU in this way also allows for multiple levels of scalability, either at the GPC or the individual SM level.

Each GF100 GPU features 512 CUDA cores, 16 geometry units, 4 raster units, 64 texture units, 48 ROPs, and a 384-bit GDDR5 memory interface. If you're keeping count, the GT200 features 240 CUDA cores, 32 ROPs, and 80 texture units. The geometry and raster units, as they are implemented in GF100, are not present in the GT200 GPU. The GT200 also features a wider 512-bit memory interface, but the need for such a wide interface is somewhat negated in GF100 because the GPU uses GDDR5 memory, which effectively offers double the bandwidth of GDDR3, clock for clock.
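To see why a narrower GDDR5 bus can still come out ahead of a wider GDDR3 one, here is a rough back-of-the-envelope sketch. The 1000 MHz clock is a made-up placeholder, since no GF100 memory clocks have been disclosed, and modeling GDDR3 as 2 and GDDR5 as 4 transfers per clock is a simplifying assumption that mirrors the "double the bandwidth, clock for clock" statement above.

```python
def peak_bandwidth_gb_s(bus_width_bits, transfers_per_clock, clock_mhz):
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = transfers_per_clock * clock_mhz * 1e6
    return bytes_per_transfer * transfers_per_second / 1e9


ASSUMED_CLOCK_MHZ = 1000  # hypothetical; not a disclosed GT200 or GF100 clock

# Same assumed clock for both, so only bus width and memory type differ.
gddr3_512bit = peak_bandwidth_gb_s(512, 2, ASSUMED_CLOCK_MHZ)
gddr5_384bit = peak_bandwidth_gb_s(384, 4, ASSUMED_CLOCK_MHZ)

print(f"512-bit GDDR3: {gddr3_512bit:.0f} GB/s")
print(f"384-bit GDDR5: {gddr5_384bit:.0f} GB/s")
print(f"ratio: {gddr5_384bit / gddr3_512bit:.2f}x")
```

Under these assumptions the 384-bit GDDR5 interface delivers 1.5x the peak bandwidth of the 512-bit GDDR3 one at the same clock; real products will differ depending on the memory clocks each actually ships with.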

If we drill down a little deeper, each SM in each GPC consists of 32 CUDA cores, with 48KB or 16KB of shared memory (up to 3x that of GT200), 16KB or 48KB of L1 cache (there is no L1 cache on GT200), 4 texture units, and 1 PolyMorph Engine. In addition to the actual units, we should point out that improvements have also been made over the previous generation in 32-bit integer performance and in full IEEE 754-2008 FMA support. The increase in cache size and the addition of L1 cache were designed to keep as much data on the GPU die as possible, without having to access memory.
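As a quick sanity check, the per-SM and per-GPC counts can be multiplied out to the chip totals. The sketch below uses only figures quoted in this article; the SMs-per-GPC value is derived rather than quoted, and the 64KB per-SM total is inferred from the two shared memory/L1 splits.

```python
# Figures quoted in the article.
GPC_COUNT = 4
TOTAL_CUDA_CORES = 512
CORES_PER_SM = 32
TEXTURE_UNITS_PER_SM = 4
POLYMORPH_ENGINES_PER_SM = 1
RASTER_ENGINES_PER_GPC = 1

# Derived values (not stated directly in the article).
sm_count = TOTAL_CUDA_CORES // CORES_PER_SM
sms_per_gpc = sm_count // GPC_COUNT

texture_units = sm_count * TEXTURE_UNITS_PER_SM
polymorph_engines = sm_count * POLYMORPH_ENGINES_PER_SM
raster_engines = GPC_COUNT * RASTER_ENGINES_PER_GPC

print(f"SMs: {sm_count} ({sms_per_gpc} per GPC)")
print(f"Texture units: {texture_units}")
print(f"PolyMorph Engines: {polymorph_engines}")
print(f"Raster Engines: {raster_engines}")

# Both shared memory / L1 splits add up to the same per-SM on-chip total.
for shared_kb, l1_kb in [(48, 16), (16, 48)]:
    print(f"shared {shared_kb}KB + L1 {l1_kb}KB = {shared_kb + l1_kb}KB per SM")
```

The results line up with the 16 geometry units, 4 raster units, and 64 texture units listed for the full chip.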

The L1 cache is used for register spilling, stack ops, and global loads and stores, while the L2 cache is for vertex, SM, texture, and ROP data. According to NVIDIA, the GF100's cache structure offers many benefits over GT200 in gaming applications, including faster texture filtering and more efficient processing of physics and ray tracing, in addition to greater texture coverage and generally better overall compute performance.

The PolyMorph and Raster Engines in the GPU perform very different tasks, but together they result in greater parallelism in the GPU. The PolyMorph Engines are used for world space processing, while the Raster Engines are for screen space processing. There are 16 PolyMorph Engines in total, one placed before each SM. They allow work to be distributed across the chip, but there is also intelligent logic in place designed to keep the data in order. Communication happens between the units to ensure the data arrives in DRAM in the correct order, and all of the data is kept on die thanks to the chip's cache structure. Synchronization is handled at the thread scheduling level. The four independent Raster Engines serve the geometry shaders running in each GPC, and the cache architecture is used to pass data from stage to stage in the pipeline. We're also told that the GF100 offers 10x faster context switching than the GT200, which further enhances performance when compute and graphics modes are both being utilized.


From listening to people theoretically more knowledgeable about hardware than I am (which really would not be that difficult, to be honest; I'm more into the practical information than the technical information, e.g. this card goes in that slot), I've heard that the boost in double-precision floating point performance is something that is pretty much not utilized (if it's even possible to be) in games. So it's nothing that will help frame rates or gaming performance, yet it is built into the architecture, so it can't just be cut from, say, the GeForce series of Fermi (if they continue that line) to make them cheaper.

Basically, it sounded like the cards will have a large piece of them on there, that will be paid for by the consumer, that won't actually be used by games at all. Something that just raises costs with no benefit for an average gamer that buys one. Could anyone shed light on this?

A lot of technical information, but it's also showcasing some of the things the Nvidia DirectX 11 enabled cards will be able to do. The free-flowing hair and water look incredible.

With the higher anti-aliasing modes, ray tracing, and tessellation, Nvidia is showing how much more powerful Fermi is than the GT200 series. And I think we're talking multiples, at least 2-3 times the performance in certain areas.

Hard numbers will bear that out, but it's safe to say Nvidia has something very powerful up their sleeve.

When you look through the full article and the slides, check out the one of the hair. I studied that one pretty deeply, and it looks considerably similar to the real thing. With something like hair, the strands are minuscule to the point of blending together. In the Nvidia demo of hair you could see thousands of separate hairs in the image. So the detail and construction level of this card looks to be awesome. However, we will have to see how that affects rendering speed etc. for a final verdict.

The hair and the water pics both look amazing.

Though I thought it was showing off DirectX 11 tessellation, rather than a feature of the Nvidia cards.

The Supersonic Sled demo would be unique to Nvidia because it employs PhysX.

That's true, but when I first looked at that picture I was like, why are they showing a blond wig on here? Then I looked at it closer, scrolled down and read the details, and was like wow, that almost looks like rl hair. So producing a pic of that much detail that I can see through my current GPU on a webpage image, which is not the same GPU, is like minus 2-400 percent detail-wise at the least.

I see what you mean. Man, but I would love to see a video demonstration of the free-flowing hair and the water. With the speed of news coming out about Fermi, I think we'll have a demo pretty soon!

Well, what I am saying is that the picture you can see is awesome, and the real picture on your PC would be 2-400 times better. So this thing will blow away everything on the market, I would imagine, but it also changes the general functionality of a GPU as well. The focus and delivery mechanisms as well as the software platform are in many ways totally different, or at least the focus is. I am pretty confident the reason the 5970 is two tweaked 5850 GPUs is because ATI is working on something new as well. I also think that in realistic pictures we are on a cusp. Look at Avatar: it is animation done by computers almost completely, with real actors at the same time. It is a meshing of technologies which I see on your PC in a relatively short amount of time. The 5870 started it, this Nvidia hardware expands it, and ATI expands it just like normal. The impact on digital imagery and its availability to the normal person will change though.

I was able to find some video demos on youtube and I posted them a couple of posts above. They're worth checking out! My current graphics card would melt if I tried running any of the demos on it, lol.

The Streaming Multiprocessors on the GF100 have taken a giant leap forward:

  • 32 CUDA Cores (4x compared to GT200)
  • 16 or 48KB of Shared Memory (3x compared to GT200)
  • 16 or 48KB of L1 Cache (There was no L1 on GT200)

We're seeing some major increases in hardware power and we're also seeing real improvements in geometric processing (tessellation and displacement mapping). Rob over at Techgage mentions it in his review:

"While pixel shaders have had an increasing focus from GPU generation to the next, there's been almost no love to the triangle generator. Compared to the GeForce FX (2003), the shading horsepower has increased by 150x, while the geometric processing has increased by only 3x."

 

You're right that the new Nvidia cards will surpass the offerings by ATI, and Nvidia has not tried to hide that fact. Look at this graph they released of tessellation performance (red is ATI):

The 5870's max FPS barely touches the min of the GF100.
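To give a feel for why tessellation stresses geometry throughput, here is a minimal sketch that splits a single triangle into four at each level using edge midpoints. Uniform midpoint subdivision is a simplification picked for illustration; it is not the DirectX 11 tessellator's actual algorithm, and the function names are hypothetical.

```python
def midpoint(a, b):
    """Midpoint of two 3D points given as tuples."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def subdivide(triangle):
    """Split one triangle into four smaller ones via its edge midpoints."""
    a, b, c = triangle
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def tessellate(triangles, levels):
    """Apply midpoint subdivision `levels` times to a list of triangles."""
    for _ in range(levels):
        triangles = [small for tri in triangles for small in subdivide(tri)]
    return triangles


patch = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
for level in range(5):
    print(level, len(tessellate(patch, level)))
```

The triangle count grows by 4x per level (1, 4, 16, 64, 256), so even a modest tessellation factor multiplies the amount of geometry the chip has to process.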

Yeah, those video demos are awesome, especially the hair one and the water. I still think the detail in the hair is awesome. When the wind blows it looks real.

gibbersome:
Though I thought it was showing off DirectX 11 tessellation, rather than a feature of the Nvidia cards.

Tessellation is a feature of the card, not DirectX. The DirectX 11 standard adds APIs to control it, and specifies that only hardware that supports it can be called "DirectX 11 compatible". OpenGL 3.2 supports the same tessellation with the same cards, even on Windows XP (and Linux).

Tessellation controlled by OpenGL:

http://www.youtube.com/watch?v=C8TKUlMzcbw&feature=channel

I agree about the hair and water vids - really nice looking stuff, though the frame rates on the hair seem a bit troubling considering there's nothing else being rendered in the demo.
