NVIDIA GeForce GTX 480: GF100 Has Landed
Date: Mar 26, 2010
Section:Graphics/Sound
Author: Marco Chiappetta
Introduction and Related Information

 

For better or worse, the launch of NVIDIA's next-generation GPU architecture, codenamed Fermi, a.k.a. GF100, is one of the most highly anticipated in our industry, ever. Information about the GPU has been trickling out for many months now, some of it good and some bad. Regardless of what you have chosen to believe or ignore up to this point, one irrefutable fact remains: NVIDIA is extremely late to the DirectX 11 party. There are no ifs, ands, or buts about it. Rival AMD has used the last few months to release a myriad of DX11-class cards ranging in price from under $100 to almost $700, fleshing out a top-to-bottom line-up that caters to virtually every market segment. Today NVIDIA is announcing two high-end cards, neither of which will be available for a couple more weeks. So while this announcement is an important move for the company, NVIDIA would have liked to have made it sooner. C'est la vie.

NVIDIA may be late with their DX11-class cards, but launching strong products that compete favorably at their respective price points may erase some lingering concerns about the company and restore faith in prospective consumers. To that end, we can finally show you what NVIDIA has in store for the hardcore gamers out there. Today, NVIDIA is officially unveiling the GeForce GTX 480 and GeForce GTX 470. We have two of the flagship GeForce GTX 480 cards in house, and have tested them alongside NVIDIA's previous-gen products and AMD's Radeon HD 5800 / 5900 series, both in single and dual-card configurations. There's a lot to cover, so grab a snack, hydrate, and strap yourself in while we take NVIDIA's latest flagship for a spin around the HotHardware lab...

NVIDIA GeForce GTX 480 & 470
Specifications and Features




NVIDIA is announcing two DirectX-11 class cards today based on the GF100 GPU, the GeForce GTX 480 and the GeForce GTX 470. Each card's respective specifications and features are listed in the chart above, but we have more comprehensive explanations of the features and technology employed in the cards on the pages ahead. If, however, you'd like to brush up on some previous articles dealing with NVIDIA's products, we'd recommend taking a gander at some of the articles listed below...
 

Our GeForce 8800 GTX launch article goes in depth on the G80 GPU architecture and explains NVIDIA's CUDA GPGPU technology, while the GeForce GTX 280 coverage goes in depth on the previous-gen GT200 GPU. Also, our GeForce 8800 GT, 8800 GTS 512MB, 9800 GTX and GX2 pieces encompass the majority of our G92 GPU coverage.

The NVIDIA GeForce GTX 480

 

Here it is folks; the moment many of you have been waiting for: the official unveiling of NVIDIA's next-gen flagship GPU, the GeForce GTX 480. The card you see pictured below is based on NVIDIA's own reference design, but expect all of the partner boards to look identical for the foreseeable future, save for some custom decals...

   

   

   
NVIDIA GeForce GTX 480 Reference Card

Somewhat surprisingly, the GeForce GTX 480 doesn't look much different than current GeForce GTX 200 series cards, due to the shell surrounding the unit, although a large portion of the GTX 480's heatsink is exposed, as are its heatpipes, which we think looks really cool. The reference card pictured here has a GPU clock of 700MHz and a Stream Processor clock of 1401MHz. 480 of the 512 stream processors / CUDA cores in the GF100 GPU are enabled on the GTX 480, and the card sports a 1.536GB frame buffer consisting of GDDR5 memory clocked at 924MHz, for an effective data rate of 3.696GHz. The memory is connected via a 384-bit memory interface, and the GPU has 60 texture units and 48 ROPs. This configuration offers a peak texture fillrate of 42GTexels/s and over 177GB/s of memory bandwidth.
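Those headline numbers follow directly from the clocks. Here's a quick back-of-the-envelope sketch of the arithmetic (our own illustration; the constants are simply the published specs above):

```cuda
#include <cstdio>

// Back-of-the-envelope peak-throughput math from the GTX 480's published
// reference clocks (plain host code; compiles as C++ or CUDA).
int main() {
    const double mem_clock_mhz   = 924.0;  // GDDR5 base clock
    const double gddr5_data_rate = 4.0;    // GDDR5 transfers 4 bits per pin per clock
    const double bus_width_bits  = 384.0;
    const double core_clock_mhz  = 700.0;
    const int    texture_units   = 60;

    const double effective_ghz  = mem_clock_mhz * gddr5_data_rate / 1000.0;  // 3.696 GHz
    const double bandwidth_gbs  = effective_ghz * bus_width_bits / 8.0;      // ~177.4 GB/s
    const double fillrate_gtexs = core_clock_mhz * texture_units / 1000.0;   // 42 GTexels/s

    printf("Effective data rate:   %.3f GHz\n", effective_ghz);
    printf("Peak memory bandwidth: %.1f GB/s\n", bandwidth_gbs);
    printf("Peak texture fillrate: %.0f GTexels/s\n", fillrate_gtexs);
    return 0;
}
```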

There are two dual-link DVI outputs on the card, along with a mini HDMI output with audio. The GeForce GTX 480 requires both a 6-pin and an 8-pin PCI Express power connector, and max board power hovers around 250 watts. More information about the GF100 GPU itself and the card's acoustic and thermal characteristics is available in the pages ahead. Since this card has been so hotly anticipated, we thought we'd let you all get up close and personal from the get-go.


The NVIDIA GeForce GTX 470 Reference Card

Also coming down the pipeline is the GeForce GTX 470. In terms of features and capabilities, the GTX 470 and 480 are identical. The GTX 470, however, has fewer stream processors / CUDA cores enabled--448 to be exact--along with a smaller memory interface and somewhat lower clock speeds. The GeForce GTX 470 has a 320-bit memory interface with 1.28GB of frame buffer memory running at 837MHz (3.348GHz effective). Its GPU is clocked at 607MHz and has 56 texture units and 40 ROPs available. We do not have a GeForce GTX 470 in hand just yet, however, so we won't be showcasing its performance here. We'll just have to save that for another article.

NVIDIA GF100 Architecture

The Fermi GPU architecture that is the foundation of the GF100 GPU powering the GeForce GTX 480 features over 3 billion transistors and is produced using TSMC's 40nm process. If you remember, AMD's RV870, which is used in the ATI Radeon HD 5870, is comprised of roughly 2.15 billion transistors and is also manufactured at 40nm. Fermi is outfitted with more than double the number of cores of the current GT200--512 in total--though as implemented on the GeForce GTX 480, only 480 shader processors are exposed. The GPU offers 8x the peak double-precision compute performance of its predecessor, and Fermi is the first GPU architecture to support ECC. ECC support will allow Fermi to compensate for soft error rate (SER) issues that can be problematic in larger bleeding-edge IC designs and also potentially allow it to scale to higher densities. The GPU will also be able to execute C++ code.


GF100 High-Level Block Diagram

The GF100 is a significant architectural change from previous GPU architectures. In the block diagram above, the first major changes made to GF100 become evident. In each GPC cluster--there are four in the diagram--newly designed Raster and PolyMorph Engines are present. We'll provide more detail on these GPU segments a little later, but having these engines present in each GPC essentially allows each one to function as a full GPU. The design was implemented to allow for better geometry performance scalability, through a parallel implementation of geometry processing units. According to NVIDIA, the end result is an 8X improvement in geometry performance over the GT200. Segmenting the GPU in this way also allows for multiple levels of scalability, at either the GPC or streaming multiprocessor level.

Each GF100 GPU features 512 CUDA cores, 16 geometry units, 4 raster units, 64 texture units, 48 ROPs, and a 384-bit GDDR5 memory interface. If you're keeping count, the GT200 features 240 CUDA cores, 32 ROPs, and 80 texture units. Remember though, only 480 cores are exposed on the GeForce GTX 480, and 448 on the GTX 470. The geometry and raster units, as they are implemented in GF100, are not in the GT200 GPU. The GT200 does feature a wider 512-bit memory interface, but the need for such a wide interface is somewhat negated in GF100 because the GPU uses GDDR5 memory, which effectively offers double the bandwidth of GDDR3, clock for clock.

If we drill down a little deeper, each SM core in each GPC is comprised of 32 CUDA cores, 4 texture units, 1 PolyMorph Engine, and 64KB of on-chip memory that can be configured as either 48KB of shared memory with 16KB of L1 cache, or 16KB of shared memory with 48KB of L1 cache. That's up to 3x the shared memory of GT200, which had no L1 cache at all. In addition to the actual units, we should point out that improvements have also been made over the previous generation in 32-bit integer operation performance and with full IEEE 754-2008 FMA support. The increase in cache size and the addition of L1 cache were designed to keep as much data resident on the GPU as possible, without having to go off-chip to external memory.
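Fermi exposes that shared memory / L1 split to CUDA programmers as well. Here's a minimal sketch of how a kernel might request the 48KB-shared configuration, using the cudaFuncSetCacheConfig call CUDA introduced for Fermi-class parts (the kernel itself is a toy example of ours):

```cuda
#include <cuda_runtime.h>

// Toy kernel that stages data in shared memory; a kernel like this is the
// sort that benefits from the 48KB-shared / 16KB-L1 configuration.
__global__ void scale(float *data, float k, int n) {
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = data[i];
    __syncthreads();
    if (i < n) data[i] = tile[threadIdx.x] * k;
}

int main() {
    // Request the 48KB shared / 16KB L1 split for this kernel;
    // cudaFuncCachePreferL1 would request 16KB shared / 48KB L1 instead.
    cudaFuncSetCacheConfig(scale, cudaFuncCachePreferShared);

    const int n = 1 << 20;
    float *data;
    cudaMalloc(&data, n * sizeof(float));
    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(data);
    return 0;
}
```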

The L1 cache is used for register spilling, stack ops, and global loads and stores, while the L2 cache is for vertex, SM, texture, and ROP data. According to NVIDIA, the GF100's cache structure offers many benefits over GT200 in gaming applications, including faster texture filtering and more efficient processing of physics and ray tracing, in addition to greater texture coverage and generally better overall compute performance.

The PolyMorph and Raster Engines in the GPU perform very different tasks, but together they result in greater parallelism in the GPU. The PolyMorph Engines are used for world space processing, while the Raster Engines are for screen space processing. There are a total of 16 PolyMorph Engines, one for each SM. They allow work to be distributed across the chip, but there is also intelligent logic in place designed to keep the data in order. Communications happen between the units to ensure the data arrives in DRAM in the correct order, and all of the data is kept on die, thanks to the chip's cache structure. Synchronization is handled at the thread scheduling level. The four independent Raster Engines serve the geometry shaders running in each GPC, and the cache architecture is used to pass data from stage to stage in the pipeline. We're also told that the GF100 offers 10x faster context switching than the GT200, which further enhances performance when compute and graphics modes are both being utilized.

NVIDIA GF100 Features

Many of the new features of GF100 are designed to increase geometric realism while also offering increased image quality and, of course, high performance. One of the new engine features of the GF100, like other DirectX 11-class GPUs, is hardware-accelerated tessellation.


Tessellation Example

The GF100 has built-in hardware support for tessellation. As we've mentioned in the past, tessellation works by taking a basic polygon mesh and recursively applying a subdivision rule to create a more complex mesh on the fly. It's best used for amplification of animation data, morph targets, or deformation models, and it gives developers the ability to provide data to the GPU at a coarser resolution. This saves artists the time it would normally take to create more complex polygonal meshes and reduces the data's memory footprint. Unlike previous tessellator implementations, the one in the GF100 adheres to the DX11 spec and will not require proprietary code.
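The subdivision rule itself is easy to sketch in plain code. The snippet below is our own illustration (not NVIDIA's hardware algorithm): one level of midpoint subdivision that splits each triangle into four, whose new vertices a displacement map would then perturb to add real geometric detail:

```cuda
#include <vector>

struct Vec3 { float x, y, z; };
struct Tri  { Vec3 a, b, c; };

static Vec3 midpoint(const Vec3 &p, const Vec3 &q) {
    return { (p.x + q.x) * 0.5f, (p.y + q.y) * 0.5f, (p.z + q.z) * 0.5f };
}

// One level of the subdivision rule: each triangle is split into four
// smaller ones using its edge midpoints. Applying this recursively turns
// a coarse artist-authored mesh into a dense one on the fly.
static std::vector<Tri> subdivide(const std::vector<Tri> &mesh) {
    std::vector<Tri> out;
    out.reserve(mesh.size() * 4);
    for (const Tri &t : mesh) {
        Vec3 ab = midpoint(t.a, t.b);
        Vec3 bc = midpoint(t.b, t.c);
        Vec3 ca = midpoint(t.c, t.a);
        out.push_back({t.a, ab, ca});  // corner triangles
        out.push_back({ab, t.b, bc});
        out.push_back({ca, bc, t.c});
        out.push_back({ab, bc, ca});   // center triangle
    }
    return out;
}
```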

  
Hair Demo

To show off the capabilities of GF100, NVIDIA has created a number of interesting demos. As many of you know, properly rendering and animating realistic hair is a difficult task. As such, many games slap helmets or caps on characters, if they even have hair at all. NVIDIA's Hair demo, however, combines tessellation with geometry shading and leverages the compute performance of the GF100 to generate flowing hair. The images were realistically lit and smoothly animated, which is a far cry from what is seen in most of today's games.

 
Water Demo

Another demo NVIDIA created to illustrate tessellation on the GF100 is aptly dubbed the Water Demo. As you can see in the screenshots above, the water demo takes a scene with relatively basic geometry, and through increased tessellation and displacement mapping, the detail in the rocks and water is dramatically increased. The demo does not use realistic fluid dynamics, but the effect was nonetheless still very good. The difference in performance between the two modes was roughly 2x--with coarse geometry the demo ran at about 300FPS, and with high detail it ran at about 150FPS.


  

  
New GF100 Anti-Aliasing Modes

In addition to offering much more compute performance and geometry processing than previous generations, the GF100 also features new anti-aliasing modes. The GF100 will offer higher AA performance than GT200 not only due to having more ROPs, but because enhancements have been made to each ROP as well. With GF100, the data compression factor is higher in the ROPs, they can use more samples, and they offer better transparency AA quality thanks to accelerated jittered sampling.

Jittered sampling changes the sampling pattern randomly on a per-pixel basis, which helps remove banding and noise and produces an edge that is more pleasing and natural to the eye. The GF100 also offers a new 32x CSAA mode (8 color samples + 24 coverage samples) in addition to support for 33 levels of alpha-blended transparency. The effect of the new AA mode is much smoother edges, as seen in the screenshots above. The new AA mode also preserves more detail on textures with transparency that are sometimes rendered incorrectly when viewed at an angle--chain-link fences or railings, for example.
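To make the idea concrete, here's a small sketch of our own showing the general technique (not NVIDIA's actual hardware sampler): each pixel hashes its coordinates to jitter its sub-pixel sample positions, turning structured banding into noise the eye barely notices:

```cuda
#include <cstdio>
#include <cstdint>

// Tiny integer hash (Wang hash) to derive a pseudo-random value per pixel/sample.
__host__ __device__ inline uint32_t wang_hash(uint32_t s) {
    s = (s ^ 61u) ^ (s >> 16);
    s *= 9u;  s ^= s >> 4;
    s *= 0x27d4eb2du;  s ^= s >> 15;
    return s;
}

// Jittered sampling in a nutshell: for an n x n grid of strata per pixel,
// place sample (i, j) at a random point inside its stratum instead of at a
// fixed position. Regular patterns alias into visible banding; per-pixel
// jitter turns that structured error into unstructured noise.
__host__ __device__ inline void jittered_sample(uint32_t px, uint32_t py,
                                                uint32_t i, uint32_t j, uint32_t n,
                                                float *dx, float *dy) {
    uint32_t h = wang_hash((px * 9781u + py) * 6271u + i * n + j);
    *dx = (i + (h & 0xFFFFu) / 65536.0f) / n;          // sub-pixel x in [0,1)
    *dy = (j + ((h >> 16) & 0xFFFFu) / 65536.0f) / n;  // sub-pixel y in [0,1)
}

int main() {
    // Print a 2x2 jittered pattern for two neighboring pixels; note the
    // offsets differ per pixel, which is what breaks up banding.
    for (uint32_t px = 0; px < 2; ++px)
        for (uint32_t i = 0; i < 2; ++i)
            for (uint32_t j = 0; j < 2; ++j) {
                float dx, dy;
                jittered_sample(px, 0, i, j, 2, &dx, &dy);
                printf("pixel (%u,0) sample (%u,%u): (%.3f, %.3f)\n", px, i, j, dx, dy);
            }
    return 0;
}
```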

NVIDIA GF100 Features (Cont.)

To show off the increased compute performance of GF100, NVIDIA also created an interactive GPU-based ray tracing demo.

  
GF100 Ray Tracing Demo

The ray tracing demo used two identical systems, one equipped with a GF100 prototype board and the other with a GeForce GTX 285. The demo itself used an image-based lighting paint shader and ray-traced shadows, reflections, and refractions, running at a resolution of 2560x1600. Frame rates at that high of a resolution were quite low--less than 1 FPS, in fact--but the GF100 system showed roughly 3x the performance of the GTX 285 (approximately .063 vs. .023 FPS).

  
PhysX In Dark Void

Of course, NVIDIA is also keen to demonstrate some upcoming PhysX-enabled titles. The images above are from Airtight's Dark Void. Airtight and NVIDIA jointly worked on the GPU PhysX effects in Dark Void, implementing a turbulence effect for the in-game jetpack along with some weapon and impact effects featuring numerous particles.

 
NVIDIA's APEX Development Tool

Along with all of these tech demos, NVIDIA also spent some time talking to us about "The Way It's Meant To Be Played" program and some of the new tools and support being offered to developers. NVIDIA talked of its immense game testing labs, which developers in the program have access to, the Technical Design Documents offered to developers, and the many SDKs NVIDIA has made available over the years. One of the newer tools being shown off is called APEX. NVIDIA calls APEX a “Scalable Dynamics Framework” that consists of authoring tools and a runtime. It acts like a plug-in for many popular tools, and while using APEX we watched as PhysX effects were literally painted onto a model. APEX was used during the development of Dark Void and the upcoming game Metro 2033.

 

 
Supersonic Sled Demo

Perhaps the most complex demo NVIDIA created to showcase GF100 is the Supersonic Sled. A system equipped with three GF100 cards was used to run the demo, which exploits virtually all of the features of the GPU. The Supersonic Sled demo uses GPU particle systems for smoke, dust, and fireballs; PhysX physical models for rigid bodies and joints, which are partially processed on the CPU; tessellation for the terrain; and image processing for the motion blur effect. NVIDIA called the demo the "kitchen sink" because physical simulation, DX11 tessellation, environmental effects, and image processing are all employed simultaneously.

In the demo a pilot is launched down a track on a rocket-propelled sled and general mayhem ensues. Particles are strewn about and objects like a shack, bridge, and rock ledge crumble as the sled jets by. Hundreds of thousands to a million particles can be on the screen at any given time, all being managed by the GPU. The demo requires an immense amount of compute performance to run smoothly with the detail and number of particles cranked up, hence the GF100 3-way SLI configuration.
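Keeping that many particles on screen is only practical because they never leave the GPU. Below is a minimal sketch of what such a system's core loop might look like (our own toy example; a real system adds emission, collisions, and rendering):

```cuda
#include <cuda_runtime.h>

// One Euler integration step per particle, one thread per particle. With
// positions and velocities resident in video memory, even a million
// particles is a single cheap kernel launch per frame, with no PCIe traffic.
__global__ void step_particles(float3 *pos, float3 *vel, int count,
                               float3 gravity, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;
    float3 v = vel[i];
    v.x += gravity.x * dt;  v.y += gravity.y * dt;  v.z += gravity.z * dt;
    float3 p = pos[i];
    p.x += v.x * dt;  p.y += v.y * dt;  p.z += v.z * dt;
    vel[i] = v;
    pos[i] = p;
}

int main() {
    const int count = 1 << 20;  // roughly a million particles
    float3 *pos, *vel;
    cudaMalloc(&pos, count * sizeof(float3));
    cudaMalloc(&vel, count * sizeof(float3));
    cudaMemset(pos, 0, count * sizeof(float3));
    cudaMemset(vel, 0, count * sizeof(float3));

    const float3 gravity = make_float3(0.0f, -9.8f, 0.0f);
    step_particles<<<(count + 255) / 256, 256>>>(pos, vel, count,
                                                 gravity, 1.0f / 60.0f);
    cudaDeviceSynchronize();

    cudaFree(pos);
    cudaFree(vel);
    return 0;
}
```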


Multi-Panel Gaming Across Three Displays in 3D at CES 2010 in Las Vegas

Another new feature coming to GeForce GTX cards (both GTX 400 and 200 series) is Surround View and 3D Surround. You'll need two cards running in an SLI configuration, but with this type of setup, three displays can be used to create a single large surface for gaming, similar to ATI's Eyefinity technology. Surround View will be exposed in an upcoming driver, but isn't available just yet.

Fermi: Compute Capabilities

If you've followed the early announcements concerning Fermi, NVIDIA's next-generation GPU architecture, you should already be aware that the new GPU core is both an evolution of the existing GT200 architecture and a significant new design in its own right.

 

The GF100 Die, A.K.A Fermi
 

While it carries many of the same features as the GT200 series, Fermi is distinctly its own animal. NVIDIA's Fermi whitepaper describes the new architecture as follows: "G80 was our initial vision of what a unified graphics and computing parallel processor should look like. GT200 extended the performance and functionality of G80. With Fermi, we have taken all we have learned from the two prior processors and all the applications that were written for them, and employed a completely new approach to design to create the world’s first computational GPU."

"Computational GPU" is short-hand for "a whole lot of number crunching". Where NVIDIA's G80 packed 128 cores and the GT200 raised the bar to 240, a full-scale Fermi implementation will pack 512 processor cores, ECC memory protection, and up to eight times the double-precision floating point throughput of its predecessor. Peak number-crunching power has increased all the way around.  Fermi can execute 64-bit FP code at 50% the speed of 32-bit FP code, as compared to 12.5 percent the speed of 32-bit FP in earlier product iterations.

 

 

Each SM (streaming multiprocessor) in Fermi (there are 16 total) has access to 64K of configurable L1 cache; the entire chip shares a 768K L2 cache. In aggregate, that's about 1.8MB of cache, significantly more than the GT200 architecture, which offered 16K of managed memory per SM.

Fermi: Compute Capabilities (Cont.)

Other features of Fermi include support for C++ (current-generation CUDA products only support C) and, of course, the already oft-repeated fact that this core is some three billion transistors in size. NVIDIA has publicly tried to downplay the importance of this, claiming that analysts have always expressed concerns over the size of the company's chips, but there's no arguing that three billion transistors is a lot.

 

Fermi's block-level diagram. The increased amount of configurable shared/L1 cache per SM and the 768K of unified L2 are obvious improvements over GT200, but NVIDIA has also made changes to boost core execution efficiency all the way around.


Dig into NVIDIA's whitepapers on Fermi, and you may end up thinking that the company designed a compute engine that happens to be capable of handling graphics rather than the other way around. Many of Fermi's changes should translate across GPU computation and gaming; there's no inherent reason why both sides can't benefit from certain improvements. Certain features, like support for 64-bit addressing, however, are rather obviously aimed at the scientific computing market rather than the needs of the game industry.

NVIDIA Nexus - 

Another one of the major projects NVIDIA has revealed is Nexus, a massively parallel development environment that plugs into Microsoft's Visual Studio. Nexus, according to NVIDIA, will allow programmers to develop for heterogeneous computing environments. Developers will be able to use Nexus to write code intended for execution on the GPU and CPU simultaneously, and it includes debugger and profiler capabilities to identify which code runs best on which execution resources.



According to NVIDIA, Nexus is capable of hardware-level debugging of CUDA C, HLSL, and DirectCompute (the original G80 did not include a hardware-level debugger; this feature is only available on G84 cards and above). When profiling program execution, it's possible to view GPU and CPU events simultaneously, or drill down into a specific area. If you listen to NVIDIA, the company is quite excited about Nexus, and touts it as a major boon to developers who have long wanted such a programming interface.

"NVIDIA Nexus is going to improve programmer productivity immediately," said Tarek El Dokor at Edge 3 Technologies. "An integrated GPU and CPU development solution is something Edge 3 has needed for a long time. The fact that it’s integrated into the Visual Studio development environment drastically reduces the learning curve."

Test System and Unigine Heaven

How We Configured Our Test Systems:  We tested the graphics cards in this article on a Gigabyte GA-EX58-UD5 motherboard powered by a Core i7 965 quad-core processor and 6GB of OCZ DDR3-1333 RAM. The first thing we did when configuring the test system was enter the system BIOS and set all values to their "optimized" or "high performance" default settings. Then we manually configured the memory timings and disabled any integrated peripherals that wouldn't be put to use. The hard drive was then formatted, and Windows 7 Ultimate x64 was installed. When the installation was complete we fully updated the OS and installed the latest hotfixes, along with the necessary drivers and applications.

HotHardware's Test Systems
Core i7 Powered

Hardware Used:
Core i7 965 (3.2GHz)

Gigabyte EX58-UD5
(X58 Express)

Radeon HD 5850
Radeon HD 5870 (2)
Radeon HD 5870 OC

Radeon HD 5970 (2)
GeForce GTX 285 (2)
GeForce GTX 295 (2)
GeForce GTX 480 (2)

6GB OCZ DDR3-1333
Western Digital Raptor 150GB
Integrated Audio
Integrated Network

Relevant Software:
Windows 7 Ultimate x64
DirectX Feb. 2010 Redist
ATI Catalyst v10.3a
NVIDIA GeForce Drivers v197.13 / 197.17

Benchmarks Used:

Unigine Heaven v2.0
3DMark Vantage v1.0.1
H.A.W.X.
FarCry 2
Crysis*
Left 4 Dead 2*
Enemy Territory: Quake Wars v1.5*

* - Custom benchmark

Unigine Heaven v2.0 Benchmark
Synthetic DirectX 11 Gaming


Unigine Heaven

The Unigine Heaven Benchmark v2.0 is built around the Unigine game engine. Unigine is a cross-platform, real-time 3D engine with support for DirectX 9, DirectX 10, DirectX 11, and OpenGL. The Heaven benchmark--when run in DX11 mode--also makes comprehensive use of tessellation technology and advanced SSAO (screen-space ambient occlusion), and it also features volumetric cumulonimbus clouds generated by a physically accurate algorithm and a dynamic sky with light scattering. Because we tested Heaven in DX11 mode, no NVIDIA GT200 series cards are represented in the graph below.






In an effort to keep the graphs clean and easy to read, we've separated the single-GPU and multi-GPU SLI / CrossFire results. Please take note of the separation as you flip through the next few pages.

As you can see, according to Unigine Heaven, the new GeForce GTX 480 outpaces all of the competition, even dual-GPU cards like the Radeon HD 5970. The geometry and tessellation processing capabilities of the GF100 GPU are exploited here, and as a result, it is able to pull ahead of every other card we tested. The real question is whether tessellation engines will continue to become more important in next-generation game titles. The short answer is likely "yes." Though perhaps not exploited enough in current titles, this is a powerful new rendering technology.

3DMark Vantage

Futuremark 3DMark Vantage
Synthetic DirectX Gaming


3DMark Vantage

The latest version of Futuremark's synthetic 3D gaming benchmark, 3DMark Vantage, is specifically bound to Windows Vista-based systems because it uses some advanced visual technologies that are only available with DirectX 10, which isn't available on previous versions of Windows. 3DMark Vantage isn't simply a port of 3DMark06 to DirectX 10, though. With this latest version of the benchmark, Futuremark has incorporated two new graphics tests, two new CPU tests, and several new feature tests, in addition to support for the latest PC hardware. We tested the graphics cards here with 3DMark Vantage's Extreme preset option, which uses a resolution of 1920x1200 with 4x anti-aliasing and 16x anisotropic filtering.



In our single-GPU 3DMark Vantage tests, the NVIDIA GeForce GTX 480 performs a bit better than a dual-GPU powered GeForce GTX 295 and a stock Radeon HD 5870. However, the factory overclocked Radeon HD 5870 OC put up a slightly higher score, due to the Radeon's better performance in the GPU Test 1 portion of the test.





Running the cards in a multi-GPU configuration changes the landscape quite a bit. When running in a dual-card SLI configuration, the GeForce GTX 480 scales better than the Radeon HD 5870 and hence outpaces the 5870 CrossFire configuration by a fair margin. The quad-CrossFireX Radeon HD 5970 configuration rules the roost, however.

Enemy Territory: Quake Wars

Enemy Territory: Quake Wars
OpenGL Gaming Performance


Enemy Territory:
Quake Wars

Enemy Territory: Quake Wars is based on a radically enhanced version of id's Doom 3 engine and is viewed by many as Battlefield 2 meets the Strogg, and then some. In fact, we'd venture to say that id took EA's team-based warfare genre up a notch or two. ET: Quake Wars also marks the introduction of John Carmack's "Megatexture" technology, which employs large environment and terrain textures that cover vast areas of maps without the need to repeat and tile many smaller textures. The beauty of megatexture technology is that each unit only takes up a maximum of 8MB of frame buffer memory. Add to that HDR-like bloom lighting and leading-edge shadowing effects, and Enemy Territory: Quake Wars looks great, plays well, and works high-end graphics cards vigorously. The game was tested with all of its in-game options set to their maximum values, with soft particles enabled, in addition to 4x anti-aliasing and 16x anisotropic filtering.




Our custom Enemy Territory: Quake Wars benchmark produced some unexpected results. In the single-card testing, the new GeForce GTX 480 was the fastest of the single-GPU powered cards--only the Radeon HD 5970 was faster. In the multi-card tests, however, the GeForce GTX 480 SLI configuration hovered around 137 FPS, presumably due to a CPU limitation. The Radeons showed greater scaling and didn't suffer from the same limitation, however, so we suspect there was a driver issue at play here that prevented the GTX 480s from hitting their true potential in this game.

Crysis v1.21

Crysis v1.21
DirectX 10 Gaming Performance


Crysis

If you're at all into enthusiast computing, the highly anticipated single-player FPS smash hit Crysis should require no introduction. Crytek's game engine produces some stunning visuals that are easily the most impressive real-time 3D renderings we've seen on the PC to date. The engine employs some of the latest techniques in 3D rendering, like Parallax Occlusion Mapping, Subsurface Scattering, Motion Blur, and Depth-of-Field effects, as well as some of the most impressive use of shader technology we've seen yet. In short, for those of you that want to skip the technical jib-jab, Crysis is a beast of a game. We ran the full game patched to v1.21 with all of its visual options set to 'Very High' to put a significant load on the graphics cards being tested. A custom demo recorded on the Ice level was used throughout testing.


 

When running in a single-card configuration, the new GeForce GTX 480 absolutely smokes the older GeForce GTX 285 and performs right about on par with the Radeon HD 5870 in Crysis. Throw two cards in a machine for some SLI action, however, and the superior multi-GPU scaling of the GeForce cards allows not only the GeForce GTX 480 SLI config, but also the GTX 285 SLI setup, to outpace the Radeon HD 5870 CrossFire configuration. Nothing comes close to touching the quad-GPU Radeon HD 5970 CrossFireX configuration, though, and even the quad-SLI GeForce GTX 295 setup pulled ahead of the GTX 480s at the higher resolution.

FarCry 2

FarCry 2
DirectX Gaming Performance


FarCry 2

Like the original, FarCry 2 is one of the more visually impressive games to be released on the PC to date.  Courtesy of the Dunia game engine developed by Ubisoft, FarCry 2's game-play is enhanced by advanced environment physics, destructible terrain, high resolution textures, complex shaders, realistic dynamic lighting, and motion-captured animations.  We benchmarked the graphics cards in this article with a fully patched version of FarCry 2, using one of the built-in demo runs recorded in the "Ranch" map.  The test results shown here were run at various resolutions with 4X AA enabled.

 
 

FarCry 2 proved to be somewhat of a strong point for the GeForce GTX 480. In a single-card configuration, the GeForce GTX 480 absolutely blew past the Radeon HD 5870 and fell victim only to the dual-GPU powered Radeon HD 5970. Putting two GeForce GTX 480s together in an SLI configuration also produced excellent results, and once again the NVIDIA cards scaled better than the Radeons, but the quad-GPU Radeon HD 5970 CrossFireX configuration still put up the best numbers, albeit by a small margin.

Left 4 Dead 2

Left 4 Dead 2
DirectX Gaming Performance


Left 4 Dead 2

Like its predecessor, Left 4 Dead 2 is a co-operative, survival horror, first-person shooter that pits four players against numerous hordes of zombies. Like Half Life 2, the game uses the Source engine; however, the visuals in L4D2 are far superior to anything seen in the Half Life universe to date. The game has much more realistic water and lighting effects, more expansive maps with richer detail, more complex models, and the list goes on and on. We tested the game at various resolutions with 4x anti-aliasing and 16x anisotropic filtering enabled and all in-game graphical options set to their maximum values.



Our single-card Left 4 Dead 2 testing shows the NVIDIA GeForce GTX 480 with a small advantage over the Radeon HD 5870, with the Radeon HD 5970 still posting the best scores overall. Pairing cards up in a multi-GPU SLI or CrossFireX configuration, however, results in a CPU limitation, and all of the cards perform at roughly the same level. Even with 4X anti-aliasing and anisotropic filtering maxed out, L4D2 doesn't tax today's high-end graphics cards hard enough to slow framerates considerably in a multi-GPU configuration.

Tom Clancy's H.A.W.X.

Tom Clancy's H.A.W.X.
DirectX Gaming Performance


Tom Clancy's H.A.W.X.

Tom Clancy's H.A.W.X. is an aerial warfare video game that takes place during the time of Tom Clancy's Ghost Recon Advanced Warfighter. Players have the opportunity to take the throttle of over 50 famous aircraft in both solo and 4-player co-op missions, and take them over real-world locations and cities in photo-realistic environments created with the best commercial satellite data, provided by GeoEye. We used the built-in performance test at two resolutions with all quality settings set to their highest values, using the DX10 code path for the GeForce GTX 200 series cards and the DX10.1 path for the Radeons and GeForce GTX 480.


 

The new GeForce GTX 480 was about 11-25% faster than the Radeon HD 5870 in the H.A.W.X. benchmark, when running in a single-card configuration. And the GeForce GTX 480 showed vastly superior scaling in this game when running in SLI mode, which further enhanced its lead here. Once again, though, the dual-GPU powered Radeon HD 5970 is clearly the leader, whether running in a single-card or a dual-card / quad-GPU CrossFireX configuration.

SiSoft SANDRA GPGPU

The SiSoftware GPGPU processing benchmark performs single- and double-precision floating point arithmetic on the GPU, and the results are reported in pixels/s, i.e. how many pixels can be computed in 1 second. The benchmark will still run if support for double-precision isn't found, however, as is evident in the NVIDIA DP scores below: emulated results using 32-bit floats are used due to the lack of native double-precision (64-bit) floating-point support in the current OpenCL drivers. Regardless, the single-precision scores are still interesting...
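For context, the core of such a benchmark is conceptually simple. Here's a rough sketch of ours (not SiSoftware's actual code): the same chain of dependent fused multiply-adds, templated over float and double, so timing the two launches approximates the SP:DP throughput ratio:

```cuda
#include <cuda_runtime.h>

// Sketch of the kind of kernel a GPGPU arithmetic benchmark runs: a long
// chain of dependent fused multiply-adds. Templating on the element type
// lets the same code exercise float (SP) and double (DP) units.
template <typename T>
__global__ void fma_chain(T *out, T seed, int iters) {
    T a = seed + static_cast<T>(threadIdx.x);
    const T b = static_cast<T>(1.000001);
    for (int k = 0; k < iters; ++k)
        a = a * b + b;  // typically compiles to one FMA per iteration
    out[blockIdx.x * blockDim.x + threadIdx.x] = a;
}

int main() {
    const int blocks = 64, threads = 256, iters = 1 << 16;
    float  *out_sp;
    double *out_dp;
    cudaMalloc(&out_sp, blocks * threads * sizeof(float));
    cudaMalloc(&out_dp, blocks * threads * sizeof(double));

    // A real harness would bracket each launch with cudaEvent timers and
    // compare elapsed times; their ratio approximates SP:DP throughput.
    fma_chain<float ><<<blocks, threads>>>(out_sp, 1.0f, iters);
    fma_chain<double><<<blocks, threads>>>(out_dp, 1.0,  iters);
    cudaDeviceSynchronize();

    cudaFree(out_sp);
    cudaFree(out_dp);
    return 0;
}
```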

SiSoftware GPGPU Processing
Number Crunching On The GPU

The GeForce GTX 480's single-precision floating point performance in the SANDRA GPGPU processing benchmark is significantly better than the Radeon HD 5870's. Hopefully, with future drivers and / or versions of this benchmark, we can get comparable double-precision scores. Double-precision floating point performance should also be a strong suit of the GTX 480.

Total System Power Consumption

Before bringing this article to a close, we'd like to cover a few final data points--namely power consumption and noise. Throughout all of our benchmarking and testing, we monitored how much power our test system was consuming using a power meter. Our goal was to give you all an idea as to how much power each configuration used while idling and while under a heavy workload. Please keep in mind that we were testing total system power consumption at the outlet here, not just the power being drawn by the graphics cards alone.

Total System Power Consumption
Tested at the Outlet

This is a significant chart, for a variety of reasons. First and foremost, the GeForce GTX 480's power consumption while under load is extremely high for a single-GPU powered card. At 438 watts under load, the GeForce GTX 480 system consumed almost 40 more watts than the dual-GPU powered Radeon HD 5970, despite offering lower performance. With regard to power efficiency, it is obvious that the GF100 GPU is significantly less efficient than the Radeon HD 5870.

With power consumption this high, it should come as no surprise that the GeForce GTX 480 also runs relatively hot, with the added side effect of a relatively loud cooler. Under load conditions, we witnessed GPU temperatures in the mid-to-high 90s Celsius, and even saw temperatures on the backside of the card hit 78°C (as measured with an infrared thermometer).

To dissipate the heat generated by the GF100, the GeForce GTX 480 is outfitted with a large cooler with a barrel-type fan. Under idle conditions, while sitting at the desktop, we found the GTX 480's cooler to be nice and quiet. Under load, however, the fan spins up significantly and can be somewhat loud. A GeForce FX 5800 the card is not, but the GeForce GTX 480 is clearly the loudest GeForce to hit the scene in a number of years.

Our Summary and Conclusion

Performance Summary: NVIDIA has created a powerful GPU in the GF100, as our performance data for the new GeForce GTX 480 has shown. Generally speaking, versus the single-GPU powered Radeon HD 5870, the GeForce GTX 480 is on average roughly 5% - 10% faster, give or take a few percentage points depending on the test, which technically makes it the fastest single-GPU card on the market (almost). The GeForce GTX 480 held the largest lead in the DX11-based Unigine Heaven benchmark and in Tom Clancy's H.A.W.X. Unfortunately for NVIDIA, however, the Radeon HD 5870 is cheaper to produce, consumes less power, is quieter, and it costs about 20% less ($399 vs. $499). And AMD also has the dual-GPU powered Radeon HD 5970 in its arsenal, which remains the fastest single graphics card available for most current game titles.

The GeForce GTX 480's performance lead over the Radeon HD 5870 increases when the cards are paired up in dual-card configurations. With their current drivers, NVIDIA-powered cards offered better performance scaling in multi-GPU configurations, which resulted in larger performance increases for the GeForce GTX 480. With that in mind, however, a dual-card Radeon HD 5970 quad-CrossFireX configuration was still fastest overall.

Depending on your perspective, today will either be considered a great victory or perhaps a crushing defeat for NVIDIA. On one hand, the company has produced what is undoubtedly the most powerful and complex graphics processor in the world. The 3-billion transistor GF100 is a very capable chip, both in terms of gaming and compute performance, and NVIDIA owns the single-GPU performance crown again. The GeForce GTX 480 is faster than the Radeon HD 5870 overall, and its forward-thinking design lays the foundation for future generations of NVIDIA processors, as the G80 did for much of the previous generation. On the other hand, the GeForce GTX 480 is late to market, the GPU consumes a lot of power and hence generates a lot of heat, even with "only" 480 of its 512 shader cores exposed, and its performance lead doesn't exactly jibe with its 25% price premium.


The NVIDIA GeForce GTX 480 Reference Card

Although the company is announcing the cards tonight at the PAX event taking place in Boston, MA, widespread e-tail availability of both GeForce GTX 480 and GTX 470 cards, at prices of $499 and $349 respectively, won't happen until the week of April 12, 2010. Questions linger as to how many GF100-based graphics cards will ultimately hit store shelves, but NVIDIA tells us plenty are on the way. NVIDIA claims, "We are building 10s of thousands of units for initial availability, and this will ensure our partners have ample volume for what is the most anticipated GPU launch ever." If you're an NVIDIA fan and have been waiting for their next-gen GPU, your wait is almost over.

Having spent some quality time with the GeForce GTX 480, we can't help but expect that the card, as we have shown it to you here today, will not be NVIDIA's flagship for an extended period of time. The true potential of the Fermi architecture hasn't been fully realized just yet. We suspect a re-worked GF100 is on tap that will have all 512 of its cores available and will hopefully hit higher clocks with lower power consumption. We are only speculating at this point, of course, but we can't help but feel the GeForce GTX 480 isn't the card NVIDIA really wanted to launch to take on AMD's finest, and that its successor is priority #1 within the company. The GeForce GTX 480 is an extremely potent product; it's just not the game changer some may have expected.

  • Relatively Fast
  • DirectX 11 Support
  • PhysX + CUDA Support
  • Great SLI Scaling

 

  • High Power Consumption
  • Hot and Can Be Loud
  • Late To Market
  • Only Slightly Faster Than 5870, For Much More Money

 



Content Property of HotHardware.com