In the past few years there have been many studies claiming GPUs deliver substantial speedups ...over multi-core CPUs... We perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs, the performance gap between an Nvidia GTX280 processor and the Intel Core i7 960 processor narrows to only 2.5x on average.
The previously reported LBM number on GPUs claims 114X speedup over CPUs. However, we found that with careful multithreading, reorganization of memory access patterns, and SIMD optimizations, the performance on both CPUs and GPUs is limited by memory bandwidth and the gap is reduced to only 5X.
I just want a computer that works well. I want one that plays the games that I like and can afford as smoothly as possible. I have an NVIDIA card solution and also an ATI card solution. Coupled with the i7 and the i5 CPUs I'm using, I get just that: smooth power without any hiccups.
I built an AMD Phenom-II X3-720 based system with a Radeon 5670 card in it for not much cash at all, and it's a solid performer as well. If it had a high-powered video card in it like the other two systems, it would be on a par with both of the Intel boxes. AMD solutions are nothing to sneeze at, and they're very affordable too.
These two companies can see whose stream is golden and whether or not it goes farther than the other's all day long. I couldn't care less.
Everything that's happened in the past few years has translated into some very nice computers being available to us at decent prices. And I'm glad of that.
Don't part with your illusions. When they are gone you may still exist, but you have ceased to live.
Real, that's not very forward looking of you! Bad consumer! Bad consumer! *spankspank*
Give it up for the Scarecrow though here, come on! :)
Editor In Chief, http://hothardware.com
I like the flower in the first quote. When you view the article from the main page it does not show up. But in the forum it does...lol ..
"Never trust a computer you can't throw out a window."
Z77 GIGABYTE G1.SNIPER
G.Skill Ripjaws X 16GB PC2133
Asus Blu-ray burner
Seasonic X650 PSU
Patriot Pyro 128GB SSD
Intel may offer parallel performance .... but at what price? That's the only real consideration.
I'm not sure what you mean, especially if we view the situation in historical context. Six years ago, a dual-socket motherboard was easily $450-$500; the Tyan S2895 workstation board I used for several years was, IIRC, a $1200 board. Quad-CPU motherboards were even more expensive--think $2000-$4000. The CPUs that ran in these boards also commanded massive premiums, even after AMD entered the market. A dual AMD Opteron board + 2 CPUs might be $2000-$3500; a Xeon MP configuration might run upwards of $5K.
Some of these initial figures may be a bit off, but consider them in context with modern prices. A *nice* 890GX board from Asus is $140. A quad-core AMD Phenom 955 is $159; a six-core 3.2GHz 1090T is $295. Intel's prices aren't quite so nice, but the quad-core i7 is $279 while a solid i7 motherboard is ~$200.
AMD's ratios are better, but even Intel's prices are less than 10% of what they were five years ago at the quad-core level. Cores have become insanely cheap as the amount of supporting circuitry and hardware needed to use them has shrunk.
Yeah Joel, what I said.
Good shtuff, less money.
wait long time to upchuck more cash.
There is a 120x speedup from a [single core of a] Core 2 Duo to a 500MHz ATI 5750 (400 stream processors).
That result can be replicated with the BOINC project "Collatz Conjecture", which is already optimized for CPUs and for GPUs' proprietary APIs.
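For context on why that particular workload flies on a GPU: checking Collatz trajectories is embarrassingly parallel, since every integer can be tested independently with no shared state. Here's a minimal illustrative sketch of the per-number kernel in Python (not the project's actual code, which ships optimized CPU and GPU builds):

```python
def collatz_steps(n: int) -> int:
    """Count Collatz (3n+1) steps until n reaches 1."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

# Each number is independent, so a GPU can assign one work-item per n.
# On a CPU the same range can simply be split across cores:
if __name__ == "__main__":
    from multiprocessing import Pool
    with Pool() as pool:
        longest = max(pool.map(collatz_steps, range(1, 100_000)))
    print("max steps below 100000:", longest)
```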
Try porting something serially constrained like zlib to a GPU; you'll quickly find that you have to break backwards compatibility or sacrifice some of the compression gains in order to use it.
If Intel wants developers to use their CPUs in an optimized manner to compete with GPUs, they should provide source code (C/ASM) for x86/x86-64 versions of the things the CPU excels at. Otherwise they have no reason to complain that developers don't know how to program their processors efficiently.
Divide and conquer is the method, just like in Iraq and Afpak.
zlib IS serially constrained, but not severely. Pathologically compressible patterns DO suffer from being split into smaller chunks before being compressed (on account of redundant dictionaries), but ordinary data forgoes only a few percent of a given algorithm's theoretical size reduction. And the space is not wasted: slicing into conveniently sized chunks simplifies recovery from data corruption while trivializing navigation in the compressed material, which recommends it even when your actual constraint is hardware, e.g. a lonesome single core. More can be gained by using tighter deflators, which may be prohibitively CPU-intensive for a Wintel box, than can be lost by slicing and splicing across the massively parallel shaders of a humble GPU.
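To make that trade-off concrete, here's a minimal sketch using Python's zlib that slices input into fixed-size chunks which are deflated independently, and therefore can be compressed in parallel. The chunk size and sample data are arbitrary assumptions; real files will show different overheads.

```python
import zlib
from multiprocessing import Pool

CHUNK = 256 * 1024  # hypothetical chunk size; larger chunks lose less ratio

def compress_chunk(chunk: bytes) -> bytes:
    # Each chunk gets its own deflate stream (and its own dictionary),
    # so chunks can be handed to separate cores independently.
    return zlib.compress(chunk, 9)

def compress_chunked(data: bytes) -> list:
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    with Pool() as pool:
        return pool.map(compress_chunk, chunks)

if __name__ == "__main__":
    # Repetitive sample data, purely for illustration.
    data = b"the quick brown fox jumps over the lazy dog " * 200_000
    whole = len(zlib.compress(data, 9))
    sliced = sum(len(c) for c in compress_chunked(data))
    print(f"single stream: {whole} bytes, chunked: {sliced} bytes "
          f"(+{100 * (sliced - whole) / whole:.1f}% overhead)")
```

The per-chunk dictionary reset is where the few percent of lost ratio comes from, and the chunk boundaries are exactly the corruption-recovery and random-access points mentioned above.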
As an admin I'm a big fan of compression. If an AMD HD1150 or an nVidia GTX8200WTF could give me "only" 2.5 times more compression per second than the 24 cores of an i9 Intel HPC, I'd choose half a dozen GPUs and an AM3 Thuban without blinking. Given that Intel compared apples and oranges - its latest in the EXTREME series against a rather modest model from the competition - I'd probably end up with a much bigger speed gain anyway. Even at 500 times the throughput or more, compression is far from being memory bandwidth-constrained.
Just how fast is the C2D? I think it's important to note what the exact clockspeeds are when making these sorts of comparisons so as to not litter the floor with a bunch of data that only applies in certain cases.
Even if we assume perfectly linear scaling by core and by clockspeed, obviously a C2D won't catch a 5750--but it's possible that the normalized comparison between the two is significantly lower than 120x once these factors are adjusted for.
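As a back-of-the-envelope illustration only (the clock speeds below are assumptions, not figures from the post above):

```python
# Hypothetical normalization of the quoted 120x figure.
claimed_speedup = 120        # vs. a single Core 2 Duo core, per the post
c2d_cores = 2                # use both cores of the CPU
c2d_clock_ghz = 2.4          # assumed clock of the tested C2D
fast_c2d_clock_ghz = 3.33    # assumed top-bin C2D clock

normalized = claimed_speedup / (c2d_cores * (fast_c2d_clock_ghz / c2d_clock_ghz))
print(f"speedup vs. a fully used, top-clocked C2D: ~{normalized:.0f}x")
# ~43x under these assumptions -- still a big gap, but well short of 120x.
```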
...registered just to comment.
As a grad student with a background in rendering (offline and real-time) and mathematics, anytime I sit down to write computationally intensive code I do so using DirectX, knowing that my GPU will kick my CPU to the curb every time. This was an annoying chore with DirectX 8 and 9 class hardware, but with DirectX 10 and especially 11 it's a real joy. Finite element methods, fluid dynamics, signal processing, anything that MathCAD or Maple does, and so on are all things that GPUs excel at.
Speed improvements using CUDA are not as dramatic as simply using DirectX or OpenGL directly, and that's probably part of the problem. Getting a university's resident math and physics geeks to stop using Fortran and start using C/C++ and CUDA is hard enough; getting them to use an alien API built specifically for graphics in order to see real improvements is basically a religious debate (i.e. not possible). I suppose that's understandable; people want to spend time doing research and solving problems, not learning some obscure and endlessly changing API.
I can't really speak for 'real' scientific applications designed to run on massive mainframes, but I'm inclined to believe that GPUs will have something to offer in the near future. It's a bummer that the best way to leverage a GPU is through DirectX, which isn't applicable to industrial-strength computing environments. There's also the issue of the IEEE floating point standard. When it comes to IEEE compliance, my GeForce 8800 loves to make unpredictable and seemingly random deviations; run the same code on a different card and you'll get a different set of anomalies. If you need double precision floats then you have to bypass the GPU altogether. If ATI and Nvidia can address these problems, and a lot more people put the time in to understand how your average GPU works, the price vs. performance ratio of scientific computing environments stands to take a huge drop, and that's always a good thing.
Thanks for registering and dropping in, we always like to hear from folks actually doing this kind of work. I've got a question for you regarding DP FPU calculations--I've always heard that the G80/G92/GT200 cards took a heavy hit when doing DP as opposed to SP FPU work, but this is the first I've heard that the cards themselves turn out results that are incorrect. Have you ever had the opportunity to test this in cards past the 8800?
Unfortunately no, school is expensive and so are video cards. I’ve played with DirectX 11 through software emulation which obviously gives you IEEE compliant behavior but that’s no guarantee that DirectX 11 hardware will do the same… though I suspect it will. I’ve never heard ATI or Nvidia explicitly state their hardware is IEEE compliant but the DirectX documentation makes it sound like a card must be in order to claim proper DirectX 11 compatible status.
I wouldn't really say DirectX 9 and 10 hardware occasionally produces incorrect results; it's just that in some cases the rounding behavior and the handling of floating point specials don't follow IEEE rules. That's a big deal for an engineer who is accustomed to writing code that deals with overflow events, divisions by zero and what have you by depending on very specific standardized behavior.
Support for 64-bit floats in DirectX 10 is non-existent (as far as I know, anyway), and while it does exist in DirectX 11 it comes with some heavy limitations. You can run shader programs with DP, but you can't store the output without first converting back to SP. This is extremely limiting, as most applications use multiple shader programs to successively iterate over the same piece of data (numerical integration with intermediate results stored as texture data). There might be some card out there that supports double precision texture data via some extension, but it's not in any of the DirectX 11 documentation I've seen.
Most of the performance hit from DP is due to running out of temporary registers, because you need twice as many to do the same amount of work. While a card might have 1024 stream processors, if your shader program uses a large number of temporary registers then you are going to get a lot of idle streams sitting around waiting for registers to become available. Converting to single precision (and, if you can get away with it, half precision) is in some cases faster not because the actual fp ops are quicker but because you have more registers to work with, and that equates to more active streams.
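A toy model of that register-pressure effect (all numbers below are hypothetical, not taken from any particular GPU):

```python
def resident_threads(regfile_per_block: int, regs_per_thread: int,
                     max_threads: int = 1024) -> int:
    """Threads that fit in the register budget (toy occupancy model)."""
    return min(max_threads, regfile_per_block // regs_per_thread)

REGFILE = 16_384   # hypothetical 32-bit registers available per block
SP_REGS = 24       # hypothetical registers per thread, single precision
DP_REGS = 48       # doubles need twice the register space per thread

print("SP threads resident:", resident_threads(REGFILE, SP_REGS))  # 682
print("DP threads resident:", resident_threads(REGFILE, DP_REGS))  # 341
# Halving the resident threads means fewer streams available to hide latency,
# which shows up as idle hardware rather than slower individual fp ops.
```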
Have you looked at any of the information on Fermi? I know one of its core features--or at least, one of its core features on the workstation "Tesla" cards--will be DP performance that's far higher than what any card before it has managed.
Realneil: Ignorance is bliss. But you have to remember, the ignorant don't live very long :P
That's more like price vs. performance: AMD/ATI vs. Intel/Nvidia. Competition is what's killing the marketing game, e.g. GTX 480 (power-hungry) vs. HD 5870 (quieter, cooler, faster).
I'll give you quieter and cooler, but it's my understanding that the HD 5870 and GTX 480 are pretty well matched as far as performance is concerned.
Yeah, by a small margin...