CUDA and the Future
Enabled By CUDA
Utilizing Tesla’s processing power isn’t as simple as adding a discrete card to your favorite workstation. Rather, the application you’re looking to accelerate must be written specifically to take advantage of the hardware, which, in the case of existing software, means re-coding it in NVIDIA’s CUDA development environment. Yet, with 70 million CUDA-capable GPUs in the wild and a second generation of hardware recently launched, developers are starting to see a much larger potential audience for CUDA-enabled apps.
NVIDIA recently posted the beta of CUDA 2.0, which adds support for the 32- and 64-bit versions of Windows Vista and the Tesla T10’s double-precision capability. The package includes a CUDA toolkit and the CUDA SDK—CUDA-compatible drivers are already included in every display driver download. Note that CUDA is free and doesn't require registration to download. If you're into programming and want to give NVIDIA's latest a look, the company wants you to dive in.
CUDA support, from integrated core logic to add-in boards to Tesla servers
The Tesla C1060 runs at 1.33 GHz and includes up to 4GB of GDDR3 memory
Five organizations with experience in CUDA were represented at the NVIDIA event, each with a slightly different spin on the technology in action. For instance, a company called TechniScan Medical Systems is using CUDA-enabled software for a very practical purpose. Its Whole Breast Ultrasound system creates three-dimensional images, which are then used to help diagnose abnormalities. Powered by a Pentium M cluster, the scanner takes nearly five hours to create its image, far too long for a doctor to take the scan and go over it with her patient in the same visit. A 16-core Core 2-based cluster cuts the time down to 45 minutes. A quartet of Tesla GPUs gets that number down to 16 minutes, which is much more acceptable for same-visit results.
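To put those scan times in perspective, here is a quick back-of-the-envelope sketch of the speedups they imply. The three times come straight from the presentation as quoted above; treating "nearly five hours" as exactly 300 minutes is our own rounding, so read the Pentium M ratios as approximate.

```python
# Reconstruction times quoted above, in minutes.
# Assumption: "nearly five hours" is rounded to 300 minutes.
times_min = {
    "Pentium M cluster": 300,
    "16-core Core 2 cluster": 45,
    "four Tesla GPUs": 16,
}

baseline = times_min["Pentium M cluster"]

# Speedup of each configuration relative to the Pentium M cluster.
speedups = {name: baseline / t for name, t in times_min.items()}

for name, t in times_min.items():
    print(f"{name}: {t} min ({speedups[name]:.1f}x vs. Pentium M cluster)")

# Speedup of the Tesla setup relative to the Core 2 cluster.
tesla_vs_core2 = times_min["16-core Core 2 cluster"] / times_min["four Tesla GPUs"]
print(f"Four Tesla GPUs vs. Core 2 cluster: {tesla_vs_core2:.1f}x")
```

By this math, the four Tesla GPUs come in a little under three times faster than the 16-core Core 2 cluster, and somewhere near 19 times faster than the original Pentium M setup.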
We also saw presentations demonstrating the benefits of GPU computing through the eyes of astronomers, the finance industry, cancer research, and academia, where HPC, scientific computing, visualization, virtual 3D audio, and computer vision all come together.
More relevant on the desktop, a company called Elemental Technologies is developing a plug-in for Adobe’s Premiere Pro non-linear editor that will let you render Blu-ray-quality H.264 video in real time. So long as you have a Quadro card based on at least the G80 GPU, the CUDA-enabled add-on will yield performance gains.
The Future of HPC?
Let’s say you work for an enterprise and you’ve been tasked with designing a datacenter capable of 100 teraflops. According to NVIDIA, tackling the job with quad-core CPUs in 1U boxes would take 1,429 servers at $4,000 apiece. Sipping 400W each, that cluster would consume 571 kW of power and cost nearly $6 million to procure.
You’d purportedly get the same horsepower from 25 servers paired with 25 1U Tesla S1070 systems running in a heterogeneous computing environment. With each of the Tesla boxes priced around $8,000, added to the $4,000 servers, you’d be looking at a scant $300,000 total. And even though the S1070s use as much as 700W each, the total package would still only need roughly 27 kW.
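The arithmetic behind NVIDIA's comparison is easy to check for yourself. The sketch below uses only the unit counts, prices, and wattages quoted in the scenario; the helper `cluster_totals` and the variable names are ours, made up for illustration. Note that it covers purchase price and power draw only, nothing else.

```python
# Figures quoted from NVIDIA's 100-teraflop scenario.
TARGET_TFLOPS = 100
CPU_SERVERS = 1429
SERVER_PRICE_USD = 4000
SERVER_POWER_W = 400

HYBRID_PAIRS = 25        # 25 servers, each paired with one Tesla S1070
S1070_PRICE_USD = 8000   # "around $8,000"
S1070_POWER_W = 700      # "as much as 700W"


def cluster_totals(units):
    """Return (cost in USD, power in kW) for a list of (count, price, watts)."""
    cost = sum(count * price for count, price, _ in units)
    power_kw = sum(count * watts for count, _, watts in units) / 1000
    return cost, power_kw


cpu_cost, cpu_kw = cluster_totals(
    [(CPU_SERVERS, SERVER_PRICE_USD, SERVER_POWER_W)]
)
gpu_cost, gpu_kw = cluster_totals([
    (HYBRID_PAIRS, SERVER_PRICE_USD, SERVER_POWER_W),
    (HYBRID_PAIRS, S1070_PRICE_USD, S1070_POWER_W),
])

print(f"CPU-only cluster: ${cpu_cost:,} and {cpu_kw:.1f} kW")
print(f"CPU + Tesla:      ${gpu_cost:,} and {gpu_kw:.1f} kW")
print(f"Cost ratio:  {cpu_cost / gpu_cost:.1f}x")
print(f"Power ratio: {cpu_kw / gpu_kw:.1f}x")

# Throughput per server/S1070 pair implied by the scenario's own numbers.
print(f"Implied TFLOPS per pair: {TARGET_TFLOPS / HYBRID_PAIRS:.1f}")
```

Run the numbers and the CPU-only build lands at $5,716,000 and 571.6 kW, against $300,000 and 27.5 kW for the Tesla-assisted one. That works out to roughly a 19x difference in purchase price and a 21x difference in power, assuming every box actually draws its quoted wattage.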
Top-down on the Tesla S1070, composed of four T10-based cards and a power supply
The add-in Tesla C1060 doesn't include display outputs; it's all about accelerating HPC applications.
Of course, NVIDIA’s scenario leaves out one important data point: the cost of re-coding an application to run in the CUDA environment. That cost will no doubt be substantial for many considering a transition to GPU-based computing. We’re only 12 months into the movement, though, and NVIDIA says there are already a few dozen commercial CUDA applications.
The prospects for businesses and academic institutions are clearly great. But the most interesting thing for desktop enthusiasts is that every GeForce card from the 8 series onward supports CUDA. So, as optimized software continues to emerge, the hardware infrastructure is already in place, even if you won’t get the massive performance of a Tesla or Quadro card loaded down with 4GB of memory.