But we’re getting ahead of ourselves. Before NVIDIA rolled out the red carpet for its academic and commercial adorers, it introduced a new development in its HPC product lineup promising to more than double the previous generation’s performance numbers.
The Belgian researchers who built FASTRA used a quartet of desktop GeForce 9800 GX2 cards in their supercomputing experiment to get the speed they were looking for at a reasonable price. A business relying on continuous uptime, impeccable accuracy, and drivers optimized for professional software probably wouldn’t follow suit. NVIDIA cautioned that while GeForce works great as a development platform, it’s probably not the best choice for production work.
Next question: so what would someone in a production environment use? Well, there’s Quadro—a name most workstation users likely recognize. Those cards center on the same underlying architecture as the GeForce boards with a handful of hardware enhancements and completely retooled drivers. But you’re still looking at (and paying for) a graphics product. Enter NVIDIA’s Tesla computing processor.
The new Tesla T10 and its 240 cores
A Threading Processor Cluster, in depth
Tesla is already for sale. You can hop onto NVIDIA’s online store and pick up a Tesla C870 for $1,299. The card sports specifications that look a lot like those of the Quadro FX 5600 (priced at $2,999 on the same site), including 1.5GB of GDDR3 memory, 128 onboard streaming processor cores, and a PCI Express x16 interface. There’s no display output, though. The Tesla is exclusive to the HPC market.
For clarity’s sake, NVIDIA classifies Quadro as a superset of Tesla, armed with all of the same features, plus graphics, which is why you pay an extra $1,700 for a Quadro FX 5600 card. All three product lines—GeForce, Tesla, and Quadro—are armed with similar silicon and thus equipped to power through applications enabled by CUDA.
Tesla 10-Series, Uncovered
The big news at NVIDIA is a second generation of Tesla products based on the 10-series GPU that you’ll also see driving fresh desktop and workstation cards. The 10-series chip is massive, boasting 240 processing cores (nearly 2x its predecessor), 1.4 billion transistors (again, close to double), and close to 1 teraflop of peak single-precision processing power (you guessed it—twice that of the C870 board).
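For a rough sense of where "close to 1 teraflop" and "twice that of the C870" come from, here is a back-of-the-envelope sketch. The core counts are from the article. The shader clocks and the figure of three floating-point operations per core per clock are assumptions drawn from commonly quoted specifications, so treat the output as an estimate rather than an official number.

```python
def peak_gflops(cores, shader_clock_hz, flops_per_core_per_clock):
    """Theoretical peak single-precision throughput in GFLOPS."""
    return cores * shader_clock_hz * flops_per_core_per_clock / 1e9

# Assumption: each core can issue a multiply-add plus a multiply per clock.
ASSUMED_FLOPS_PER_CLOCK = 3
# Assumption: shader clocks, not stated in the article.
ASSUMED_T10_CLOCK_HZ = 1.296e9
ASSUMED_C870_CLOCK_HZ = 1.35e9

t10 = peak_gflops(240, ASSUMED_T10_CLOCK_HZ, ASSUMED_FLOPS_PER_CLOCK)
c870 = peak_gflops(128, ASSUMED_C870_CLOCK_HZ, ASSUMED_FLOPS_PER_CLOCK)

print(f"T10 peak:  {t10:.1f} GFLOPS")
print(f"C870 peak: {c870:.1f} GFLOPS")
print(f"Ratio:     {t10 / c870:.2f}x")
```

Under these assumptions the newer chip lands a little under 1,000 GFLOPS and a little under twice the older board, which lines up with the "close to" wording above.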
If you’ve read reviews of the new GeForce boards, then you already have the scoop on Tesla’s T10 processor. The chip is a SIMT (single instruction, multiple thread) architecture, which lets software developers think in terms of functions and threads rather than vectors. As mentioned, it wields 240 thread processors (TPs), organized into 30 TPAs (Threading Processor Arrays) of eight TPs each.
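To make "functions and threads rather than vectors" concrete, here is a minimal plain-Python emulation of the programming model. It is not CUDA and does not use any NVIDIA API. The names `saxpy_thread` and `launch` are hypothetical, and the loop runs the threads one after another where real hardware would run them in parallel.

```python
TPA_COUNT = 30      # Threading Processor Arrays, per the article
TPS_PER_TPA = 8     # thread processors in each array
TOTAL_TPS = TPA_COUNT * TPS_PER_TPA  # 240

def saxpy_thread(thread_id, a, x, y, out):
    """Work done by ONE logical thread: ordinary scalar code, no vector types."""
    if thread_id < len(x):  # guard in case more threads than elements are launched
        out[thread_id] = a * x[thread_id] + y[thread_id]

def launch(kernel, thread_count, *args):
    """Hypothetical launcher: call the kernel once per thread id (sequentially here)."""
    for thread_id in range(thread_count):
        kernel(thread_id, *args)

x = [float(i) for i in range(1000)]
y = [1.0] * 1000
out = [0.0] * 1000

launch(saxpy_thread, len(x), 2.0, x, y, out)
print(TOTAL_TPS, out[:4])
```

The point of the sketch is that the developer writes the per-element function once and asks for as many threads as there are elements. Mapping those threads onto the 240 thread processors is left to the hardware and driver.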
And whereas the previous generation topped out at 1.5GB of memory, the 10-series supports up to 4GB. If that sounds excessive, consider that HPC datasets often include terabytes’ worth of information. We also talked to a couple of developers at the NVIDIA event who said they were holding out for a 4GB card before diving into this latest generation, specifically because it would give them the most palpable gains. Memory bandwidth also gets a boost thanks to a 512-bit bus that moves up to 102 GBps, up from the 8-series’ 77 GBps peak. Of course, the new Tesla products support PCI Express 2.0, yielding gains in systems with multiple cards contending for bandwidth.
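The bandwidth figures follow from simple arithmetic: bus width in bytes times the number of transfers per second. The sketch below assumes an effective 1.6 billion transfers per second (800 MHz GDDR3 at double data rate) for both generations and a 384-bit bus for the 8-series part. Neither figure appears in the article, so both are labeled as assumptions.

```python
def peak_bandwidth_gb_per_s(bus_width_bits, transfers_per_second):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9

# Assumption: 800 MHz GDDR3, two transfers per clock.
ASSUMED_TRANSFER_RATE = 1.6e9
# Assumption: the 8-series board uses a 384-bit bus.
ASSUMED_8_SERIES_BUS_BITS = 384

new_bw = peak_bandwidth_gb_per_s(512, ASSUMED_TRANSFER_RATE)
old_bw = peak_bandwidth_gb_per_s(ASSUMED_8_SERIES_BUS_BITS, ASSUMED_TRANSFER_RATE)

print(f"10-series: {new_bw:.1f} GB/s")
print(f"8-series:  {old_bw:.1f} GB/s")
```

With those inputs the results round to the 102 GBps and 77 GBps quoted above, which suggests the gain comes from the wider bus rather than from faster memory.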
Projected performance numbers, collected by NVIDIA
Many-core versus multi-core scaling, per NVIDIA
Like the new GeForce cards, the Tesla T10 supports IEEE 754 double-precision floating-point arithmetic, a feature far more significant to the HPC community than to desktop users. In fact, NVIDIA is confident that the addition of double precision will open Tesla up to an entire market of applications it couldn’t touch before. That said, a couple of technology partners made a point of observing that an intelligent implementation of single precision is often as effective as double precision, and faster. So, it remains to be seen how much double-precision support will boost Tesla’s adoption.
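One well-known example of a smarter single-precision approach is compensated (Kahan) summation. It may or may not be what those partners had in mind, and the sketch below is only an illustration. It emulates single precision in plain Python by rounding every intermediate result to 32 bits with a hypothetical helper, `f32`, then compares a naive running sum against the compensated one.

```python
import math
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum_f32(values):
    """Plain running sum with every step rounded to single precision."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total

def kahan_sum_f32(values):
    """Compensated sum: carries the rounding error of each step into the next."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = f32(f32(v) - comp)
        t = f32(total + y)
        comp = f32(f32(t - total) - y)
        total = t
    return total

values = [0.1] * 100_000
reference = math.fsum(f32(v) for v in values)  # accurate double-precision reference

naive_error = abs(naive_sum_f32(values) - reference)
kahan_error = abs(kahan_sum_f32(values) - reference)
print(f"naive single-precision error: {naive_error:.6f}")
print(f"Kahan single-precision error: {kahan_error:.6f}")
```

The compensated version costs a few extra single-precision operations per element, yet its error stays near the rounding limit of the final result, while the naive sum drifts noticeably. Whether that trade beats native double precision depends on the hardware and the workload.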