GeForce RTX 2080 Ti Deep Learning Benchmarks Show Big Gains Over GTX 1080 Ti
The initial focus on NVIDIA's recently launched GeForce RTX 2080 Ti and GeForce RTX 2080 graphics cards has been on how well they perform in games, especially when cranking up the resolution to 4K (3840x2160). That will continue to be a point of interest, though it's not the only one. A fresh set of benchmarks making the rounds highlights how the new cards perform in deep learning workloads.
Before we get to the numbers, let's talk about why this matters. As you might already know, the GeForce RTX series pushes consumer graphics cards into new territory. Typically with each new generation of graphics cards, consumers benefit from faster rasterization rendering and sometimes better power efficiency. The GeForce RTX 2080 Ti and GeForce RTX 2080 are indeed faster than the cards they replace, but they also introduce dedicated ray tracing (RT) cores for more realistic lighting, and Tensor cores to accelerate large matrix operations, as part of the underlying Turing architecture.
We've already done deep dives into Turing and the first batch of GeForce RTX cards specifically. To briefly recap the specs, however, the GeForce RTX 2080 Ti boasts 4,352 CUDA cores, 544 Tensor cores, and 68 RT cores, along with 34 texture processing clusters (TPCs), 68 streaming multiprocessors (SMs), 88 render output units (ROPs), and 272 texture units (TUs). The GeForce RTX 2080 (non-Ti) has 2,944 CUDA cores, 368 Tensor cores, 46 RT cores, 23 TPCs, 46 SMs, 64 ROPs, and 184 TUs.
Considering these cards' advanced specs and capabilities, it's reasonable to think they will appeal to a wide audience, including gamers, developers, and even researchers who don't want to shell out for a Quadro card. Lambda, an AI infrastructure company that sells workstations, servers, and cloud services, ran a set of deep learning training benchmarks on the GeForce RTX 2080 Ti using TensorFlow, an open source machine learning framework.
Lambda's testing showed the GeForce RTX 2080 Ti to be between 27 percent and 45 percent faster than the previous-generation GeForce GTX 1080 Ti (Pascal) at single-precision (FP32) training of CNNs with TensorFlow across the measured networks, and between 60 percent and 65 percent faster at half-precision (FP16) training.
Here's a look at some of the raw numbers obtained:
Just as with evaluating gaming performance, cost factors into the equation. The GeForce RTX series carries a price premium over Pascal, with the GeForce RTX 2080 priced roughly in line with the GeForce GTX 1080 Ti and the GeForce RTX 2080 Ti sitting well above it.
"If you do FP16 training, the RTX 2080 Ti is probably worth the extra money. If you don't, then you'll need to consider whether a 71 percent increase in cost is worth an average of 36 percent increase in performance," Lambda notes.
For anyone who wants to check their own numbers, Lambda uploaded its benchmark code to GitHub.
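Lambda's cost question can also be checked with quick arithmetic. The short Python sketch below is not taken from Lambda's repository. It simply plugs in the two figures from the quote (a 71 percent price increase and a 36 percent average FP32 speedup) and works out how much training performance per dollar the GeForce RTX 2080 Ti offers relative to the GeForce GTX 1080 Ti. The function and variable names are illustrative assumptions.

```python
# Figures quoted by Lambda for the RTX 2080 Ti relative to the GTX 1080 Ti.
cost_increase = 0.71  # 71 percent higher price
perf_increase = 0.36  # 36 percent higher average FP32 training performance


def relative_perf_per_dollar(perf_increase: float, cost_increase: float) -> float:
    """Return performance per dollar of the new card relative to the old one.

    A value of 1.0 means both cards deliver the same performance per dollar;
    a value below 1.0 means the new card delivers less.
    """
    return (1.0 + perf_increase) / (1.0 + cost_increase)


ratio = relative_perf_per_dollar(perf_increase, cost_increase)
print(f"Relative FP32 performance per dollar: {ratio:.2f}")
```

By these figures the ratio comes out to about 0.80, meaning each dollar buys roughly 20 percent less FP32 training performance on the newer card.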