NVIDIA Titan V GPUs Reportedly Flunking Math In Certain Scientific Simulations
Three and a half months ago, NVIDIA launched its first consumer graphics card with a Volta GPU inside, the mighty Titan V. While it's a capable gaming card, that is not what it was designed for. The Titan V was built for professional and academic deep learning applications, and as we saw first-hand, it is an absolute beast when it comes to scientific computing calculations. Unfortunately, it may not always get the math right, according to what The Register has been hearing from computer scientists.
One of the site's sources, an unnamed engineer, said the Titan V is prone to producing different results for repeated complex calculations under certain conditions. One such case involves running identical simulations of an interaction between a protein and an enzyme. The simulations should produce exactly the same result each time, but in tests of four different Titan V cards, the engineer found that two of the cards would spit out numerical errors around 10 percent of the time. This is not something the engineer has ever observed on previous-generation graphics cards.
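For readers who want to try this kind of repeat-run check on their own workloads, here is a minimal sketch in Python. It is not the engineer's actual test, which The Register did not publish. The names `fingerprint`, `count_mismatches`, and `toy_simulation` are hypothetical, and the toy workload runs on the CPU purely as a stand-in for a real GPU simulation. The idea is simply to hash the exact bit patterns of each run's output and count how many repeat runs disagree with the first.

```python
import hashlib
import struct


def fingerprint(values):
    """Hash the exact 64-bit patterns of a sequence of floats."""
    digest = hashlib.sha256()
    for v in values:
        digest.update(struct.pack("<d", v))
    return digest.hexdigest()


def count_mismatches(simulate, runs=10):
    """Call `simulate` `runs` times; count repeat runs that differ from the first."""
    reference = fingerprint(simulate())
    return sum(fingerprint(simulate()) != reference for _ in range(runs - 1))


def toy_simulation():
    # Hypothetical stand-in for a real workload: a fixed-order running sum,
    # which should be bit-for-bit identical on every run.
    total = 0.0
    trajectory = []
    for i in range(1, 1001):
        total += 1.0 / i
        trajectory.append(total)
    return trajectory


if __name__ == "__main__":
    runs = 10
    bad = count_mismatches(toy_simulation, runs=runs)
    print(f"{bad} of {runs - 1} repeat runs differed from the first")
```

Comparing hashes of raw bits rather than using a tolerance is deliberate here, since the claim being tested is that identical inputs should give identical outputs. Note that some GPU code legitimately varies in the last bits from run to run (for example, reductions whose summation order is not fixed), so a check like this only makes sense for workloads that are expected to be deterministic.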
So what's the deal? NVIDIA has not commented on the matter, but a supposed "industry veteran" (also unnamed) told the site it might be due to a flaw in memory. The thinking behind that hypothesis is that companies like NVIDIA push their higher-end hardware to the limit to squeeze as much performance as possible out of their gear. In this case, the industry vet thinks NVIDIA might be pushing the Titan V too hard, and in a way that is causing memory to cough up a hairball in certain situations.
These types of errors should never happen when dealing with professional tasks. Scientific software models need precise calculations to be fully effective, and cards aimed at professionals (along with accompanying drivers) typically prioritize accuracy over gaming performance. While the Titan V is a consumer part, it's a bad look for NVIDIA if the card gets certain calculations wrong one out of 10 times.
According to The Register, this is not something that is traceable to a bad batch of cards or some random defect in the chipset, but is an issue that has affected NVIDIA in the past as well. In prior cases, NVIDIA is said to have released patches to address the issue. Perhaps the same will be done for the Titan V. For now, however, scientists would be wise to double- and triple-check their results.