Netflix To Deploy GPU-Powered Neural Networks For Deep Learning In Movie Recommendations

Netflix has long chased ways to improve its movie recommendation algorithms, once even awarding a $1M prize to the team that could substantially improve on the then-current design. As part of that process, the company has been researching neural networks. Conventional neural networks are typically trained on large CPU clusters, often with several thousand cores in total. Netflix decided to try something different and built a neural network that runs on GPU cores.

In theory, GPUs could be ideal for building neural nets -- they offer huge numbers of cores linked by fast internal interconnects and backed by relatively large pools of onboard memory. Whether the approach could be adapted to work on Amazon's cloud infrastructure and on shipping graphics hardware, however, was an entirely different question. What Netflix found in its research holds promise for such implementations in the future, provided that certain problems with the underlying libraries are ironed out by Nvidia. It initially took the company's engineers more than 20 hours to "train" their neural network model; hand-optimizing the CUDA kernel eventually reduced this time to just 47 minutes.
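Netflix hasn't published its optimized kernels, but the general idea behind hand-written CUDA code is to fuse work into fewer kernels so the GPU spends its time computing rather than launching kernels and shuffling data between steps. The sketch below is a minimal, hypothetical example of that pattern -- not Netflix's code -- in which a single kernel computes one layer's affine transform and its sigmoid activation together instead of calling separate routines for each:

```cuda
// Hypothetical sketch: a fused forward pass for one neural-network layer.
// Fusing the matrix-vector product and the activation into one kernel avoids
// an extra kernel launch and an extra round trip through GPU memory.
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// One thread per output neuron: y[j] = sigmoid(sum_i W[j*in + i] * x[i] + b[j])
__global__ void fusedLayerForward(const float* W, const float* x,
                                  const float* b, float* y,
                                  int in, int out) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= out) return;
    float acc = b[j];
    for (int i = 0; i < in; ++i)
        acc += W[j * in + i] * x[i];
    y[j] = 1.0f / (1.0f + expf(-acc));   // activation fused into the same kernel
}

int main() {
    const int in = 1024, out = 256;      // arbitrary layer sizes for illustration
    float *W, *x, *b, *y;
    cudaMallocManaged((void**)&W, in * out * sizeof(float));
    cudaMallocManaged((void**)&x, in * sizeof(float));
    cudaMallocManaged((void**)&b, out * sizeof(float));
    cudaMallocManaged((void**)&y, out * sizeof(float));
    for (int i = 0; i < in * out; ++i) W[i] = 0.01f;
    for (int i = 0; i < in; ++i)       x[i] = 1.0f;
    for (int j = 0; j < out; ++j)      b[j] = 0.0f;

    fusedLayerForward<<<(out + 127) / 128, 128>>>(W, x, b, y, in, out);
    cudaDeviceSynchronize();
    printf("y[0] = %f\n", y[0]);

    cudaFree(W); cudaFree(x); cudaFree(b); cudaFree(y);
    return 0;
}
```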

With the algorithm tested, Netflix then moved it to a full production server with an Nvidia GRID K520 GPU and its 1,536 Kepler cores, as opposed to the 448 Fermi-generation cores baked into the first system. The Netflix team combined multiple software packages to build an integrated cluster capable of sharing the neural network's workload between GPU and CPU across multiple AWS server instances, all running in the cloud.
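Netflix hasn't released the scheduling code that balances work across instances, but the single-machine version of the idea is straightforward: split a batch, hand most of it to the GPU asynchronously, and let the host CPU chew through the remainder while the kernel runs. The following is a simplified, hedged sketch of that pattern; the scoring function and the 75/25 split are placeholders, not Netflix's actual model or ratio:

```cuda
// Hypothetical sketch of CPU/GPU work sharing on a single instance.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void scoreOnGpu(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 0.5f + 1.0f;   // stand-in for real model scoring
}

void scoreOnCpu(const float* in, float* out, int n) {
    for (int i = 0; i < n; ++i) out[i] = in[i] * 0.5f + 1.0f;  // same stand-in
}

int main() {
    const int total = 1 << 20;
    const int gpuShare = total * 3 / 4;        // assumed 75/25 split, tune per hardware
    std::vector<float> in(total), out(total);
    for (int i = 0; i < total; ++i) in[i] = float(i % 100);

    float *dIn, *dOut;
    cudaMalloc((void**)&dIn,  gpuShare * sizeof(float));
    cudaMalloc((void**)&dOut, gpuShare * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // GPU slice: copy up and launch the kernel asynchronously.
    cudaMemcpyAsync(dIn, in.data(), gpuShare * sizeof(float),
                    cudaMemcpyHostToDevice, stream);
    scoreOnGpu<<<(gpuShare + 255) / 256, 256, 0, stream>>>(dIn, dOut, gpuShare);

    // CPU slice runs on the host while the GPU kernel executes.
    scoreOnCpu(in.data() + gpuShare, out.data() + gpuShare, total - gpuShare);

    // Wait for the GPU and pull its results back.
    cudaStreamSynchronize(stream);
    cudaMemcpy(out.data(), dOut, gpuShare * sizeof(float), cudaMemcpyDeviceToHost);

    printf("out[0] = %f, out[%d] = %f\n", out[0], total - 1, out[total - 1]);

    cudaStreamDestroy(stream);
    cudaFree(dIn); cudaFree(dOut);
    return 0;
}
```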

The Cloud Grows Up

The significance of this is twofold: First, it indicates that heterogeneous computing solutions built around graphics cards are increasingly able to model workloads that have been the domain of extremely sophisticated servers and multi-node CPU clusters. Running a neural network across a series of GPUs is a significantly different problem from running a conventional HPC application written in CUDA or OpenCL. Neural network and human brain simulations are a major part of the reason why organizations like DARPA are pushing hard to deploy exascale computing by 2020; being able to run such simulations on GPUs could allow for much more sophisticated, faster models.

Second, and equally important, advances like this illustrate just how fast cloud servers are maturing. A few years ago, the idea of running a neural network simulation on a commodity cloud provider would've been ludicrous. Today, it's something companies are willing to attempt -- and they're seeing good results. Granted, Netflix's movie prediction algorithms are a far cry from simulating the human brain, but the fact that the problem can be shifted into a commodity datacenter at all is an impressive sign of progress.