Google Cloud Machine Learning To Make Real-Time NCAA Final Four Predictions
The Google Cloud employees working on this project (affectionately known as the “Wolfpack”) started this past December by sending “play-by-play” files from the NCAA and Sportradar to Cloud Storage. This data was a mixture of “JSON, XML and CSV files ingested via RESTful services and/or pulled from FTP”, roughly 1.6 million discrete files in total. They then used Cloud Dataflow to read the files from Cloud Storage, convert them into “actionable structures”, and load them into BigQuery, Google’s data warehouse, which is queried through a RESTful web service. From BigQuery, the data was available as a public dataset that the team could then analyze.
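The article does not show how the mixed-format files were converted, so the following is only a minimal sketch of that step in plain Python, not the Wolfpack’s actual Dataflow code. The field names (`game_id`, `clock`, `event_type`), the `event` XML element, and the `normalize` helper are all hypothetical, chosen only to illustrate turning JSON, XML, and CSV payloads into one uniform row shape that a warehouse table could accept.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical shared schema; the real play-by-play fields are not given in the article.
FIELDS = ("game_id", "clock", "event_type")


def normalize(payload, fmt):
    """Parse one play-by-play payload into a list of flat, uniform rows."""
    if fmt == "json":
        # Assumes the payload is a JSON array of event objects.
        records = json.loads(payload)
    elif fmt == "csv":
        # Assumes the first CSV line is a header row.
        records = list(csv.DictReader(io.StringIO(payload)))
    elif fmt == "xml":
        # Assumes each event is an <event> element carrying its data as attributes.
        root = ET.fromstring(payload)
        records = [dict(event.attrib) for event in root.iter("event")]
    else:
        raise ValueError(f"unsupported format: {fmt}")
    # Keep only the shared fields, as strings, so every source yields the same shape.
    return [{name: str(rec.get(name, "")) for name in FIELDS} for rec in records]


if __name__ == "__main__":
    sample = '<game><event game_id="g1" clock="19:42" event_type="block"/></game>'
    print(normalize(sample, "xml"))
```

In a real pipeline, a function like this would sit inside the transform stage, with reading from storage and writing to the warehouse handled by the pipeline framework rather than by hand.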
The Wolfpack’s first analyses were smaller in scope. For example, they discovered that juniors blocked more shots per minute and that teams with feline mascots have caused more upsets. They even figured out that the last Final Four upset during a full moon came in 1991. The Wolfpack, however, wanted to take their analyses to the next level. They decided that the Final Four in San Antonio, Texas would be the perfect opportunity to test whether they could use their data to make real-time predictions.
During the game, the Wolfpack will update Cloud Spanner, Google’s NewSQL database, every two seconds. They will make predictions by taking the latest information from BigQuery, Cloud Spanner, and their other trained models and feeding it into a real-time rendering system built by Cloneless and Eleven, Inc. This system will then produce real-time prediction videos that will play during halftime.
Google’s main purpose for this project was to “build an entire architecture in a server-less cloud-based environment.” Google has been actively working on expanding its cloud offerings to compete with Amazon and Microsoft. This past January it announced that it would be building three undersea fiber-optic cables to speed up data transfers.
The 2018 Final Four will begin Saturday, March 31st at 6:09pm ET.