Google Drive Cloud Storage Service Employs Hash Matching To Detect Pirated Content
What is hash matching? Hashing transforms a string of characters into a usually shorter, fixed-length value or key that represents the original string. Hashing is also used in many cryptographic schemes, such as digital signatures and integrity checks. The term "hash match" also names a SQL Server query operator, which hashes the columns involved in an aggregation or a join in order to match rows. In the file-storage context discussed here, hash matching means comparing a file's hash against a list of known hashes.
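To make the idea of a fixed-length value concrete, here is a minimal Python sketch. It uses SHA-256 from the standard library purely as an illustration; the function name `file_hash` is made up for this example, and nothing here is meant to reflect the hash function Google actually uses, which has not been disclosed.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Illustrative inputs of very different sizes.
short_digest = file_hash(b"hello")
long_digest = file_hash(b"hello" * 100_000)

# A SHA-256 digest is always 64 hex characters, whatever the input size.
print(len(short_digest), len(long_digest))

# The same input always produces the same digest...
print(file_hash(b"hello") == short_digest)

# ...while changing a single byte produces a different one.
print(file_hash(b"hellp") == short_digest)
```

Because identical bytes always produce an identical digest, two copies of the same file can be recognized as the same without comparing their full contents.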
Google’s legal director for copyright, Fred von Lohmann, noted at the 2016 Copyright Office Roundtable that “Google Drive does hash matching.” It appears, then, that Google Drive computes a hash for each file and records the hashes of content named in takedown notices it receives. Users are then blocked if they attempt to share a file whose hash has been flagged.
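The process described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Google's implementation: the class `ShareFilter`, its method names, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class ShareFilter:
    """Hypothetical sketch: block sharing of files whose hash was flagged."""

    def __init__(self) -> None:
        # Hashes of content named in takedown notices (assumed storage).
        self.flagged: set[str] = set()

    @staticmethod
    def digest(content: bytes) -> str:
        # SHA-256 is an assumption; the real hash function is not public.
        return hashlib.sha256(content).hexdigest()

    def record_takedown(self, content: bytes) -> None:
        """Remember the hash of a file named in a takedown notice."""
        self.flagged.add(self.digest(content))

    def can_share(self, content: bytes) -> bool:
        """Allow sharing only if the file's hash has not been flagged."""
        return self.digest(content) not in self.flagged


share_filter = ShareFilter()
share_filter.record_takedown(b"bytes of a flagged file")

print(share_filter.can_share(b"bytes of a flagged file"))   # False
print(share_filter.can_share(b"bytes of some other file"))  # True
```

Note that an exact-match scheme like this one only recognizes byte-for-byte identical copies. A file that has been re-encoded or otherwise altered would produce a different hash and would not match.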
It is important to note that, at the moment, there appear to be no consequences for merely storing pirated material. It is only when a Google Drive user attempts to share a pirated file that they are prevented from doing so. Google Drive’s terms of service specifically state, “Do not share copyrighted content without authorization or provide links to sites where your readers can obtain unauthorized downloads of copyrighted content...Repeated infringement of intellectual property rights, including copyright, will result in account termination.”
Google Drive is not the only service to use hash matching. Dropbox and YouTube’s Content ID system also rely on hash matching to prevent piracy and other cases of copyright infringement. It should go without saying, but please do not use, store, or share pirated material online.