Big Data Researchers Post Personal Info Of 70,000 OkCupid Profiles And It’s Not OK
Lead author Emil Kirkegaard and his research partner Julius Daugbjerg Bjerrekaer zeroed in on OkCupid "because users often answer hundreds if not thousands of questions," making it a rich and readily available source of survey data. The information obtained is also public, though the authors never sought consent to gather, organize, and post the data in such a manner.
The research team used software to "scrape" the data they needed. The data set was then posted to the Open Science Framework, a free cloud service for storing, sharing, and collaborating on research projects. Attempts to access the data now return a message indicating it's "unavailable for legal reasons," though the fact that it was posted in the first place is disturbing. Kirkegaard defended his team's project on Twitter on the basis that the data is already public.
That's true, but experts counter that mining personal information from 70,000 users and then posting the data set on the web without consent is, at minimum, a violation of social science ethics. No real names were posted, but the data set did contain answers to questions on private topics, such as sexual turn-ons and orientation. These answers were attached to usernames, locations, ages, and other clues that would make it easy to figure out someone's identity. And in a separate paper related to the study, Kirkegaard and Bjerrekaer note that the only reason they didn't also collect and post profile pictures is that it would have taken up too much disk space.
The researchers should have predicted the controversy that would follow, and they sort of did. At the very least, they were aware that others would question the ethics of what they were doing.
"Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it [in] a more useful form," the researchers wrote.
OkCupid doesn't agree. The site views what the researchers did as a "clear violation" of its terms of service and is "exploring legal options."