Programmer/activist Aaron Swartz has been arrested for data theft in connection with an incident that occurred at MIT in late 2010. Swartz is accused of downloading nearly five million documents from JSTOR, an online, non-profit academic journal archive system. The particulars are as follows:
Swartz, who has a history as a political advocate and founded the group Demand Progress, was granted access to JSTOR as part of a fellowship at Harvard University's Center for Ethics. He therefore had the undisputed right to access JSTOR content--though not, as the filing notes, the authority to download the database using automated tools, reproduce such articles, or download the entire contents of any journal issue.
Aaron Swartz (photo by Jacob Appelbaum)
Swartz, however, didn't simply download JSTOR's content--he used MIT's equipment to do it. The indictment states that the 24-year-old programmer broke into a network closet at MIT, used the equipment in that closet to access MIT's network, used that equipment to download JSTOR's database, and took steps such as covering his face with a bike helmet to elude detection. Swartz's use of MIT's network was part of a general attempt to evade the joint efforts of both the university and JSTOR to lock him out.
According to court documents, Swartz's data harvester initially frustrated JSTOR's attempts to ferret out the perpetrator. The download demands were so high that they overwhelmed the archive organization's ability to handle normal network traffic. After blocking Swartz's IP address on September 25 and his new address on September 26, JSTOR resorted to blocking MIT completely; service was not restored until September 29.
MIT then blocked Swartz's MAC address. He responded with spoofed MAC addresses and began using two notebooks to download data instead of one. This sort of behavior went on for months, with Swartz moving from building to building, eventually avoiding the guest registration process altogether, and hiding his equipment in closets during the day. In the end, Swartz allegedly downloaded 4.8 million documents, including 1.7 million made available for purchase by independent publishers.
JSTOR has released its own statement. It reads, in part:
Last fall and winter, JSTOR experienced a significant misuse of our database. A substantial portion of our publisher partners’ content was downloaded in an unauthorized fashion using the network at the Massachusetts Institute of Technology, one of our participating institutions. The content taken was systematically downloaded using an approach designed to avoid detection by our monitoring systems.
The downloaded content included more than 4 million articles, book reviews, and other content from our publisher partners' academic journals and other publications; it did not include any personally identifying information about JSTOR users. We stopped this downloading activity, and the individual responsible, Mr. Swartz, was identified. We secured from Mr. Swartz the content that was taken, and received confirmation that the content was not and would not be used, copied, transferred, or distributed. The criminal investigation and today’s indictment of Mr. Swartz has been directed by the United States Attorney’s Office.
The fact that Swartz seized documents intended for resale is potentially problematic for his case. JSTOR's statement implies the organization might have been willing to let the situation go, but Swartz's behavior--and the tremendous headaches he caused both MIT and JSTOR itself--was apparently severe enough to raise the hackles of the US Attorney's Office.
"Stealing is stealing whether you use a computer command or a crowbar, and whether you take documents, data or dollars," said Carmen Ortiz, US Attorney for the District of Massachusetts. In addition, MIT may have its own bone to pick with the young man--his unauthorized use of the school's network led to it losing access (albeit temporarily) to an important resource.
Swartz, meanwhile, scarcely comes off as a sympathetic figure. In an interview four years ago, he claimed to be one of three Reddit founders (something Reddit strenuously denies), wrote of how he hated working at an office, and noted that he didn't think his boss was happy with him disappearing for so long while on vacation, stating: "I bet the first time my boss finds out where I am is when he sees my photo on the front page of his own website." He was, at the time, mystified as to why he got fired.