Stanford And Cornell Researchers Develop Internet Troll Identification Algorithm To Auto-Clean The Riff-Raff

It's said that a single bad apple can ruin the whole bunch, but Internet trolls are even worse. They not only ruin what could have been an engaging discussion on an interesting topic with intelligent individuals, but they often manage to go undetected until that moment when all hell breaks loose. And then the ban hammer drops, sometimes taking out good members who got caught up in a flame war.

Turns out this doesn't have to be the case. Researchers from Stanford University and Cornell University studied online discussions from news site CNN, political site Breitbart, and gaming site IGN spanning 18 months, 1.7 million users, and nearly 40 million posts with more than 100 million votes. Specifically, the researchers were interested in looking at members who were eventually banned and comparing their activity with that of members who were never banned.

"By analyzing the language of their posts, we find significant differences between these two groups," the researchers note. "For example, FBUs (Future Banned Users) tend to write less similarly to other users, and their posts are harder to understand according to standard readability metrics. They are more likely to use language that may stir further conflict (e.g., they use less positive words and more profanity)."


The researchers also found that banned users have a history of concentrating their efforts in individual threads rather than spreading them out across several topics in a forum. And since they receive more replies than average users, it's reasonable to assume, as the researchers put it, that they "might be more successful in luring others into fruitless, time-consuming discussion." Certainly we've all been there at one point, right?

After collecting data and analyzing the above and other trends, the team came up with an algorithm that they claim can predict with 80 percent accuracy whether a troll will be hit with the ban hammer, and it needs only an individual's first five posts to do so. How so?

Among the signals the algorithm weighs are the number of downvotes a user racks up, how often their posts are reported, and, most telling of all, how they react to having posts deleted by moderators. However, even when you take moderators out of the equation, the algorithm still supposedly works with a 79 percent accuracy rate.
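To make that concrete, here's a minimal sketch of how such a prediction could be wired up with scikit-learn, aggregating a few per-user signals over each user's first five posts and feeding them to a simple classifier. The features, synthetic data, and model choice are assumptions made for illustration, not the researchers' actual pipeline or feature set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical per-user features aggregated over each user's first five posts:
# [downvotes received, times reported, posts deleted by moderators, mean readability score]
# These columns are illustrative stand-ins, not the paper's exact feature set.
rng = np.random.default_rng(0)
n_users = 1000
X = np.column_stack([
    rng.poisson(3, n_users),    # downvotes
    rng.poisson(1, n_users),    # reports
    rng.poisson(0.5, n_users),  # moderator deletions
    rng.normal(8, 3, n_users),  # readability score
])

# Synthetic labels: 1 = eventually banned (FBU), 0 = never banned.
y = (X[:, 0] + 2 * X[:, 2] + rng.normal(0, 1, n_users) > 5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A plain logistic regression stands in for whatever model the researchers used.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

On real data, the accuracy printed at the end is the figure comparable to the study's reported 80 percent; here it only reflects the made-up labels above.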

It's unlikely that sites will adopt a formula for the purpose of pre-banning trolls before they cross a line, but something like this could be used to identify potential troublemakers early on so that moderators can keep a closer eye on them.

You can read the full study here (PDF).