One of the most fascinating stories of the 2016 U.S. presidential election was how a well-planned social media campaign based in Russia may (or may not?) have influenced the result.

There is now no doubt that this campaign existed, according to multiple reliable sources. And the fact that we had no idea at the time should make us very, very worried.

If we didn’t know it at the time, can we at least look back and understand how it happened? That’s the idea behind a new analysis, published yesterday on my favorite source for news and analysis, FiveThirtyEight.

The article describes the research of two professors at Clemson University, Darren Linvill and Patrick Warren. They used Clemson’s Social Media Listening Center to recover tweets from 3,841 Twitter handles associated with the Internet Research Agency, the most prominent of the Russia-based organizations accused of creating fake accounts to influence the election. Their dataset covers the period from June 2015 to December 2017, and includes nearly three million tweets.

The result of the two researchers’ work is a preprint called “Troll Factories: The Internet Research Agency and State-Sponsored Agenda Building,” currently undergoing peer review (PDF available on Warren’s website).

The image below, from the FiveThirtyEight article, shows how the number of tweets from these accounts varies with time.


The best part of all this is that Linvill and Warren have worked with FiveThirtyEight to publish their entire dataset online through FiveThirtyEight’s GitHub account. And I have uploaded their dataset into the SciServer online science platform. If you’re interested in looking at this data with me, send me an email.

Of course, a dataset is only as useful as the questions that you ask of it. So what can we learn from this one? I have no interest in questions that reduce to “lol Trump voters are stupid” – that is neither useful nor even true. What questions will give us insights into how social media can influence public perception? And what questions will give us insights into how to make sure this doesn’t happen again in the 2018 elections?

Here are a few questions off the top of my head:

  • How did the topics discussed by these troll accounts change after Trump won the election?
  • What strategies did the trolls employ when talking to Democrats?
  • If we identify a control sample of accounts belonging to genuine Trump supporters (or genuine Black Lives Matter activists, etc.) and run a blind content analysis, can we tell the difference? If so, how?
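
To make the first question a little more concrete, here is a minimal sketch of one way to start: count the words and hashtags used before and after election day, then rank terms by how much their share of all terms changed. Everything here is my own assumption for illustration, including the function names `tokenize` and `term_shift` and the (date, text) record shape; it is not the method used by Linvill and Warren, and it does not rely on the actual column names in the published dataset.

```python
from collections import Counter
from datetime import date
import re

ELECTION_DAY = date(2016, 11, 8)  # date of the 2016 U.S. presidential election


def tokenize(text):
    """Lowercase the text and return its alphabetic words and hashtags."""
    return re.findall(r"#?[a-z']+", text.lower())


def term_shift(tweets, top_n=5, min_count=2):
    """Rank terms by their change in relative frequency after election day.

    `tweets` is an iterable of (datetime.date, text) pairs. This record
    shape is a hypothetical stand-in, not the schema of the real dataset.
    """
    before, after = Counter(), Counter()
    for day, text in tweets:
        bucket = after if day > ELECTION_DAY else before
        bucket.update(tokenize(text))

    # Guard against division by zero when one period has no tweets.
    n_before = sum(before.values()) or 1
    n_after = sum(after.values()) or 1

    shift = {}
    for term in set(before) | set(after):
        # Skip very rare terms, whose shifts are mostly noise.
        if before[term] + after[term] >= min_count:
            shift[term] = after[term] / n_after - before[term] / n_before

    # Largest positive shift first: terms that grew most after the election.
    return sorted(shift.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Called on a list of (date, text) pairs, `term_shift` returns the terms whose share grew the most after the election. A real analysis would also want to strip URLs and common stop words first, and raw word counts are only a rough proxy for "topics," but even this much would show whether the vocabulary moved.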

What research questions occur to you?

