How do you influence an election?

One of the most fascinating stories of the 2016 U.S. Presidential election was the story of how a well-planned social media campaign based in Russia may (or may not?) have influenced the result.

There is now no doubt that this campaign existed, according to multiple reliable sources. And the fact that we had no idea at the time should make us very, very worried.

If we didn’t know it at the time, can we at least look back with hindsight and understand how it happened? That’s the idea behind a new analysis, published yesterday on my favorite source for news and analysis, FiveThirtyEight.

The article describes the research of two professors at Clemson University, Darren Linvill and Patrick Warren. They used Clemson’s Social Media Listening Center to recover tweets from 3,841 Twitter handles associated with the Internet Research Agency, the most prominent of the Russia-based organizations accused of creating fake accounts to influence the election. Their dataset covers the period from June 2015 to December 2017, and includes nearly three million tweets.

The result of the two researchers’ work is a preprint called “Troll Factories: The Internet Research Agency and State-Sponsored Agenda Building,” currently undergoing peer review (PDF available on Warren’s website).

The image below, from the FiveThirtyEight article, shows how the number of tweets from these accounts varies with time.


The best part of all this is that Linvill and Warren have worked with FiveThirtyEight to publish their entire dataset online through FiveThirtyEight’s GitHub account. And I have uploaded their dataset into the SciServer online science platform. If you’re interested in looking at this data with me, send me an email.

Of course, a dataset is only as useful as the questions that you ask of it. So what can we learn from this one? I have no interest in questions that reduce to “lol Trump voters are stupid” – that is neither useful nor even true. What questions will give us insights into how social media can influence public perception? And what questions will give us insights into how to make sure this doesn’t happen again in the 2018 elections?

Here are a few questions off the top of my head:

  • How did the topics discussed by these troll accounts change after Trump won the election?
  • What strategies did the trolls employ when talking to Democrats?
  • If we identify a control sample of accounts who are genuine Trump supporters (or genuine Black Lives Matter activists, etc.) and blindly run a content analysis, can we tell the difference? If so, how?
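As a starting point for the first question, here is a minimal Python sketch of a before-and-after comparison. Everything in it is an assumption for illustration: the sample rows are made-up placeholders rather than tweets from the real dataset, the (date, text) layout is hypothetical and may not match the published files, and counting raw words is only a crude stand-in for a real topic analysis. The one fixed point is the election date, November 8, 2016.

```python
from collections import Counter
from datetime import date

ELECTION_DAY = date(2016, 11, 8)

# Made-up placeholder rows (publish date, text); NOT taken from the real dataset.
sample_tweets = [
    (date(2016, 10, 1), "placeholder text about topic alpha"),
    (date(2016, 11, 1), "more placeholder text about topic alpha"),
    (date(2016, 12, 1), "placeholder text about topic beta"),
    (date(2017, 1, 15), "still more text about topic beta"),
]

def word_counts_by_period(tweets, cutoff=ELECTION_DAY):
    """Count words in tweets published on or before the cutoff versus after it."""
    before, after = Counter(), Counter()
    for published, text in tweets:
        target = before if published <= cutoff else after
        target.update(text.lower().split())
    return before, after

before, after = word_counts_by_period(sample_tweets)
print(before.most_common(3))
print(after.most_common(3))
```

With the real data you would want to strip out common filler words and probably compare hashtags rather than every word, but the split-and-count shape would stay the same.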

What research questions occur to you?

In Russia, Buddha meditates on you!

The world’s most unexpected Buddhist temple:

This is the Golden Temple of Elista, the capital of the Russian Republic of Kalmykia, on the western shore of the Caspian Sea in European Russia. The Kalmyks migrated here in the 1600s from what is now Mongolia.

Kalmykia is thus the only place outside Asia where the predominant religion is Buddhism.

Explore it yourself!

Bonus entertainment: a full-on flame war in reddit’s r/buddhism: “You must be new here. You don’t want to try to debate with me… You are not an awakened being.”

A new vision for science

My colleagues and I are thrilled to announce the latest release of our SciServer online science platform.

Screenshot of SciServer

SciServer is a suite of tools to manage, visualize, and understand large-scale datasets in all areas of science, from astronomy to genomics to soil ecology. SciServer allows anyone to work with terabytes of data, running server-side analysis and visualization tools in real time, without needing to install anything.

The beating heart of SciServer is SciServer Compute, a browser-based virtual computing environment. Anyone can create a free SciServer account and create analysis scripts in Python, R, or Matlab.

Today’s release is called SciServer Betelgeuse, succeeding the previous system SciServer Altair (#lolSeeWhatWeDidThere). SciServer Betelgeuse adds group functionality for file and data sharing, and also the ability to run asynchronous time- or memory-intensive jobs. We’ve been working on this update for more than two years, and we’re eager to see how everyone can make use of it.

We’re grateful to the National Science Foundation (award ACI-1261715) for their generosity in allowing us to create and maintain this resource, forever free to users.

The “we” I keep referring to here is a team of incredibly talented scientists and coders at the Institute for Data-Intensive Engineering and Science (IDIES) at Johns Hopkins University. I’m honored to have been part of this team for the past eighteen years.

And on a personal note, this new release is a major new step in my career. I’ve devoted my entire professional life to finding new ways to bring the real process of science to the world, and this is the realest real way yet.

What’s North of South Dakota?

What’s North of South Dakota? SURPRISE!

The answer is mostly obvious: North Dakota. But there is one tiny, bizarre exception, which I found through Google Earth.

The world is full of amazing and beautiful surprises, and I’m pretty sure that over the years I have spent more time playing with Google Earth than with any actual computer “game.” The browser-based Earth-in-Google-Maps interface is easy to use, but the downloadable Google Earth Pro has clearer images and additional tools like distance measurements and geotagged forum posts.

One day I was looking up the Sanford Underground Research Facility in western South Dakota, and decided to scroll around for a bit. I discovered, to my great surprise, that the borders of Montana and Wyoming don’t quite line up – leaving a less-than-one-mile-long anomaly in the South Dakota border. This means that if you drive north on Albion Road outside of Belle Fourche, you will cross the border into Montana.

And here it is, with the border clearly marked:

A satellite image of the short South Dakota-Montana border

At first I thought it was a copyright trap, but Google Earth came to the rescue by showing that someone had taken a photo of a “Welcome to Montana” sign just over the border. Sadly, the photo was hosted on the now-defunct Panoramio site, so it’s gone. But you can still see the shadow of the border sign in the close-up satellite image:

A closeup image of the border, showing the shadow of the Welcome to Montana sign

Bonus awesomeness: Driving north on Albion Road also takes you past two derelict nuclear missile silos from the Cold War. And also two other sites that are clearly still in use but completely unlabeled. See if you can find them!

If you’d like to explore for yourself (and you should!), here is the direct link in Google Maps – or download Google Earth Pro, turn on the Borders layer, and head northwest from Belle Fourche, South Dakota.

Happy virtual travels!

The problem that any third grader can understand, but that has all the world’s mathematicians completely stumped

Think of any natural number (1, 2, 3, 4…). If your number is even, divide by 2. If your number is odd, multiply by 3 and add 1. Repeat with your new number, over and over again.

For any everyday human-scale number, you will eventually end up in an infinite loop of 4, 2, 1, 4, 2, 1… Sometimes you’ll get there quickly, sometimes it will take hundreds of steps.

Collatz conjecture with 42: 42 -> 21 -> 64 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1 -> 4 -> 2 -> 1...
The Collatz Conjecture: example calculations for 42 and 99

Here are two examples I tried on the back of a random piece of junk mail: 42 goes into the 4 -> 2 -> 1 loop on the sixth step, while 99 goes on for so long that I ran out of space. Try it yourself with a few numbers to get a feeling for how it works. If you get tired of writing down numbers, this online tool will do the calculations for you.
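If you’d rather let a computer do the arithmetic, here is a minimal Python sketch of the rule described above. The function name collatz_sequence is my own invention for illustration, and the sketch stops as soon as it reaches 1 rather than going around the 4, 2, 1 loop forever. It also quietly assumes that your starting number does reach 1; if you ever found one that didn’t, the loop would never finish.

```python
def collatz_sequence(n):
    """Return the list of values visited from n down to the first 1."""
    if n < 1:
        raise ValueError("n must be a natural number (1, 2, 3, ...)")
    values = [n]
    while n != 1:
        # Even: divide by 2. Odd: multiply by 3 and add 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        values.append(n)
    return values

print(collatz_sequence(42))
# [42, 21, 64, 32, 16, 8, 4, 2, 1]
```

Try len(collatz_sequence(99)) - 1 to count how many steps 99 needs, and you’ll see why I ran out of junk mail.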

But of course there are infinitely many numbers… so is that ALWAYS the case, or is there some number that defies the pattern? An exception could either end in a different repeating cycle, or could keep growing forever by multiplying by 3 and adding 1.

This is known as the “3x+1 problem” or the “Collatz Conjecture” (after German mathematician Lothar Collatz). Amazingly enough, this simple problem has never been solved.

Computer calculations show that if there is an exception to the rule, it must be larger than 1,152,921,504,606,846,976. We could keep testing larger and larger numbers on larger and larger supercomputers, and maybe we’d find an exception… but maybe not. Even if we supercompute for ten billion years and still don’t find a counterexample, that’s still not a proof. A proof would require someone to construct a logical argument, starting from things we know to be true, to conclude either that an exception MUST exist or an exception CANNOT exist. (Caveat about this: see below.*) And mathematicians don’t even know where to begin to build that argument (another caveat**).
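To get a feeling for what those computer searches do, here is a toy Python sketch of a brute-force check. It is not the method used in the real record-setting computations, just an illustration under my own assumptions: the function name first_unverified and the max_steps cutoff are hypothetical. It uses one common shortcut: once a number drops below its starting value, we can stop, because every smaller starting value has already been checked. (The bound quoted above, incidentally, is exactly 2 to the 60th power.)

```python
def first_unverified(limit, max_steps=10_000):
    """Return the first start value in 2..limit that fails to drop below
    itself within max_steps, or None if every start value does."""
    for start in range(2, limit + 1):
        n = start
        steps = 0
        # Stop as soon as n falls below start: smaller values were already checked.
        while n >= start:
            if steps >= max_steps:
                return start  # suspicious: inspect this one by hand
            n = n // 2 if n % 2 == 0 else 3 * n + 1
            steps += 1
    return None

print(first_unverified(100_000))
# None
```

A result of None only says that nothing up to the limit misbehaves. As the paragraph above explains, that is evidence and not a proof, no matter how large you make the limit.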

Probably the world’s top expert on this problem is Jeffrey Lagarias of the University of Michigan, who says: “This is an extraordinarily difficult problem, completely out of reach of present day mathematics.”

Or, put more simply: Math is AWESOME. That is going to be a theme of this blog, I think.

To learn more about the Collatz Conjecture, see this excellent introduction from one of my favorite YouTube channels, Numberphile:

*There is another possibility – it could be “undecidable,” meaning we could prove that the statement could never be proven true or false from the basic assumptions of math. It would be sorta like the statement in simple English, “This statement is a lie.”

**I’m exaggerating a bit, of course, because mumble mumble abstract machine mumble subsemigroup mumble parity sequence mumble mumble matrix something something mumble. But there’s no obvious path forward.