Blog

Perfect Part 1: How many people alive today have seen a perfect MLB game?

Me in the stands at Safeco Field for Felix Hernandez's perfect game on August 15, 2012

Today marks the sixth anniversary of a baseball accomplishment: the most recent perfect game in Major League Baseball history. Since 1876, Major League Baseball teams have played 216,449 games, and only 23 have ended in perfect games (and here is a list of all of them). And I was there for the most recent one.

Which was nice.

A “perfect game” is when a pitcher goes an entire game without allowing any hits, walks, or errors – and thus no one from the opposing team even reaches first base. This photo shows me looking gleeful at the conclusion of the one six years ago. My lovely spouse and I were on vacation in Seattle, and decided to go watch a Seattle Mariners game. Or rather, I wanted to watch the game, so my lovely spouse came along and brought a book to read. (Did I mention my spouse is lovely?)

Someday the baseball gods will extract their terrible revenge on me for all the gloating I have done in the last six years. But the reason I tell the story today is not to further gloat (although: did I mention I saw a perfect game???), but rather to use it as an opportunity to answer a question that my friend Jon posed when he first heard the story:

How many people alive today have seen a perfect game?

The scoreboard at the end of the game reads: "Felix Hernandez, first perfect game in Mariners history"
Did I mention I saw Felix Hernandez’s perfect game six years ago? Which was nice.

So what approach can we take to find an answer to Jon’s question? The question can be divided into two parts: how many people have seen a perfect game, and how many of those people are alive today?

The first part of that question is straightforward, mostly. All MLB games today have their attendance recorded as part of their official scoring, and nearly all box scores ever recorded are available from the Baseball Reference website. For example, the box score from perfect game #4 – Addie Joss’s in Cleveland on October 2, 1908 – shows that there were 10,598 people in attendance. There are no attendance figures for the first two perfect games, which were both pitched in 1880, but there are for the other 21.

I looked up the attendance from the box scores for each game of the 21 for which figures are available. The total attendance is 570,144. That means that, excluding the first two games in the 1800s (for which the attendance is unknown), 570,144 people have seen an MLB perfect game in person.

So now for the more interesting part: how many of those people are alive today? There’s no way to know for sure, of course – at least not without making the ridiculous effort to track down every person who was there. But we can get a decent estimate based on population and lifespan data. (Note: this is why we don’t need to worry about attendance at those first two games – no one born in the 1800s is alive today.)

The approach is this: make a guess of who would have been at each game. How many men and how many women? How many people of each age – children, seniors, etc.? Then follow those people into their futures, using population statistics to figure out how many of them are still alive in 2018.

For example: when I saw Felix Hernandez’s perfect game in 2012, I was a 34-year-old male. Now I am a 40-year-old male. But if the perfect game I had seen was instead Cy Young’s in 1904, I would today have been a 148-year-old male, and there are no 148-year-old males alive.

The scorecard for the Tampa Bay Rays in that game: 27 batters, 27 out
My scorecard for the game: F. Hernandez (SEA): 27 batters faced, 27 outs

This approach requires a major starting assumption: what are the demographic characteristics of a Major League Baseball audience? As a starting point, let’s assume that the stadium audience for these perfect games was representative of the whole U.S. population at the time. That is almost certainly not true, but it’s an easy starting point. Toward the end of this post, I give some thoughts on how you might improve estimates.

Next, assuming that the people at each game were representative of the U.S. population, let’s “age the population” by applying lifespan data to this distribution. The average lifespan in the U.S. today is 76 for men and 81 for women. But, of course, some people live longer than that, and some less long. At first, I thought I would need to calculate simulated lifespans for everyone who might have been in the stadium. For example, if we wanted to see how many people alive today saw Charlie Robertson’s perfect game in 1922, we would need to estimate the likelihood that someone who was there – say, a one-year-old baby – is still alive at age 97. But then I realized – no need!

If the research question were about that 1922 game specifically, then we would indeed have to use such a detailed statistical approach. But the research question is about how many people have seen ANY Major League Baseball perfect game. This is where the math gets slightly depressing, sorry, but if that now-97-year-old baby is still alive, s/he is balanced out by someone who saw one of the later games and died young. So we can just use the average lifespan for men and women as an estimate of the probability that someone is still alive – if they would today be older than 76-for-men-81-for-women, we can assume they are no longer with us. That also means that we don’t need to consider games for which the youngest person who might have been there is now older than 81. So we can start with Don Larsen’s perfect game in 1956.

So here is the approach we will use:

  1. We have the total attendance for each game. (It varies from the 6,298 people who saw Catfish Hunter’s perfect game in 1968 to the 64,519 who saw Larsen’s in the 1956 World Series.)
  2. Assume that the percentage of people at each age and sex at the game was the same as the percentage of people at each age and sex in the U.S. as a whole. Again, this is almost certainly not true, but I’m not sure how to do better.
  3. All those people are older by the amount of years that have passed since the game. Someone who saw David Wells’s 1998 perfect game at age 48 would be 68 today.
  4. Assume that anyone whose age today turns out to be greater than 76 (for men) or 81 (for women) has gone to the big game in the sky.
  5. Add up the number of people still alive who saw each game to get the total alive across all games. That is the answer to Jon’s question.

So the only additional data we need, besides the attendance for each game, is the percentage of people at each age and sex in the U.S. population at the time each game occurred. We can multiply the percentages of each age and sex by the total number at each game to find the number of people at each age and sex at each game. Data for age and sex for the entire history of the U.S. is easily obtainable from the U.S. Census at www.census.gov.
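For readers who think in code, here is a minimal sketch of the five steps above. The function names (`estimate_alive`, `total_alive`) are my own inventions for illustration, and the age-and-sex shares in the example are made up, not real Census figures; only the 76/81 cutoffs and Larsen’s attendance come from this post.

```python
# Cutoff ages from the post: average U.S. lifespan by sex.
LIFESPAN_CUTOFF = {"M": 76, "F": 81}


def estimate_alive(attendance, game_year, shares, current_year=2018):
    """Estimate how many attendees of one game are still alive.

    shares maps (sex, age_at_game) -> fraction of the U.S. population
    in that group in the game year (step 2). Fractions should sum to 1.
    """
    years_passed = current_year - game_year
    alive = 0.0
    for (sex, age_at_game), fraction in shares.items():
        # Step 3: everyone is older by the years since the game.
        age_today = age_at_game + years_passed
        # Step 4: anyone older than the cutoff is assumed to have died.
        if age_today <= LIFESPAN_CUTOFF[sex]:
            alive += attendance * fraction
    return alive


def total_alive(games, current_year=2018):
    """Step 5: add up the survivors across all games."""
    return sum(
        estimate_alive(g["attendance"], g["year"], g["shares"], current_year)
        for g in games
    )


# Made-up distribution for illustration only: four coarse groups,
# NOT real Census data.
toy_shares = {
    ("M", 10): 0.25,
    ("M", 50): 0.25,
    ("F", 10): 0.25,
    ("F", 50): 0.25,
}

# Larsen's 1956 game (attendance 64,519): 62 years have passed, so the
# 10-year-olds are now 72 and the 50-year-olds would be 112.
print(estimate_alive(64519, 1956, toy_shares))
```

With real data, `shares` would have one entry per single year of age and sex for each game year, and `total_alive` would loop over every game from 1956 onward.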

So what’s the answer?

Tune in Friday to find out!

Except they weren’t: Joe Magarac

A statue of Joe Magarac bending a steel beam

I’m fascinated with stories of people who are not what they seem, or are not what they claim to be. History is full of stories of people who hide or change their identities, for all sorts of reasons both good and not-so-good – as well as stories of fictional people who were invented for all sorts of reasons both good and not-so-good. But sometimes the story is so much more interesting, and I’d love to share some of these stories with you. If truth is stranger than fiction, then true fiction is even stranger than that. Welcome to a new occasional series: Except They Weren’t.

What John Henry was to African-American rail workers, Joe Magarac was to steelworkers in Pittsburgh.

In 1931, Pittsburgh author Owen Francis (a former steelworker himself) interviewed a group of immigrant steelworkers from Croatia. They told him of the legend of Joe Magarac, a strong and hardworking folk hero figure who shows up just in time to save his fellow workers from calamity.

Francis wrote up the story for the November 1931 issue of Scribner’s magazine as “The Saga of Joe Magarac: Steelman.” Like the more famous John Henry, Joe Magarac might not have been a real person, but he was a real inspiration to the downtrodden workers of America.

Except he wasn’t.

There is no historical record of Joe Magarac before the 1931 article, and when anthropologists interviewed Pittsburgh immigrant steelworkers in the 1940s and 1950s, no one had heard of him. We’ll never know for sure, but it sure looks like the guys were just playing a fun game of troll the reporter. The clincher? Check out what “magarac” means in Croatian.

A photo of a cute donkey grazing
Not Joe Magarac

But in one of those Bizarre Things That Happen Sometimes, the article proved so popular that Joe Magarac actually became a local folk hero, and an unofficial symbol of the city of Pittsburgh. Here he is, hard at work on a mural of a downtown building. And the photo above is of his statue in front of U.S. Steel’s headquarters.

Joe Magarac is an example of what historians and anthropologists call “fakelore” — stories that have the structure and purpose of folklore, but were created for another purpose. What other examples of fakelore do you know?

Image credit for the statue photo: Flickr user Devon Christopher Adams

Grandpa vs. Nazis

A photo of Mike Raddick, Sr. in front of a car wearing a U.S. Army uniform
Detroit, January 1942: On furlough after training

I recently learned a cool story about my grandfather, Michael Raddick, Sr.

In late 1941, like many young men of his generation, he volunteered for the U.S. Army. At age 24 and married, he had never left the American Midwest, and suddenly he was on a train to Camp Beauregard in Louisiana for Basic training. The cool story comes at the end of Basic.

The Sergeant addressed the company and asked who could operate construction equipment. Grandpa raised his hand. Sure, he’d never actually operated construction equipment, but he’d driven cars, and it couldn’t be that different. Right?

Thus he became a member of the Army Corps of Engineers. Suddenly the young man who had never left the Midwest was shipped off to Iran, where he worked on the Trans-Iranian Railway (still in use today) to supply the Soviet Union in its fight against the Nazis. He worked there for three years, met the Shah, and was honorably discharged at the end of the war as a Technician fifth grade (TEC 5), at the time the Engineers’ equivalent of a Corporal. You can read about the operation here at the National Museum of the U.S. Army and at the U.S. Army Corps of Engineers websites.

He returned home and settled in Neville Township, PA, where he became a principal and community leader, and had two children – Aunt Elaine and my Dad. He died in 2005 at age 88, and is buried in Salem, Ohio next to his beloved wife (my grandmother), who died 12 years later.

Headstone reading: Michael Raddick, TEC 5 US Army, World War II, Aug 11 1917 - Sep 24 2005
Rest in peace, Grandpa. You earned it.

This story illustrates what has become something of a trademark strategy in the Raddick family: volunteer for something you are not technically “qualified” for because it sounds cool, then learn fast.

That was how my Dad became a salesman in the lumber industry, and went on to start a successful cabinet supply business.

It’s also how I became a writer/educator/data scientist.

Kill moose and squirrel: Russians pretend to be Americans online

Boris, Natasha, and Fearless Leader from an episode of The Adventures of Rocky and Bullwinkle and Friends

“Justin” set his Twitter location as Austin, Texas, but his time zone was set to Moscow Standard Time.

When a Smart Data Science Friend (hi Scott!) shared this in October 2017, I knew that organizations in Russia had used social media to support Donald Trump’s presidential campaign, but I hadn’t realized the scale or effectiveness of their efforts. Looking back, we can see that “Justin” wasn’t alone. How many other Russians were out there on Twitter pretending to be Americans?

I’d like to find out.

Last Wednesday, I wrote about an article published on FiveThirtyEight.com: “Why We’re Sharing 3 Million Russian Troll Tweets.” As I said there, I downloaded the dataset that they generously made available on GitHub (technically, I forked the repository into my own GitHub space) and then loaded it into the SciServer online science suite. Let me know if you’d like to join this effort.

I used Python’s regular expression features to do a quick search through the 1,849,687 English-language tweets in the FiveThirtyEight dataset. I looked for tweets that showed evidence of claiming to be Americans – featuring “I’m” or “I am” plus some form of “USA” or “America” or “American”. The screenshot below shows me running that command inside a Jupyter notebook in SciServer Compute:

Python commands using the re and pandas modules
Python commands to find tweets like “I’m American”
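In case the screenshot doesn’t come through, here is a rough sketch of that kind of search. This is not the exact command from my notebook: the patterns and the `claims_american` helper are reconstructed for illustration, and it uses only the standard-library `re` module on a plain list of strings rather than a pandas DataFrame.

```python
import re

# Assumed patterns, not the originals: a first-person "I'm" / "I am"
# (straight or curly apostrophe) and some form of USA / America / American.
FIRST_PERSON = re.compile(r"\bI(?:['’]m| am)\b", re.IGNORECASE)
COUNTRY = re.compile(r"\b(?:USA|America(?:ns?)?)\b", re.IGNORECASE)


def claims_american(text):
    """Return True if the tweet contains both patterns, in either order."""
    return bool(FIRST_PERSON.search(text)) and bool(COUNTRY.search(text))


# A few sample strings; the first is one of the tweets quoted in this post.
tweets = [
    "If I were a dem I’d be embarrassed by who represents my party. "
    "But I could never be a dem. For I’m American! #TCOT",
    "America is a big country.",
    "I am from Canada.",
]

candidates = [t for t in tweets if claims_american(t)]
print(len(candidates), "candidate tweet(s)")
```

A filter like this only finds candidates. It will happily match “I’m tired of America’s weather,” which is why reading every hit by hand is still necessary.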

The search returned 177 tweets from 84 separate authors – counts that should in no way be considered scientific or used for any kind of analysis, either quantitative or qualitative. I then read through all 177 tweets and selected only those that unambiguously claimed to be American citizens/voters.

I was left with 29 tweets from 20 separate Twitter handles, covering the period from December 2014 to August 2017. (Of course, there is no reason to think that Russians impersonating Americans suddenly stopped in August 2017.) Here are five selected randomly:

  • @ISRAEL_WILLS on February 9, 2015: Hope everyone had a great day yesterday. I’m happy we don’t have a war on the American soil. Thank you to all the military serving today. 🙂
  • @JANI_S_JAC on July 4, 2015: #HappyIndependenceDay I’m a patriot and it’s sad for me to see what’s happening to America today
  • @TEN_GOP on February 3, 2016: If I were a dem I’d be embarrassed by who represents my party. But I could never be a dem. For I’m American! #TCOT
  • @TEN_GOP on March 25, 2016: ‘@COJeepGirl well b/c I’m American and Hussein is the President atm’
  • @JANSKEESTR on August 15, 2017: I voted for Trump because I knew he’s the only man who could save America from liberal degeneracy. I’m still sure that I made a right choice

It’s worth reiterating: all of these people are claiming — directly, unambiguously — to be American citizens. All of them are Russian. It is not illegal in most circumstances to claim that you are someone else online. Nor is it illegal for a foreign citizen to have an opinion on a U.S. election. But I find it profoundly disturbing that we are only now realizing the full extent of these Russian operations. The best we can do right now is to try to understand how these trolls have operated in the past in hopes of preventing similar incidents in the future.

One thing is clear: America has never been so ready for your Rocky & Bullwinkle references. Russian catfisher @TEN_GOP says it best:

Trump was a strong & fearless LEADER today. I’m proud to be an American.

P.S. Here is an Excel spreadsheet containing the tweets that I identified, if you’d like to play with the data yourself.

The wolf should be obvious: why I think we really found water on Mars this time

As I mentioned on Friday, when I first heard about the Italian Space Agency’s announcement of water on Mars, I was skeptical. Various space agencies have cried wolf on major discoveries before – most famously, with “NASA Confirms Evidence That Liquid Water Flows on Today’s Mars” (it’s actually sand) and “Discovery of ‘Arsenic-bug’ Expands Definition of Life” (it wasn’t, and it doesn’t). This is not a conspiracy – it’s just overexcitement. Scientists work hard to keep themselves free of confirmation bias, but they’re still human, and sometimes they still see what they want to see.

Given this history, how do we know that it really is a wolf this time? I’ve found that it helps to ask the obvious question.

Aside… This is what bothers me most about global warming deniers. They will go on for pages and pages about July temperatures in Paraguay, without even trying to answer the obvious question: why did global temperatures start to increase at exactly the time when we started releasing into the atmosphere a gas that is known to increase temperatures?

In the case of water on Mars, here is the obvious question. We know for sure that there is liquid water on one of the eight planets in the Solar System: here on Earth. The research team claims that there is liquid water under the polar ice caps on Mars. Could the same techniques they used have detected water under Earth’s polar ice caps, where we know there is water?

It’s right there in the second sentence of the paper that published the announcement (Orosei et al. 2018): “Radio echo sounding (RES) is a suitable technique to resolve this dispute, because low-frequency radars have been used extensively and successfully to detect liquid water at the bottom of terrestrial polar ice sheets.”

The technique they used is the IN SPAAAAAAAAAAAACE version of a commonly-used technique called ground-penetrating radar (GPR). GPR involves transmitting radio waves into the ground, then listening for the echoes of those waves reflecting off various underground layers. The strength of the return signals reflected off each layer tells you what the layer is made of, and nothing reflects quite like water. And that water-related pattern is exactly the kind of reflection that the research team saw on Mars.

The radar image that proves there is water under Mars's south polar cap; it shows up as an underground layer that strongly reflects radio waves
(A) The radar reflection profile found by Mars Express. “Surface reflection” shows the radio waves reflecting off Mars’s surface, while “Basal reflection” shows the radio waves reflecting off the water layer

(B) The same reflection measurements shown as a more traditional graph.

Source: Orosei et al. 2018.

Obvious question answered, wolf found. We really did it this time!

We did it! We found water on Mars!

We found water on Mars!

We found water on Mars!

We found water on Mars!

(“We” = humans)

I’ll admit that when I first heard the news, I was skeptical. Although we try to avoid it, scientists can sometimes fall victim to their own wishful thinking just like anyone else can. But I read the report, and the evidence is solid. We really did it this time.

We’ve known for a while that there is H2O on Mars, as water vapor in the atmosphere and as layers of dust and ice near the north and south poles. The question was whether we could find liquid water.

The answer came from the European Space Agency’s Mars Express spacecraft. It carries a radar instrument that broadcasts radio waves at the Martian surface and listens for those waves reflecting back from layers under the surface. When the spacecraft flew over a region near Mars’s south pole – shown below – it picked up echoes from a layer buried below the surface.

The radar echo was so strong that it could only be one thing: liquid water.

(A) A map of the study area, near Mars’s south pole. (B) A close-up of the area in the black box – the red line shows the track of the spacecraft. Source: Orosei et al. 2018.

There’s a lot more to say about this discovery, but first:

WOW!