Today marks the sixth anniversary of a baseball accomplishment: the most recent perfect game in Major League Baseball history. Since 1876, Major League Baseball teams have played 216,449 games, and only 23 have ended in perfect games (and here is a list of all of them). And I was there for the most recent one.
A “perfect game” is when a pitcher goes an entire game without letting a single batter reach base: no hits, walks, hit batters, or errors. No one from the opposing team even reaches first base. This photo shows me looking gleeful at the conclusion of the one six years ago. My lovely spouse and I were on vacation in Seattle, and decided to go watch a Seattle Mariners game. Or rather, I wanted to watch the game, so my lovely spouse came along and brought a book to read. (Did I mention my spouse is lovely?)
Someday the baseball gods will extract their terrible revenge on me for all the gloating I have done in the last six years. But the reason I tell the story today is not to further gloat (although: did I mention I saw a perfect game???), but rather to use it as an opportunity to answer a question that my friend Jon posed when he first heard the story:
How many people alive today have seen a perfect game?
So what approach can we take to find an answer to Jon’s question? The question can be divided into two parts: how many people have seen a perfect game, and how many of those people are alive today?
The first part of that question is straightforward, mostly. All MLB games today have their attendance recorded as part of their official scoring, and box scores for nearly every game ever played are available from the Baseball Reference website. For example, the box score from perfect game #4 – Addie Joss’s in Cleveland on October 2, 1908 – shows that there were 10,598 people in attendance. There are no attendance figures for the first two perfect games, which were both pitched in 1880, but there are for the other 21.
I looked up the attendance in the box score for each of the 21 games for which figures are available. The total attendance is 570,144. That means that, excluding the first two games in the 1800s (for which the attendance is unknown), about 570,144 people have seen an MLB perfect game in person (slightly fewer if anyone attended more than one).
So now for the more interesting part: how many of those people are alive today? There’s no way to know for sure, of course, at least not without making the ridiculous effort to track down every person who was there. But we can get a decent estimate based on population and lifespan data. (Note: this is why we don’t need to worry about attendance at those first two games. No one born in the 1800s is alive today.)
The approach is this: make a guess of who would have been at each game. How many men and how many women? How many people of each age – children, seniors, etc.? Then follow those people into their futures, using population statistics to figure out how many of them are still alive in 2018.
For example: when I saw Felix Hernandez’s perfect game in 2012, I was a 34-year-old male. Now I am a 40-year-old male. But if the perfect game I had seen had instead been Cy Young’s in 1904, I would be a 148-year-old male today, and there are no 148-year-old males alive.
This approach requires a major starting assumption: what are the demographic characteristics of a Major League Baseball audience? As a starting point, let’s assume that the stadium audience for these perfect games was representative of the whole U.S. population at the time. That is almost certainly not true, but it’s an easy starting point. Toward the end of this post, I give some thoughts on how you might improve estimates.
Next, assuming that the people at each game were representative of the U.S. population, let’s “age the population” by applying lifespan data to this distribution. The average lifespan in the U.S. today is 76 for men and 81 for women. But, of course, some people live longer than that, and some less long. At first, I thought I would need to calculate simulated lifespans for everyone who might have been in the stadium. For example, if we wanted to see how many people alive today saw Charlie Robertson’s perfect game in 1922, we would need to estimate the likelihood that someone who was there (say, a one-year-old baby) is still alive at age 97. But then I realized: no need!
If the research question were about that 1922 game specifically, then we would indeed have to use such a detailed statistical approach. But the research question is about how many people have seen ANY Major League Baseball perfect game. This is where the math gets slightly depressing (sorry): if that now-97-year-old baby is still alive, s/he is balanced out by someone who saw one of the later games and died young. So we can simply use the average lifespans for men and women as a cutoff in place of the probability that someone is still alive: if they would today be older than 76 (for men) or 81 (for women), we can assume they are no longer with us. That also means that we don’t need to consider any game for which even the youngest person who might have been there (a newborn) would now be older than 81. So we can start with Don Larsen’s perfect game in 1956.
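The rule for deciding which games to count can be checked directly. This is a minimal sketch, assuming the youngest possible attendee was a newborn (age 0), 2018 as the reference year, and 81 (the women’s average) as the cutoff; the list of years is only the games named in this post, not all 23 perfect games, and the function names are illustrative.

```python
REFERENCE_YEAR = 2018
MAX_CUTOFF_AGE = 81  # the larger of the two lifespan cutoffs (women)

# Years of the perfect games named in this post (a subset, not all 23).
GAME_YEARS = [1880, 1904, 1908, 1922, 1956, 1968, 1998, 2012]

def youngest_age_today(game_year, reference_year=REFERENCE_YEAR):
    """Age in the reference year of someone who was a newborn at the game."""
    return reference_year - game_year

def games_worth_counting(years, cutoff=MAX_CUTOFF_AGE):
    """Keep only games where even the youngest attendee is not past the cutoff."""
    return [y for y in years if youngest_age_today(y) <= cutoff]

print(games_worth_counting(GAME_YEARS))  # → [1956, 1968, 1998, 2012]
```

A newborn at Robertson’s 1922 game would be 96 in 2018, past the cutoff, so 1956 is the first year that survives the filter.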
So here is the approach we will use:
- We have the total attendance for each game. (It varies from the 6,298 people who saw Catfish Hunter’s perfect game in 1968 to the 64,519 who saw Larsen’s in the 1956 World Series.)
- Assume that the percentage of people at each age and sex at the game was the same as the percentage of people at each age and sex in the U.S. as a whole. Again, this is almost certainly not true, but I’m not sure how to do better.
- All those people are now older by the number of years that have passed since the game. Someone who saw David Wells’s 1998 perfect game at age 48 would be 68 today.
- Assume that anyone whose age today turns out to be greater than 76 (for men) or 81 (for women) has gone to the big game in the sky.
- Add up the number of people still alive who saw each game to get the total alive across all games. That is the answer to Jon’s question.
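The five steps above fit in a few lines of Python. This is a minimal sketch of the method as described, not the code behind the answer: the `Game` class, the function names, and the input format (population shares keyed by sex and age) are illustrative choices, and the example games use made-up attendance and shares rather than real box-score or Census figures.

```python
from dataclasses import dataclass

REFERENCE_YEAR = 2018
# Average U.S. lifespans quoted in the post, used as hard cutoffs.
CUTOFF_AGE = {"M": 76, "F": 81}

@dataclass
class Game:
    year: int
    attendance: int
    # Hypothetical input: share of the U.S. population in each (sex, age)
    # group in the game's year, e.g. {("M", 34): 0.007, ...}; sums to 1.
    shares: dict

def survivors(game, reference_year=REFERENCE_YEAR, cutoff=CUTOFF_AGE):
    """Estimated number of this game's attendees still alive in reference_year."""
    years_passed = reference_year - game.year
    alive = 0.0
    for (sex, age_at_game), share in game.shares.items():
        attendees = game.attendance * share      # people in this group at the game
        age_today = age_at_game + years_passed   # everyone ages by the same amount
        if age_today <= cutoff[sex]:             # past the cutoff counts as deceased
            alive += attendees
    return alive

def total_survivors(games, **kwargs):
    """Sum the survivors across all games."""
    return sum(survivors(g, **kwargs) for g in games)

# Toy games with invented attendance and shares (NOT real data).
toy_games = [
    Game(1998, 1000, {("M", 48): 0.5, ("F", 70): 0.5}),
    Game(2012, 200, {("M", 34): 0.25, ("F", 79): 0.75}),
]
print(total_survivors(toy_games))  # → 550.0
```

In the first toy game, the 48-year-old men are 68 in 2018 (under 76) and count, while the 70-year-old women are 90 (over 81) and do not; the same logic applies to the second game.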
So the only additional data we need, besides the attendance for each game, is the percentage of people at each age and sex in the U.S. population at the time each game occurred. We can multiply the percentage for each age and sex by the total attendance at each game to find the number of people of each age and sex at that game. Data on age and sex for the entire history of the U.S. is easily obtainable from the U.S. Census Bureau at www.census.gov.
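Census tables usually report counts rather than percentages, so one extra step turns counts into shares before multiplying by attendance. A minimal sketch, assuming the counts have already been extracted into a dictionary keyed by (sex, age); the function names and the toy counts are hypothetical, not real Census figures.

```python
def shares_from_counts(counts):
    """Convert population counts by (sex, age) into fractions of the total."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def allocate_attendance(attendance, shares):
    """Split a game's attendance across (sex, age) groups in proportion to shares."""
    return {group: attendance * s for group, s in shares.items()}

# Hypothetical toy counts (NOT real Census data).
toy_counts = {("M", 10): 30, ("F", 10): 30, ("M", 40): 20, ("F", 40): 20}
shares = shares_from_counts(toy_counts)

# Catfish Hunter's 1968 attendance, split using the toy shares above.
print(allocate_attendance(6298, shares))
```

The allocated counts come out fractional; that is fine for an estimate, since they are summed again before reporting.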
So what’s the answer?
Tune in Friday to find out!