Perfect Part 1: How many people alive today have seen a perfect MLB game?

Me in the stands at Safeco Field for Felix Hernandez's perfect game on August 15, 2012

Today marks the sixth anniversary of a baseball accomplishment: the most recent perfect game in Major League Baseball history. Since 1876, Major League Baseball teams have played 216,449 games, and only 23 have ended in perfect games (and here is a list of all of them). And I was there for the most recent one.

Which was nice.

A “perfect game” is when a pitcher goes an entire game without allowing any hits, walks, or errors – and thus no one from the opposing team even reaches first base. This photo shows me looking gleeful at the conclusion of the one six years ago. My lovely spouse and I were on vacation in Seattle, and decided to go watch a Seattle Mariners game. Or rather, I wanted to watch the game, so my lovely spouse came along and brought a book to read. (Did I mention my spouse is lovely?)

Someday the baseball gods will extract their terrible revenge on me for all the gloating I have done in the last six years. But the reason I tell the story today is not to further gloat (although: did I mention I saw a perfect game???), but rather to use it as an opportunity to answer a question that my friend Jon posed when he first heard the story:

How many people alive today have seen a perfect game?

The scoreboard at the end of the game reads: "Felix Hernandez, first perfect game in Mariners history"
Did I mention I saw Felix Hernandez’s perfect game six years ago? Which was nice.

So what approach can we take to find an answer to Jon’s question? The question can be divided into two parts: how many people have seen a perfect game, and how many of those people are alive today?

The first part of that question is straightforward, mostly. All MLB games today have their attendance recorded as part of their official scoring, and nearly all baseball scorecards ever are available from the Baseball Reference website. For example, the box score from perfect game #4 – Addie Joss’s in Cleveland on October 2, 1908 – shows that there were 10,598 people in attendance. There are no attendance figures for the first two perfect games, which were both pitched in 1880, but there are for the other 21.

I looked up the attendance in the box score for each of the 21 games for which figures are available. The total attendance is 570,144. That means that, excluding the first two games in the 1800s (for which the attendance is unknown), 570,144 people have seen an MLB perfect game in person.

So now for the more interesting part: how many of those people are alive today? There’s no way to know for sure, of course – at least not without making the ridiculous effort to track down every person who was there. But we can get a decent estimate based on population and lifespan data. (Note: this is why we don’t need to worry about attendance at those first two games – no one born in the 1800s is alive today.)

The approach is this: make a guess of who would have been at each game. How many men and how many women? How many people of each age – children, seniors, etc.? Then follow those people into their futures, using population statistics to figure out how many of them are still alive in 2018.

For example: when I saw Felix Hernandez’s perfect game in 2012, I was a 34-year-old male. Now I am a 40-year-old male. But if the perfect game I had seen was instead Cy Young’s in 1904, I would today have been a 148-year-old male, and there are no 148-year-old males alive.

The scorecard for the Tampa Bay Rays in that game: 27 batters, 27 out
My scorecard for the game: F. Hernandez (SEA): 27 batters faced, 27 outs

This approach requires a major starting assumption: what are the demographic characteristics of a Major League Baseball audience? As a starting point, let’s assume that the stadium audience for these perfect games was representative of the whole U.S. population at the time. That is almost certainly not true, but it’s an easy starting point. Toward the end of this post, I give some thoughts on how you might improve estimates.

Next, assuming that the people at each game were representative of the U.S. population, let’s “age the population” by applying lifespan data to this distribution. The average lifespan in the U.S. today is 76 for men and 81 for women. But, of course, some people live longer than that, and some less long. At first, I thought I would need to calculate simulated lifespans for everyone who might have been in the stadium. For example, if we wanted to see how many people alive today saw Charlie Robertson’s perfect game in 1922, we would need to estimate the likelihood that someone who was there – say, a one-year-old baby – is still alive at age 97. But then I realized – no need!

If the research question were about that 1922 game specifically, then we would indeed have to use such a detailed statistical approach. But the research question is about how many people have seen ANY Major League Baseball perfect game. This is where the math gets slightly depressing, sorry, but if that now-97-year-old baby is still alive, s/he is balanced out by someone who saw one of the later games and died young. So we can just use the average lifespan for men and women as an estimate of the probability that someone is still alive – if they would today be older than 76-for-men-81-for-women, we can assume they are no longer with us. That also means that we don’t need to consider games for which the youngest person who might have been there is now older than 81. So we can start with Don Larsen’s perfect game in 1956.
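The cutoff is easy to check with a few lines of Python. This is only a sketch of the shortcut described above: the year and lifespan come from the figures in this post, while the function and constant names are made up for illustration.

```python
TODAY = 2018        # the year this post was written
MAX_LIFESPAN = 81   # average lifespan for women, the larger of the two figures

def game_can_have_survivors(game_year, today=TODAY, max_lifespan=MAX_LIFESPAN):
    """True if a newborn at the game would be no older than max_lifespan today."""
    youngest_age_today = today - game_year
    return youngest_age_today <= max_lifespan

# Three of the perfect games mentioned in this post
for year, pitcher in [(1904, "Cy Young"), (1922, "Charlie Robertson"), (1956, "Don Larsen")]:
    print(year, pitcher, game_can_have_survivors(year))
```

Under this simplification, a newborn at the 1922 game would be 96 today, which is past the cutoff, while a newborn at the 1956 game would be 62.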

So here is the approach we will use:

  1. We have the total attendance for each game. (It varies from the 6,298 people who saw Catfish Hunter’s perfect game in 1968 to the 64,519 who saw Larsen’s in the 1956 World Series.)
  2. Assume that the percentage of people at each age and sex at the game was the same as the percentage of people at each age and sex in the U.S. as a whole. Again, this is almost certainly not true, but I’m not sure how to do better.
  3. All those people are older by the number of years that have passed since the game. Someone who saw David Wells’s 1998 perfect game at age 48 would be 68 today.
  4. Assume that anyone whose age today turns out to be greater than 76 (for men) or 81 (for women) has gone to the big game in the sky.
  5. Add up the number of people still alive who saw each game to get the total alive across all games. That is the answer to Jon’s question.
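The five steps above can be sketched in Python. This is a rough illustration, not the actual calculation: the crowd breakdown and the attendance numbers below are invented so the arithmetic is easy to follow, and the function and variable names are hypothetical. A real run would use the box-score attendance and Census percentages for each game.

```python
TODAY = 2018
MAX_AGE = {"male": 76, "female": 81}   # average lifespans quoted above

def survivors(attendance, game_year, shares, today=TODAY):
    """Estimate how many attendees of one game are still alive.

    shares maps (sex, age_at_game) -> fraction of the crowd,
    with the fractions summing to 1.
    """
    years_passed = today - game_year
    alive = 0.0
    for (sex, age_at_game), fraction in shares.items():
        age_today = age_at_game + years_passed    # step 3
        if age_today <= MAX_AGE[sex]:             # step 4
            alive += attendance * fraction        # steps 1 and 2
    return alive

# Invented crowd for illustration only: four equal groups, NOT census data.
toy_shares = {
    ("male", 10): 0.25, ("male", 48): 0.25,
    ("female", 10): 0.25, ("female", 70): 0.25,
}

# (game year, attendance): round hypothetical attendances, not box-score figures
toy_games = [(1998, 1000), (1956, 1000)]

total = sum(survivors(att, year, toy_shares) for year, att in toy_games)  # step 5
print(total)
```

With these toy numbers, three of the four groups from the 1998 game are still under the cutoff, but only the two groups of ten-year-olds from the 1956 game are.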

So the only additional data we need, besides the attendance for each game, is the percentage of people at each age and sex in the U.S. population at the time each game occurred. We can multiply the percentages of each age and sex by the total number at each game to find the number of people at each age and sex at each game. Data for age and sex for the entire history of the U.S. is easily obtainable from the U.S. Census at www.census.gov.

So what’s the answer?

Tune in Friday to find out!

The wolf should be obvious: why I think we really found water on Mars this time

As I mentioned on Friday, when I first heard about the Italian Space Agency’s announcement of water on Mars, I was skeptical. Various space agencies have cried wolf on major discoveries before – most famously, with “NASA Confirms Evidence That Liquid Water Flows on Today’s Mars” (it’s actually sand) and “Discovery of ‘Arsenic-bug’ Expands Definition of Life” (it wasn’t, and it doesn’t). This is not a conspiracy – it’s just overexcitement. Scientists work hard to keep themselves free of confirmation bias, but they’re still human, and sometimes they still see what they want to see.

Given this history, how do we know that it really is a wolf this time? I’ve found that it helps to ask the obvious question.

Aside… This is what bothers me most about global warming deniers. They will go on for pages and pages about July temperatures in Paraguay, without even trying to answer the obvious question: why did global temperatures start to increase at exactly the time when we started releasing into the atmosphere a gas that is known to increase temperatures?

In the case of water on Mars, here is the obvious question. We know for sure that there is liquid water on one of the eight planets in the Solar System: here on Earth. The research team claims that there is liquid water under the polar ice caps on Mars. Could the same techniques they used have detected water under Earth’s polar ice caps, where we know there is water?

It’s right there in the second sentence of the paper that published the announcement (Orosei et al. 2018): “Radio echo sounding (RES) is a suitable technique to resolve this dispute, because low-frequency radars have been used extensively and successfully to detect liquid water at the bottom of terrestrial polar ice sheets.”

The technique they used is the IN SPAAAAAAAAAAAACE version of a commonly-used technique called ground-penetrating radar (GPR). GPR involves transmitting radio waves into the ground, then listening for the echoes of those waves reflecting off various underground layers. The strength of the return signals reflected off each layer tells you what the layer is made of, and nothing reflects quite like water. And that water-related pattern is exactly the kind of reflection that the research team saw on Mars.
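To get a rough feel for why water stands out, here is a back-of-the-envelope sketch using the standard textbook formula for how much power reflects off a flat boundary between two non-conducting materials when the wave hits it head-on. The permittivity values are approximate, assumed for illustration only; they are not taken from Orosei et al., and the real analysis is considerably more involved.

```python
import math

def power_reflection(eps_above, eps_below):
    """Fraction of radar power reflected at a flat boundary (normal incidence)."""
    n1, n2 = math.sqrt(eps_above), math.sqrt(eps_below)
    return ((n1 - n2) / (n1 + n2)) ** 2

# Approximate relative permittivities, assumed for illustration
EPS_ICE = 3.1     # water ice
EPS_ROCK = 8.0    # dry rock (varies widely in practice)
EPS_WATER = 80.0  # liquid water

print("ice over rock: ", round(power_reflection(EPS_ICE, EPS_ROCK), 3))
print("ice over water:", round(power_reflection(EPS_ICE, EPS_WATER), 3))
```

With these assumed values, a boundary between ice and water sends back several times more power than a boundary between ice and rock, which is the kind of contrast the radar is looking for.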

The radar image that proves there is water under Mars's south polar cap; it shows up as an underground layer that strongly reflects radio waves
(A) The radar reflection profile found by Mars Express. “Surface reflection” shows the radio waves reflecting off Mars’s surface, while “Basal reflection” shows the radio waves reflecting off the water layer

(B) The same reflection measurements shown as a more traditional graph.

Source: Orosei et al. 2018. Click on the image for a larger version.

Obvious question answered, wolf found. We really did it this time!

We did it! We found water on Mars!

We found water on Mars!

We found water on Mars!

We found water on Mars!

(“We” = humans)

I’ll admit that when I first heard the news, I was skeptical. Although we try to avoid it, scientists can sometimes fall victim to their own wishful thinking just like anyone else can. But I read the report, and the evidence is solid. We really did it this time.

We’ve known for a while that there is H2O on Mars, as water vapor in the atmosphere and as layers of dust and ice near the north and south poles. The question was whether we could find liquid water.

The answer came from the European Space Agency’s Mars Express spacecraft. It carries a radar instrument that broadcasts radio waves at the Martian surface and listens for those waves reflecting back from layers under the surface. When the spacecraft flew over a region near Mars’s south pole – shown below – it picked up echoes from a layer buried below the surface.

The radar echo was so strong that it could only be one thing: liquid water.

(A) A map of the study area, near Mars’s south pole. (B) A close-up of the area in the black box – the red line shows the track of the spacecraft. Source: Orosei et al. 2018. Click on the image for a larger version.

There’s a lot more to say about this discovery, but first:

WOW!

A new vision for science

My colleagues and I are thrilled to announce the latest release of our SciServer online science platform.

Screenshot of SciServer (www.sciserver.org)

SciServer is a suite of tools to manage, visualize, and understand large-scale datasets in all areas of science, from astronomy to genomics to soil ecology. SciServer allows anyone to work with terabytes of data, running server-side analysis and visualization tools in real time, without needing to install anything.

The beating heart of SciServer is SciServer Compute, a browser-based virtual computing environment. Anyone can create a free SciServer account and create analysis scripts in Python, R, or Matlab.

Today’s release is called SciServer Betelgeuse, succeeding the previous system SciServer Altair (#lolSeeWhatWeDidThere). SciServer Betelgeuse adds group functionality for file and data sharing, and also the ability to run asynchronous time- or memory-intensive jobs. We’ve been working on this update for more than two years, and we’re eager to see how everyone can make use of it.

We’re grateful to the National Science Foundation (award ACI-1261715) for their generosity in allowing us to create and maintain this resource, forever free to users.

The “we” I keep referring to here is a team of incredibly talented scientists and coders at the Institute for Data-Intensive Engineering and Science (IDIES) at Johns Hopkins University. I’m honored to have been part of this team for the past eighteen years.

And on a personal note, this new release is a major new step in my career. I’ve devoted my entire professional life to finding new ways to bring the real process of science to the world, and this is the realest real way yet.

The problem any third grader can understand, but has all the world’s mathematicians completely stumped

Think of any natural number (1, 2, 3, 4…). If your number is even, divide by 2. If your number is odd, multiply by 3 and add 1. Repeat with your new number, over and over again.

For any everyday human-scale number, you will eventually end up in an infinite loop of 4, 2, 1, 4, 2, 1… Sometimes you’ll get there quickly, sometimes it will take hundreds of steps.

Collatz conjecture with 42: 42 -> 21 -> 64 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1 -> 4 -> 2 -> 1...
The Collatz Conjecture: example calculations for 42 and 99

Here are two examples I tried on the back of a random piece of junk mail: 42 goes into the 4 -> 2 -> 1 loop on the sixth step, while 99 goes on for so long that I ran out of space. Try it yourself with a few numbers to get a feeling for how it works. If you get tired of writing down numbers, this online tool will do the calculations for you.
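If you would rather let a computer do the junk-mail arithmetic, here is a minimal Python sketch of the rule. The function name is just one chosen for this example.

```python
def collatz_sequence(n):
    """Return the numbers visited from n until the sequence first reaches 1."""
    if n < 1:
        raise ValueError("n must be a natural number (1, 2, 3, ...)")
    seq = [n]
    while n != 1:
        # Even: divide by 2. Odd: multiply by 3 and add 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq

for start in (42, 99):
    seq = collatz_sequence(start)
    # seq.index(4) is the step on which the sequence first hits 4
    # (this works for any starting number of 3 or more)
    print(start, "hits 4 on step", seq.index(4), "and 1 on step", len(seq) - 1)
    print(" -> ".join(str(x) for x in seq))
```

The function stops the first time it reaches 1, since everything after that is just 4, 2, 1 repeating.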

But of course there are infinitely many numbers… so is that ALWAYS the case, or is there some number that defies the pattern? An exception could either end in a different repeating cycle, or could keep growing forever by multiplying by 3 and adding 1.

This is known as the “3x+1 problem” or the “Collatz Conjecture” (after German mathematician Lothar Collatz). Amazingly enough, this simple problem has never been solved.

Computer calculations show that if there is an exception to the rule, it must be larger than 1,152,921,504,606,846,976. We could keep testing larger and larger numbers on larger and larger supercomputers, and maybe we’d find an exception… but maybe not. Even if we supercompute for ten billion years and still don’t find a counterexample, that’s still not a proof. A proof would require someone to construct a logical argument, starting from things we know to be true, to conclude either that an exception MUST exist or an exception CANNOT exist. (Caveat about this: see below.*) And mathematicians don’t even know where to begin to build that argument (another caveat**).
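Here is a toy version of that kind of brute-force search, as a sketch only. The limit and the step cap are arbitrary choices for this example, and real searches use far cleverer code. Notice that all it can ever say is "no exception found so far."

```python
def reaches_one(n, max_steps=10_000):
    """Follow the 3x+1 rule from n; True if it hits 1 within max_steps."""
    for _ in range(max_steps):
        if n == 1:
            return True
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return False  # gave up early; this proves nothing either way

LIMIT = 10_000  # tiny compared with the bound quoted above, which equals 2**60
exceptions = [n for n in range(1, LIMIT + 1) if not reaches_one(n)]

print(f"Checked 1 to {LIMIT}: {len(exceptions)} exceptions found")
print(2**60)  # 1152921504606846976
```

Raising `LIMIT` checks more numbers, but no finite limit turns the search into a proof.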

Probably the world’s top expert on this problem is Jeffrey Lagarias of the University of Michigan, who says: “This is an extraordinarily difficult problem, completely out of reach of present day mathematics.”

Or, put more simply: Math is AWESOME. That is going to be a theme of this blog, I think.

To learn more about the Collatz Conjecture, see this excellent introduction from one of my favorite YouTube channels, Numberphile:

[youtube https://www.youtube.com/watch?v=5mFpVDpKX70&w=560&h=315]

*There is another possibility – it could be “undecidable,” meaning we could prove that the statement could never be proven true or false from the basic assumptions of math. It would be sorta like the statement in simple English, “This statement is a lie.”

**I’m exaggerating a bit, of course, because mumble mumble abstract machine mumble subsemigroup mumble parity sequence mumble mumble matrix something something mumble. But there’s no obvious path forward.