I had a whole post written and ready to go. It was beautiful – personal, deeply emotional, and hopefully even a bit insightful. But, oh, then I saw this tweet.
In case you’ve been living under a tea tree plant – or just don’t follow American Rules Football – here’s the backstory. Colin Kaepernick is a former player who ignited a huge controversy during the 2016 preseason by kneeling during the traditional pregame performance of The Star-Spangled Banner as a protest against racism and police brutality in the United States. Some saw it as a brave stand against injustice; others saw it as disrespectful to the sacrifice of U.S. troops.
Of course, the protest has spread far beyond Kaepernick, with more than 200 players joining in at various times. So it’s not just about one person, and it hasn’t been for a while. But Kaepernick has always been the public face of the protests, and the flashpoint of controversy. If you want to see rageful opinions flying from all directions, especially from conservatives, go to an NFL fan forum and just post “Colin Kaepernick” and nothing else.
After the 2016 season, Kaepernick elected to forgo his contract and become a free agent, eligible to be hired by other teams – and no other team hired him. In October 2017, he filed a grievance against the NFL, claiming that owners colluded to keep him out of the league. On August 28, 2018, an arbitrator ruled that the case could go to trial.
Today’s announcement means that Kaepernick will be the new face of Nike’s ad campaign. He won’t be the only athlete featured, but he’ll be the most prominent. I think the campaign’s tagline is particularly poignant:
Believe in something. Even if it means sacrificing everything.
But of course, Kaepernick’s opponents would counter that “sacrificing everything” can mean more than a career as a football player.
So how should we react to today’s announcement?
First, let me be clear: I have supported the players’ right to protest, and I believe the issues they are protesting against are real and troubling to this country, which I love. My thought today is:
In my mind, it all depends on exactly what Darren Rovell means by “Nike has been paying Colin Kaepernick all along.” Does he mean since it became clear he would not be signed by another team? Since the protests came to national attention? Or… does he mean since before the first time he kneeled?
This is either an incredibly brave stand by Nike… or it’s the most cynical PR stunt in the history of PR. And right now we don’t have enough information to know which it is.
Last Wednesday, I continued my series on People Who Are Not What They Seem with Malba Tahan, the Islamic intellectual and writer who was actually the fictional alter ego of a Brazilian math teacher, created to help his students learn word problems.
That post ended on a mathematical cliffhanger (bwahahahaha!). Our heroes, Tahan and his friend Beremiz Samir, happened upon an argument among three brothers about how to divide up their father’s inheritance. Their father left them a herd of 35 camels, with the following instructions (as reported by the eldest brother):
According to the express wishes of my father half of them belong to me, one-third to my brother Hamed, and one-ninth to Harim, the youngest.
Except They Weren’t:
An occasional series about people who are Not What They Seem
Samir’s solution was clever, but it required some risk – he added into the herd the camel that Tahan was riding, making a new herd of 36. He then divided the new herd according to the father’s instructions: one-half (18) to the eldest, one-third (12) to the middle, and one-ninth (4) to the youngest. All three brothers were satisfied with this arrangement, which left two camels remaining. One, of course, was Tahan’s that had been added at the beginning. Samir requested the other as his payment for arranging this solution – and since all three brothers were satisfied, they agreed. Samir grabbed the strongest, most beautiful member of the herd, and the pair rode off together into the sunset.
It’s a happy ending. Everyone is satisfied, especially our heroes. And you have to admire Samir’s Raven-level trickeration in getting something for nothing. But how did he solve the problem?
When faced with a word problem, often the best first step is to write down what you know and what you want to find out. Before Tahan and Samir arrive, here is the situation the brothers face:
What we know
Total camels: 35
Fraction to each brother: eldest: $latex \frac{1}{2}&s=3$, middle: $latex \frac{1}{3}&s=3$, youngest: $latex \frac{1}{9}&s=3$
What we want to find out
How many camels should each brother get?
In theory, this should be an easy problem: for the eldest brother, divide 35 by 2, and repeat for the others. Thus, the eldest brother should get 17 $latex \frac{1}{2}$ camels — not too pleasant for the camel! And besides, half a camel is not that useful anyway. Clearly a better solution is needed.
Tahan and Samir arrive, Samir offers Tahan’s camel for the herd, and the problem changes. Now we have:
What we know
Total camels: 36
Fraction to each brother: eldest: $latex \frac{1}{2}&s=3$, middle: $latex \frac{1}{3}&s=3$, youngest: $latex \frac{1}{9}&s=3$
What we want to find out
How many camels should each brother get?
Now we’re getting somewhere.
36 divided by 2 is 18, 36 divided by 3 is 12, and 36 divided by 9 is 4. Thus, the three brothers get eighteen, twelve, and four camels, so each of them receives whole camels instead of useless fractional camels.*
Adding up all three brothers’ camelshare gives 18 + 12 + 4 = 34 camels, with two remaining from the herd. One was Tahan’s, one is now Samir’s. Everything is A-OK.
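If you would rather let a computer do the camel-counting, here is a minimal sketch in Python. The function name `divide_herd` is my own invention for illustration; the only inputs taken from the story are the three fractions and the two herd sizes.

```python
from fractions import Fraction

# Shares named in the father's will: eldest, middle (Hamed), youngest (Harim).
SHARES = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 9)]

def divide_herd(total_camels):
    """Return each brother's exact share and the camels left over."""
    portions = [share * total_camels for share in SHARES]
    leftover = total_camels - sum(portions)
    return portions, leftover

# Compare the original herd of 35 with Samir's herd of 36.
for herd in (35, 36):
    portions, leftover = divide_herd(herd)
    print(herd, [str(p) for p in portions], "left over:", leftover)
```

Using exact fractions instead of decimals keeps the half-camels visible as fractions rather than hiding them in rounding.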
But where did that extra camel come from?
Read the father’s instructions again, carefully:
According to the express wishes of my father half of them belong to me, one-third to my brother Hamed, and one-ninth to Harim, the youngest.
At this stage, there are two ways to approach the problem. The slightly easier way is to convert the fractional shares. You can always multiply the top (numerator) of a fraction by any number, and the bottom (denominator) by the same number, and the fraction will be the same. One-half ($latex \frac{1}{2}$) is the same as two-fourths ($latex \frac{2}{4}$). So, let’s rewrite each fractional camelshare so that its denominator is the number of camels, which is now 36. Thus, the father’s instructions now read:
According to the express wishes of my father $latex \frac{18}{36}$ of them belong to me, $latex \frac{12}{36}$ to my brother Hamed, and $latex \frac{4}{36}$ to Harim, the youngest.
Or, if you prefer, you can convert the fractions to percentages (rounded to the nearest tenth of a percent):
According to the express wishes of my father 50% of them belong to me, 33.3% to my brother Hamed, and 11.1% to Harim, the youngest.
Either way, it quickly becomes clear: the father’s will was incomplete! The percentages don’t add up to 100%, so no matter how many camels were in the herd, some would be left over after the division.
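To see exactly how incomplete the will is, we can add the three shares directly. This is a small sketch using Python's exact fractions; the dictionary keys are just labels I chose (the story never names the eldest brother).

```python
from fractions import Fraction

# The three shares from the father's will.
shares = {
    "eldest": Fraction(1, 2),
    "Hamed": Fraction(1, 3),
    "Harim": Fraction(1, 9),
}

total_share = sum(shares.values())   # fraction of the herd the will gives away
unassigned = 1 - total_share         # fraction of the herd the will never mentions

print("assigned:", total_share)
print("assigned, as a percentage:", round(float(total_share) * 100, 1))
print("camels unassigned in a herd of 36:", unassigned * 36)
```

The unassigned fraction is the same for any herd size; a herd of 36 just happens to make it come out as a whole number of camels.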
The world’s most unexpected border: these two countries seem to be a world apart, but high in the Hindu Kush mountains is a curvy 50-mile border between Afghanistan and China.
The only sane place to cross is the illegal unmarked border at Wakhjir Pass (15,780 feet), used only by the occasional drug smuggler:
This is also the largest time zone jump in the world (excluding the International Date Line) – cross into China at noon, and suddenly it’s 3:30 PM.
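The clock jump is easy to check. In this sketch I hard-code the two UTC offsets (UTC+4:30 for Afghanistan, UTC+8:00 for China) as assumptions instead of reading them from a time zone database, and the date is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets, hard-coded as assumptions for this illustration.
AFGHANISTAN = timezone(timedelta(hours=4, minutes=30))  # UTC+4:30
CHINA = timezone(timedelta(hours=8))                    # UTC+8:00

# Noon on the Afghan side of the pass (the date itself does not matter).
noon_afghan = datetime(2018, 9, 1, 12, 0, tzinfo=AFGHANISTAN)

# The same instant, read off a clock on the Chinese side.
same_moment_china = noon_afghan.astimezone(CHINA)

print(same_moment_china.strftime("%H:%M"))
print("jump:", same_moment_china.utcoffset() - noon_afghan.utcoffset())
```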
Another news day, another insight into the role of social media in the 2016 election cycle. I’ve written before about the role that fake social media accounts originating in Russia played in providing support for Donald Trump and other candidates. Unsurprisingly — at least in retrospect — those social media efforts went beyond support for specific candidates.
In a new study published today (Broniatowski et al., 2018) and reported on by many reliable media outlets, researchers at George Washington University studied three years’ worth of tweets containing keywords related to vaccines.
The reason that vaccines are important is that there is an ongoing debate in the U.S. about whether vaccines cause autism (they don’t), whether the initial study saying they did was a deliberate fraud (it was), and whether parents should vaccinate their children (they should). I have a lot of sympathy for parents who are reluctant about vaccinating their children — injecting your children with known disease-causing agents is undeniably creepy. But it works, to protect them and to protect other children too.
I haven’t read the full paper yet, but the bottom line is that during the period from 2014 to 2017 (and of course continuing through to today), Russian agents used (and continue to use) the vaccine debate to shift how Americans talk about social and political issues. But there is a fascinating difference between the candidate-based study I wrote about before and this one:
When discussing vaccines, the Russian trolls took both sides in the debate.
They didn’t care.
This at last provides some insight into why the Russians have invested time and energy into running social media campaigns in the U.S. They are seeking to divide our country, because a divided country is a weak country.
Don’t let them do it. Find someone who disagrees with you and talk to them, right now.
Did I mention that I was in the stands for Felix Hernandez’s perfect game?
[youtube https://www.youtube.com/watch?v=43ICt9_W2Y4&w=560&h=315]
Oh, I did? Cool.
On Wednesday, I asked a question that my friend Jon had asked me soon after that 2012 game: How many people alive today have seen a perfect Major League Baseball game in person?
In Wednesday’s post, I outlined an approach to solving this seemingly-difficult problem. To repeat:
We have the total attendance for each game. (It varies from the 6,298 people who saw Catfish Hunter’s perfect game in 1968 to the 64,519 who saw Don Larsen’s in the 1956 World Series.)
Assume that the percentage of people at each age and sex at the game was the same as the percentage of people at each age and sex in the U.S. as a whole. Again, this is almost certainly not true, but I’m not sure how to do better.
All those people are older by the number of years that have passed since the game. Someone who saw David Wells’s 1998 perfect game at age 48 would be 68 today.
Assume that anyone whose age today turns out to be greater than 76 (for men) or 81 (for women) has gone to the big game in the sky.
Add up the number of people still alive who saw each game to get the total alive across all games. That is the answer to Jon’s question.
Additional step! This extra step was suggested by my other friend Ed, who asked “What about the weirdos that have seen more than one perfect game?” Are there cases where adding up the attendance would double-count some people, and if so, how do we account for that?
I closed Wednesday’s post asking how to get the data we would need for this approach. We can get attendance figures from the Wikipedia article on MLB perfect games, and I suggested that we should be able to find demographic data (age and sex) from the U.S. Census. Did anyone think of how to find that data? No one commented, but that doesn’t mean you weren’t thinking about it.
From the census data: number of men (blue) and women (orange) living in the U.S. in each decade from 1900 to 2010.
BONUS QUESTION: Notice how the number of men and women is about 50/50 until 1950, and after that, women exceed men. Why is that?
Of course, the U.S. census is only done every 10 years, and perfect games occurred in interim years like 1965 and 2004. So to get the age/sex populations in those interim years, I interpolated between the once-a-decade censuses by assuming a linear population growth, with slope (P2 – P1) / 10 for each of the categories. Note that this assumes that populations shifted within categories only every 10 years, which is not a good assumption but is good enough for this quick analysis. For games in 2011 and 2012, I carried the 2000-2010 growth rate forward another two years.
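Here is a minimal sketch of that interpolation in Python. The function name and the population counts are made up for illustration; only the slope formula (P2 – P1) / 10 and the idea of carrying the line forward past the last census come from my description above.

```python
def interpolate_population(p1, p2, census_year, target_year):
    """Linear estimate between two censuses taken ten years apart.

    p1 is the count at census_year and p2 is the count at census_year + 10.
    A target_year later than census_year + 10 simply extends the same line.
    """
    slope = (p2 - p1) / 10                      # people gained (or lost) per year
    return p1 + slope * (target_year - census_year)

# Made-up counts for a single age/sex bin, purely for illustration.
mid_decade = interpolate_population(1_000_000, 1_200_000, 1960, 1965)
carried_forward = interpolate_population(1_000_000, 1_200_000, 2000, 2012)

print(mid_decade)        # halfway between the two census counts
print(carried_forward)   # two years past the second census, same growth rate
```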
The census doesn’t publish population counts for every age; instead, it reports a single count for all ages within five-year intervals (bins), starting with “0-4” and ending with “85+”. Within each age bin, it reports the number of men and the number of women. It also gives the total number of men and women overall, and the U.S. bottom-line population total.
From these numbers, I calculated the percentage of people in each age/sex bin in years in which perfect games took place. I then multiplied this number by the attendance of each perfect game to find the number of people expected in each age/sex bin (as predicted by our simple model in which spectators at a baseball game are a representative sample of the U.S. population).
Then I calculated the number of years between that perfect game and today, and added that number to the ages in each of the bins. That shows us how old that game’s crowd would be today. I added up the counts in only the bins for people who are today 76 or younger (for men) or 81 or younger (for women).
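As a rough sketch, that step looks something like the Python below. The three-bin population is entirely hypothetical (real census tables have many more bins and much bigger numbers), and `expected_attendees` is a name I made up; the only real figure is the attendance at Don Larsen's game.

```python
def expected_attendees(bin_populations, attendance):
    """Split a crowd across age/sex bins in proportion to the population."""
    total = sum(bin_populations.values())
    return {
        bin_label: attendance * count / total
        for bin_label, count in bin_populations.items()
    }

# HYPOTHETICAL population counts, keyed by (sex, age bin). Not census data.
population = {("M", "0-4"): 300, ("F", "0-4"): 300, ("M", "5-9"): 400}

# Attendance at Don Larsen's perfect game.
crowd = expected_attendees(population, attendance=64_519)

for bin_label, people in crowd.items():
    print(bin_label, round(people))
```

Whatever the bins are, the expected counts always add back up to the attendance, because the shares add up to 100%.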
So, calculate the percentage of people in each age/sex bin in each census. Multiply that percentage by the total number of people at each game to figure out how many members of each age/sex bin there were at the game. Increase the age of the population to today, and remove any bins that would be greater than 76/81 today. For example, Don Larsen’s perfect game took place 62 years ago, so I added up population counts for men who were then younger than (76 – 62 =) 14. That includes everyone from the age bins 0-4, 5-9, and 10-14. From a similar analysis of women, I added up everyone from age bins 0-4, 5-9, 10-14, 15-19. (In some cases the years overlapped bins – for example, Kenny Rogers’s perfect game in 1994 returns women 57 or younger, which overlaps the 55-59 age bin. So I added 2/5 of the count from that age bin.)
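The bookkeeping for which bins survive the cutoff can be sketched in Python like this. The function names are mine, and the sketch only identifies which bins fall fully under the cutoff and which single bin straddles it; it deliberately does not decide what fraction of the straddling bin to count, and it ignores the open-ended 85+ bin, which never comes into play for these games.

```python
TODAY = 2018                      # the year this post was written
MAX_AGE = {"M": 76, "F": 81}      # cutoff ages used in this post

def cutoff_age_at_game(game_year, sex):
    """Oldest age at game time that is still at or under the cutoff today."""
    return MAX_AGE[sex] - (TODAY - game_year)

def split_bins(cutoff):
    """Return (bins entirely at or under the cutoff, bin straddling it or None)."""
    full, partial = [], None
    for low in range(0, 85, 5):           # five-year bins 0-4, 5-9, ..., 80-84
        high = low + 4
        if high <= cutoff:
            full.append(f"{low}-{high}")
        elif low <= cutoff:
            partial = f"{low}-{high}"
    return full, partial

# Don Larsen's game (1956), men and women.
print(cutoff_age_at_game(1956, "M"), split_bins(cutoff_age_at_game(1956, "M")))
print(cutoff_age_at_game(1956, "F"), split_bins(cutoff_age_at_game(1956, "F")))

# Kenny Rogers's game (1994), women: the cutoff lands inside a bin.
print(cutoff_age_at_game(1994, "F"), split_bins(cutoff_age_at_game(1994, "F")))
```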
Adding up all remaining age bins leaves the number of people who attended each perfect game who are probably still alive in 2018. For example, of the 64,519 people who attended Don Larsen’s perfect game (it was in the World Series!), an estimated 21,422 are still alive today. We’ve made so many assumptions that I wouldn’t trust that number to be anywhere near exact, so let’s report it as “about 21,000”.
Doing this for all perfect games and then adding up for the total gives 363,000. But then, there’s Ed’s question: were there some people who saw multiple perfect games, meaning that we’ve double-counted some people? Do we need to adjust our estimate down to make up for it?
If this were 2011, I’d say no. Before then, perfect games had happened in different cities in different years. But then in 2012, Philip Humber’s and Felix Hernandez’s perfect games took place at Safeco Field in Seattle. Thus, a Mariners 2012 season ticket holder would have seen metaphorical lightning strike twice – two perfect games in one stadium in one season.
So to account for this effect, we need to estimate the percentage of the crowd at a Major League Baseball game that are season ticket holders who attend every game (or at least attended two games, and were incredibly lucky in choosing the two games they attended). I have no idea how to estimate that percentage. But consider that we know for sure two people who attended Hernandez’s perfect game but not Humber’s – me and my lovely spouse. And I know several Baltimore Orioles season ticket holders, none of whom has attended every single game this season. So, let’s pick an estimate that I think is on the high side of reasonable: 10 percent.
That means that to account for the double-counting, we need to subtract about 10% of the crowd for Hernandez’s perfect game because we suspect they had already seen Humber’s perfect game four months before. Ten percent of 21,889 is (rounding off to remind ourselves this is just an estimate) about 2,000. So subtract out 2,000 people from our preliminary estimate of 363,000 to leave 361,000 people who have seen one or more perfect games.
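Since the 10 percent figure is a guess, it helps to write the adjustment so the guess is easy to change. This is a minimal sketch; the constant and function names are mine, and the numbers are the ones quoted above.

```python
HERNANDEZ_ATTENDANCE = 21_889
PRELIMINARY_TOTAL = 363_000        # sum of the rounded per-game estimates
REPEAT_SHARE = 0.10                # my "high side of reasonable" guess

def round_to_thousands(value):
    """Round to the nearest thousand, the precision used in this post."""
    return round(value / 1000) * 1000

# People at Hernandez's game assumed to have already seen Humber's.
double_counted = round_to_thousands(HERNANDEZ_ATTENDANCE * REPEAT_SHARE)

# Remove them so nobody is counted twice.
adjusted_total = PRELIMINARY_TOTAL - double_counted

print(double_counted, adjusted_total)
```

Changing `REPEAT_SHARE` and re-running is a quick way to see how little the final answer depends on this particular guess.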
| Perfect Game number | Pitcher | Date | People who saw the game | People who saw the game who are alive today |
| --- | --- | --- | --- | --- |
| 1 | Lee Richmond | 6/12/1880 | unknown | 0 |
| 2 | John Montgomery Ward | 6/17/1880 | unknown | 0 |
| 3 | Cy Young | 5/5/1904 | 10,267 | 0 |
| 4 | Addie Joss | 10/2/1908 | 10,598 | 0 |
| 5 | Charlie Robertson | 4/30/1922 | 25,000 | 0 |
| 6 | Don Larsen | 10/8/1956 | 64,519 | 21,000 |
| 7 | Jim Bunning | 6/21/1964 | 32,026 | 14,000 |
| 8 | Sandy Koufax | 9/9/1965 | 29,139 | 13,000 |
| 9 | Catfish Hunter | 5/8/1968 | 6,298 | 3,000 |
| 10 | Len Barker | 5/15/1981 | 7,290 | 4,000 |
| 11 | Mike Witt | 9/30/1984 | 8,375 | 5,000 |
| 12 | Tom Browning | 9/16/1988 | 16,591 | 11,000 |
| 13 | Dennis Martinez | 7/28/1991 | 45,560 | 33,000 |
| 14 | Kenny Rogers | 7/28/1994 | 46,581 | 34,000 |
| 15 | David Wells | 5/17/1998 | 49,820 | 38,000 |
| 16 | David Cone | 7/18/1999 | 41,930 | 33,000 |
| 17 | Randy Johnson | 5/18/2004 | 23,381 | 19,000 |
| 18 | Mark Buehrle | 7/23/2009 | 28,036 | 24,000 |
| 19 | Dallas Braden | 5/9/2010 | 12,288 | 11,000 |
| 20 | Roy Halladay | 5/29/2010 | 25,086 | 22,000 |
| 21 | Philip Humber | 4/21/2012 | 22,472 | 20,000 |
| 22 | Matt Cain | 6/13/2012 | 42,298 | 38,000 |
| 23 | Felix Hernandez | 8/15/2012 | 21,889 | 20,000 |
|  | (remove double-counts from Seattle 2012) |  | -2,000 | -2,000 |
|  | TOTAL |  | 567,444 | 361,000 |
Again, there are a lot of potential sources of systematic error in this analysis, so I don’t think we can be confident enough in our estimates to go down to the level of 1,000 people either way. So again, let’s round to 360,000.
Thus, our estimate shows that about 360,000 people alive today have seen a Major League Baseball perfect game. And I am one of them.
This has been a quick analysis of a small, self-contained question, but it showcases many features of the thought process that data scientists go through each day. I hope it’s been fun to follow along. The most important part is to always keep in the back of your mind: How might I be wrong?
With that in mind, then: how might this analysis be wrong? What assumptions did we make that we should not have made? What can we do to improve our estimates?
I’d love to hear your thoughts on this, no matter what your experience with math and science has been. The entire point of this blog is to bring the excitement of science to everyone.
Don’t make Eeyore sad: comment below with your thoughts!