I don’t read Gregg Easterbrook, for roughly the same reason I don’t read anything else in the sports pages. When I want to get the experience of bulky men straining themselves trying to exceed their innate abilities, I watch C‑SPAN.
I was reminded of why I don’t read Easterbrook by a comment that Brad DeLong quotes from a post by Matt Yglesias, in which Matt quotes Easterbrook saying that the Lancet’s study of excess mortality in Iraq is “silly” and that it:
absurdly estimates that since March 2003 exactly 654,965 Iraqis have died as a consequence of American action.…It’s gibberish.
Does the Lancet actually assert that “exactly 654,965 Iraqis have died as a consequence of American action”? Of course not. Is it gibberish? Judge for yourself. As SciBling Tim Lambert points out, the authors wrote:
We estimate that as of July, 2006, there have been 654,965 (392,979–942,636) excess Iraqi deaths as a consequence of the war, which corresponds to 2.5% of the population in the study area.
Understanding why that is different from what Easterbrook wrote turns out to be important in much of our lives. How we understand uncertainty in measured numbers matters, and matters more and more as time goes by. Most importantly, when you understand the basics of statistics, you can spot lies with statistics pretty easily.
I’ve TAed stats classes and worked with other students to try to help them understand how stats works. The key problem in understanding statistics is not the math, it’s the mindset.
In statistics, you are always dealing with two numbers. The number people tend to focus on is what we call a measure of location. A batting average, the estimate of 654,965 excess deaths above, the 40% of poll respondents saying they’ll vote for Jim Ryun – these are all measures of location; they place you somewhere in numerical space.
The example I’ll work with here is the geographical center of the United States, a point a little west of where I am now. That is the point on which a map of the country could be balanced. In some sense, it describes where the United States is, and you could use its latitude and longitude to compare the US with other countries.
No one, of course, would claim that the United States only exists at its geographical centroid, the average position of the nation. That’s where the second number comes in: it is a measure of variability (statisticians usually call it a measure of spread or of dispersion). In fact, it’s hard to think of an example of a measure of variability that people bandy about the way they do with averages.
With our map example, we might choose to indicate the size of the nation by talking about the maximum (or minimum) distance from the edge of the country to the center described above. We might compare a baseball player’s batting averages in several games to assess whether he is streaky or consistent in his batting. Margins of error indicate the error associated with a poll’s estimate of public opinion.
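To make the two numbers concrete, here is a minimal sketch in Python. The two players and their per-game batting averages are invented for illustration, and the sample standard deviation is just one of several possible measures of spread.

```python
from statistics import mean, stdev

# Hypothetical per-game batting averages for two invented players.
streaky = [0.000, 0.750, 0.000, 0.500, 0.250]
steady = [0.250, 0.333, 0.250, 0.333, 0.334]


def summarize(values):
    """Return (location, spread): the mean and the sample standard deviation."""
    return mean(values), stdev(values)


for name, games in (("streaky", streaky), ("steady", steady)):
    location, spread = summarize(games)
    print(f"{name}: location={location:.3f} spread={spread:.3f}")
```

Both players have essentially the same measure of location, but the streaky hitter’s spread is far larger. Reporting only the average would make them look identical.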
What Easterbrook did wrong was look only at the first number, latch onto false precision in the estimate of excess mortality, and ignore the measure of dispersion.
The range the authors quoted is what’s called a 95% confidence interval. What it means is that if the survey were repeated many times and an interval were computed the same way each time, about 95% of those intervals would contain the true excess mortality. More important than the measure of location (the 655,000 figure) is that the interval does not include 0. What that means is that if there were actually no additional mortality since the invasion and occupation, it would be very unlikely to get the data the researchers actually observed.
That’s a conclusion that follows from statistical hypothesis testing. We can reject the hypothesis of zero excess mortality. We cannot reject a hypothesis that an excess 940,000 people were killed, nor that an excess 400,000 have died. We can be confident, based on our understanding of the variability of the data, that the number of excess deaths since the invasion is somewhere in between, and the best single estimate is right around 655,000. The variability matters here.
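If the idea of a 95% confidence interval still feels slippery, a small simulation can help. The sketch below is not the Lancet authors’ method (they used a cluster survey with its own analysis). It draws made-up samples from a normal distribution whose true mean we choose ourselves, builds an approximate 95% interval from each sample, and counts how often the interval contains that true mean. The function names and all parameter values are assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev


def interval_95(sample):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    centre = mean(sample)
    half_width = 1.96 * stdev(sample) / len(sample) ** 0.5
    return centre - half_width, centre + half_width


def coverage(true_mean=10.0, sd=3.0, n=50, trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        low, high = interval_95(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / trials


print(f"share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should land close to 0.95. Any single interval either contains the true value or it doesn’t; the 95% describes how often the procedure gets it right, which is why a value far outside the interval, like zero excess deaths, is so hard to square with the data.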