Good statistics are like a telescope for an astronomer, a microscope for a bacteriologist, or an X-ray for a radiologist. If we are willing to let them, good statistics help us to see things about the world around us, and about ourselves – both large and small – that we would not be able to see in any other way.
The Data Detective: Ten Easy Rules to Make Sense of Statistics, by Tim Harford, is a remarkable book about using statistics to see and understand the world. Harford does not attempt to teach the reader statistics; instead, he steps back and takes a broader view of how statistics help us understand our world.
Harford, an economist and prolific communicator, is a gifted storyteller, effectively weaving tales together to teach us lessons that broaden our understanding. In the BBC Radio 4 program More or Less,1 he “explains - and sometimes debunks - the numbers and statistics used in political debate, the news and everyday life.” In his popular podcast, Cautionary Tales, Harford tells “true stories about mistakes and what we should learn from them.”2 And as a prolific author, he has written popular books such as The Undercover Economist, Adapt: Why Success Always Starts with Failure, and Fifty Things That Made the Modern Economy.
Harford divides the book into ten lessons, which fall into two broad categories: 1) understanding ourselves and 2) understanding how the statistics world operates.
The focus on understanding ourselves may seem odd at first because “detectives” are usually investigating “others.” Harford makes the case that self-awareness is critical to our understanding, because our own biases, experiences, and character traits influence how we see the world. Chapters highlight rules like “Search Your Feelings,” “Ponder Your Personal Experience,” “Avoid Premature Enumeration,” “Step Back and Enjoy the View,” and “Keep an Open Mind.”
The chapter “Ponder Your Personal Experience” showcases Harford’s thoughtful and measured approach.
Our personal experiences should not be dismissed along with our feelings, at least not without further thought. Sometimes the statistics give us a vastly better way to understand the world; sometimes they mislead us. We need to be wise enough to figure out when the statistics are in conflict with everyday experience – and in those cases, which to believe.
He cites the example of London tube ridership statistics: his own experience is of tightly packed trains that he struggles to board, yet the statistics say the average train carries far fewer passengers. Who is right? He answers this question by exploring the origin of the statistics and understanding what is measured. He shows that it is possible that most trains are not crowded, but that most people travel on crowded trains.
Harford doesn’t stop there; he changes his thinking and perspective to ask whether the average number of people on a train is a good measure of overcrowding at all. He wonders if it might be better to think about this problem in terms of the average passenger experience rather than the average train car. Viewed from the perspective of the average passenger, trains are crowded. This leads him to conclude that “there’s no single objective measure of how busy the public transport network is.”
Any measure reflects a specific perspective, whether it is the experience of someone riding the train or an individual managing the network. By looking deeper into the data and measures, it is possible that our personal experience and statistics tell us something different, yet both are true.
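The gap between the two perspectives is easy to demonstrate with a small calculation. The numbers below are invented for illustration (they are not Harford’s data or Transport for London figures), but they show how the train-weighted average and the passenger-weighted average of the very same network can disagree sharply:

```python
# Hypothetical network: nine nearly empty trains and one packed one.
trains = [10] * 9 + [500]  # passengers on each train

# The network manager's view: average passengers per train.
avg_per_train = sum(trains) / len(trains)

# The rider's view: average crowding experienced by a passenger,
# i.e. each train weighted by how many people are on it.
avg_experienced = sum(n * n for n in trains) / sum(trains)

print(f"Average train carries {avg_per_train:.0f} passengers")
print(f"Average passenger shares a train with {avg_experienced:.0f} others")
```

Here the average train carries 59 passengers, but the average passenger is on a train of roughly 425: most trains are quiet, yet most riders are on the crowded one. Both statistics are correct; they simply answer different questions.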
Harford’s chapters and rules relating to statistics in the world around us include: “Get the Backstory,” “Ask Who Is Missing,” “Demand Transparency When the Computer Says No,” “Don’t Take Statistical Bedrock for Granted,” and “Remember that Misinformation Can Be Beautiful, Too.”
In “Get the Backstory,” Harford delves into the world of research, revealing several important insights. He believes that you need to understand the complete situation, including studies with positive, inconclusive, or negative results.
Interesting findings are published; non-findings, or failures to replicate previous findings, face a high publication hurdle.
Confirming that research results are in fact true requires that the research be replicated, and the systematic replication of existing research is a relatively recent phenomenon. Harford cites John Ioannidis’s shocking 2005 paper, “Why Most Published Research Findings Are False.” More recent research, including that by the University of Virginia’s Brian Nosek, showed that over half of a set of high-profile psychological studies could not be replicated.
There are several possible explanations for the low replication rate, including sample bias, poor research design, hacking the analysis until something interesting turns up, and publishers’ interest in printing only novel results.
When the stakes are high, irreproducible research is a critical problem. Harford does not despair but cites work within the medical community, discussing the recent trend toward pre-registering clinical trials. Researchers openly declare their research plans, including the proposed analysis of the data, before any results are published, and journals can refuse to publish research that is not pre-registered. This helps guard against “fiddling” the results, but it is not a perfect solution and is not uniformly enforced.