When I received my undergraduate degree in chemistry, I never expected to use statistics to the degree I do today. My first career job was in a quality lab for a pharmaceutical manufacturer. I thought the full extent of my mathematics usage would consist of finding averages.
Every day I would go into the lab and add just enough of a chemical to a solution of a medication being manufactured to turn it bright pink. Then, seemingly to make sure I hadn’t accidentally flubbed up the first time, I’d repeat the process two more times. The three readings would then be averaged to determine exactly how strong the medication was. Perfect! Except, I didn’t realize how much background work had been done to make sure the vial of medication I tested was representative of the manufacturing lot from which it was pulled. The vial was the sample, and the manufacturing lot was the population. Behold, statistics at work!
In the 1990s and even early 2000s, many people like me didn’t start off careers thinking they would be statisticians, data scientists, or anything of the sort. With the digital revolution in full swing and predictive analytics at our doorsteps, it’s more critical than ever to understand how we define samples and populations. The way these items are defined changes everything about the conclusions that can be reliably drawn from them. And these conclusions can change how we understand and interact with the world.
A population is the collective group of individual items being studied or of interest. To understand a population (and the samples and conclusions that can be drawn from it), one needs to define its scope clearly. The next step is to ask questions – and lots of them. My favorite questions include:
- Is the question clearly defined?
- Am I answering the correct question?
- What are the assumptions?
- Could some factor skew the data set or its collection?
- Are desired outcomes influencing how I analyze the data?
Let’s use a classic example. In World War II, the Allies and the Axis powers both relied on airplanes to provide firepower from the skies. It took time to develop the skills of the fighter pilots and bombers. Resources were constrained because planes could only be built so quickly. So how, with limited resources, could the Air Force protect the planes and flight crews they already had? Armor the planes. However, there was a concern. Too much weight would decrease agility and increase fuel use. Too little armor would mean the planes (and crew) would face greater risks.
To address this issue, the Air Force did what we still do today: flight crews collected data. After a mission, planes would return riddled with bullet holes, so crews recorded the number and location of hits. The data collected showed bullet holes were least likely to be found in the engines (about 1.1 hits per square foot) and most likely to be found on general areas of the plane or the fuselage where personnel were located (about 1.7-1.8 hits per square foot).
The pilots and crews must have felt some sense of relief seeing data that showed the fuselage had one of the higher hit rates. After all, they were trying to prove where to concentrate the armor.
As a result of this data, the Air Force assumed that the population of interest was the planes that made it back from their missions, likely because that group was immediately available and visible. They believed a sample of the returning planes could be used to capture the needed information about the location and number of hits.
Abraham Wald, a statistician working for the U.S., famously questioned not the data itself, but the assumptions around the data and the population it represented. Were bullets actually more likely to hit one part of a plane than another? Why was the rate of damage to the area around the engine so much lower?
His answer was to suggest the hit rate was likely consistent for all areas of a plane. The “missing” hits on the engines weren’t missing at all – the bullet holes could be found in the planes that never returned. Did that make it a different population? In reality, the population was all of the planes that left on a given mission. The data was only collected from a distinct portion of the population: the planes that were able to return.
This famous example of population and data skew is known as survivorship bias. Data was collected only from the “surviving” members of the population, which excluded the planes that did not return to base.
Can there be a population of only those planes that returned from missions? Absolutely. But that wasn’t the question at hand. The question was how to protect all the planes that left on a mission, not just the ones that came back. The data collected had inadvertently been skewed because of the inability to directly collect information about the planes that did not return.
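To make Wald's reasoning concrete, here is a minimal simulation sketch. It is not Wald's actual method or data. It assumes four hypothetical plane sections of equal area, six hits per plane, and made-up per-hit chances of downing a plane, with engine hits assumed to be the most dangerous. Every plane is hit uniformly across sections, yet the planes that return show fewer engine hits than the full population does.

```python
import random


def simulate_missions(n_planes=10000, hits_per_plane=6, seed=0):
    """Compare average hits per section for all planes vs. returning planes."""
    rng = random.Random(seed)
    # Hypothetical sections of equal area, so each is equally likely to be hit.
    sections = ["engine", "fuselage", "wings", "tail"]
    # Assumed (made-up) chance that a single hit in a section downs the plane.
    lethality = {"engine": 0.30, "fuselage": 0.05, "wings": 0.05, "tail": 0.05}

    all_hits = {s: 0 for s in sections}
    returned_hits = {s: 0 for s in sections}
    survivors = 0

    for _ in range(n_planes):
        # Each hit lands on a section chosen uniformly at random.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        # The plane is lost if any single hit proves fatal.
        downed = any(rng.random() < lethality[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
        if not downed:
            survivors += 1
            for s in hits:
                returned_hits[s] += 1

    return {
        "planes": n_planes,
        "survivors": survivors,
        # Population: every plane that left on a mission.
        "all": {s: all_hits[s] / n_planes for s in sections},
        # Sample actually observed: only the planes that came back.
        "returned": {s: returned_hits[s] / survivors for s in sections},
    }


if __name__ == "__main__":
    result = simulate_missions()
    print("Planes sent:", result["planes"], "returned:", result["survivors"])
    for section in result["all"]:
        print(
            f"{section:9s} all planes: {result['all'][section]:.2f}  "
            f"returned only: {result['returned'][section]:.2f}"
        )
```

Across all planes, each section averages close to 1.5 hits, since six hits are spread evenly over four sections. Among the returning planes, the engine average drops below the others, because planes hit in the engine are less likely to be in the sample. The gap comes entirely from which planes were available to count, not from where the bullets landed.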
The crew went on to share their data and confirm how it should be interpreted. Collaboration with the Statistical Research Group (SRG) allowed Wald to draw the right conclusions, which ultimately helped the crews.
Wald’s recommendations for the placement of aircraft armor received an almost immediate response. Some of the responsiveness can be attributed to the hierarchical military command structure. However, the crews themselves came to embrace his conclusions by seeing the broader perspective. They saw the safety potential of protecting the aircraft, which might then survive damage to land in friendly territory and preserve the lives of crewmates.
In How Not to Be Wrong: The Power of Mathematical Thinking, Jordan Ellenberg observed that the traditional winners of wars were “usually the guys who get 5% fewer planes shot down, or use 5% less fuel, or get 5% more nutrition into their infantry at 95% of the cost. That’s not the stuff war movies are made of, but it’s the stuff wars are made of.” Ellenberg concluded that the ability to question every assumption was the differentiator that made Abraham Wald great.1
Eighty years later, similar phenomena can still occur with data collection, populations, and conclusions. Through a quick internet search, you can find myriad examples of conclusions being drawn for a whole population, despite the fact that the sample isn’t always reflective of that population.
Whether you are a consumer or producer of data, take time to understand the population that a conclusion is based on. Relentlessly question how data is collected, what it represents, whether it represents the whole population or just a portion of it, and if it represents what you really want to know.
When I was in college, I was “adopted” by a local grandmother, Mrs. Irene Carey Penman. She would invite me over to do laundry and enjoy a home-cooked meal. Eventually, Mrs. Penman asked me how common my last name “Fory” was. She had noticed that a Captain George P. Fory had also flown in WWII in the same bombing group as her husband, Captain Richard A. Carey. I knew Captain Fory as my great-uncle Philip. While Captain Carey had been shot down in July 1943 and spent time as a POW before returning home, Captain Fory was lucky enough to return to the airbase from his missions in 1944. Did the September 1943 publication of Abraham Wald’s conclusions save my great-uncle from time in a POW camp? I will never know, but the lingering question still drives me to fiercely and incessantly question my own assumptions.
More information about Captains Carey and Fory and their missions is available here.
1 Ellenberg, J. How Not to Be Wrong: The Power of Mathematical Thinking. New York: Penguin Books, 2015.
Jo Ellen Fory Scott is chair of the subcommittee on statistical quality control (E11.30). With an extensive background in pharmaceutical manufacturing and the energy sector, Scott focuses on quality and safety management systems as a Technical Lead at ENTRUST Solutions Group.