Understanding Populations

Standardization News

It's more critical than ever to understand how we define samples and populations.

By Jo Ellen Fory Scott

When I received my undergraduate degree in chemistry, I never expected to use statistics to the degree I do today. My first career job was in a quality lab for a pharmaceutical manufacturer. I thought the full extent of my mathematics usage would consist of finding averages.

Every day I would go into the lab and add just enough of a chemical to a solution of a medication being manufactured to turn it bright pink. Then, seemingly to make sure I hadn’t accidentally flubbed up the first time, I’d repeat the process two more times. The three readings would then be averaged to determine exactly how strong the medication was. Perfect! Except, I didn’t realize how much background work had been done to make sure the vial of medication I tested was representative of the manufacturing lot from which it was pulled. The vial was the sample, and the manufacturing lot was the population. Behold, statistics at work!

In the 1990s and even early 2000s, many people like me didn’t start off careers thinking they would be statisticians, data scientists, or anything of the sort. With the digital revolution in full swing and predictive analytics at our doorsteps, it’s more critical than ever to understand how we define samples and populations. The way these items are defined changes everything about the conclusions that can be reliably drawn from them. And these conclusions can change how we understand and interact with the world.

A population is the collective group of individual items being studied or of interest. To understand a population (and the samples and conclusions that can be drawn from it), one needs to define its scope clearly. The next step is to ask questions – and lots of them. My favorite questions include:

Is the question clearly defined?
Am I answering the correct question?
What are the assumptions?
Could some factor skew the data set or its collection?
Are desired outcomes influencing how I analyze the data?

Let’s use a classic example. In World War II, the Allies and the Axis powers both relied on airplanes to provide firepower from the skies. It took time to develop the skills of the fighter pilots and bombers. Resources were constrained because planes could only be built so quickly. So how, with limited resources, could the Air Force protect the planes and flight crews they already had? Armor the planes. However, there was a concern. Too much weight would decrease agility and increase fuel use. Too little armor would mean the planes (and crew) would face greater risks.

To address this issue, the Air Force did what we still do today: flight crews collected data. After a mission, planes would return riddled with bullet holes, so crews recorded the number and location of hits. The data collected showed bullet holes were least likely to be found in the engines (about 1.1 hits per square foot) and most likely to be found on general areas of the plane or the fuselage where personnel were located (about 1.7-1.8 hits per square foot).

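Can there be a population of only those planes that returned from missions? Absolutely. But that wasn’t the question at hand. The question was how to protect all the planes that left on a mission, not just the ones that came back. The data had been inadvertently skewed because no one could directly collect information about the planes that did not return. Read with that gap in mind, the scarcity of engine hits among the returning planes pointed the other way: planes hit in the engines were the ones least likely to make it home, so that was where armor was needed most.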
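To see how sampling only the returning planes can distort the picture, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration, not historical data: the section names, the per-hit loss probabilities in LOSS_PROB_PER_HIT, the idea that all sections have equal area and are hit equally often, and the helper names fly_mission, mean_hits, and simulate.

```python
import random

# Hypothetical sections of equal area (an assumption, not historical data).
SECTIONS = ["engine", "fuselage", "fuel system", "rest of plane"]

# Assumed probability that a single hit in a section brings the plane down.
LOSS_PROB_PER_HIT = {
    "engine": 0.40,
    "fuselage": 0.05,
    "fuel system": 0.15,
    "rest of plane": 0.03,
}


def fly_mission(rng, max_hits=10):
    """Simulate one plane; return (hits per section, whether it returned)."""
    hits = {section: 0 for section in SECTIONS}
    survived = True
    for _ in range(rng.randint(0, max_hits)):
        section = rng.choice(SECTIONS)  # every section is equally likely to be hit
        hits[section] += 1
        if rng.random() < LOSS_PROB_PER_HIT[section]:
            survived = False
    return hits, survived


def mean_hits(planes):
    """Average number of hits per section over a list of planes."""
    return {s: sum(p[s] for p in planes) / len(planes) for s in SECTIONS}


def simulate(n_planes=20000, seed=0):
    """Return mean hits per section for all planes and for returned planes only."""
    rng = random.Random(seed)
    missions = [fly_mission(rng) for _ in range(n_planes)]
    all_planes = [hits for hits, _ in missions]                      # the population
    returned = [hits for hits, survived in missions if survived]     # the observable sample
    return mean_hits(all_planes), mean_hits(returned)


if __name__ == "__main__":
    population, sample = simulate()
    for s in SECTIONS:
        print(f"{s:14s} all planes: {population[s]:.2f}   returned only: {sample[s]:.2f}")
```

Under these assumptions, every section is hit about equally often across the whole population, yet the returned planes show noticeably fewer engine hits than hits elsewhere. The gap comes entirely from which planes made it into the sample, not from where the bullets actually landed.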

The crew went on to share their data and confirm how it should be interpreted. Collaboration with the Statistical Research Group (SRG) allowed statistician Abraham Wald to draw the right conclusions, which ultimately helped the crews. Wald’s recommendations for the placement of aircraft armor received an almost immediate response. Some of the responsiveness can be attributed to the military’s hierarchical command structure. However, the crews themselves came to embrace his conclusions by seeing the broader perspective. They saw the safety potential of protecting the aircraft, which might then survive damage, land in friendly territory, and preserve the lives of crewmates.

In How Not to Be Wrong: The Power of Mathematical Thinking, Jordan Ellenberg observed that the traditional winners of wars were “usually the guys who get 5% fewer planes shot down, or use 5% less fuel, or get 5% more nutrition into their infantry at 95% of the cost. That’s not the stuff war movies are made of, but it’s the stuff wars are made of.” Ellenberg concluded that the ability to question every assumption was the differentiator that made Abraham Wald great.¹

Eighty years later, similar phenomena can still occur with data collection, populations, and conclusions. Through a quick internet search, you can find myriad examples of conclusions being drawn for a whole population, despite the fact that the sample isn’t always reflective of that population.

References

¹Ellenberg, J. How Not to Be Wrong: The Power of Mathematical Thinking. New York: Penguin Books, 2015.

Jo Ellen Fory Scott is chair of the subcommittee on statistical quality control (E11.30). With an extensive background in pharmaceutical manufacturing and the energy sector, Scott focuses on quality and safety management systems as a Technical Lead at ENTRUST Solutions Group.

Industry Sectors: Quality

Issue Month: November/December

Issue Year: 2022

Committees: E11
