The Meaning of Statistical Confidence
Q. How confident can I be that (fill in the blank)?
A. With enough research, a good design, and a controlled process, I can be pretty confident. But this is not a statistics question. The term statistical confidence typically appears with the results of a statistical analysis aimed at inferring something about a population or process based on a single sample of data (commonly referred to as statistical inference), where the sample is assumed to be a random representation of a population or stable process. Statistical intervals and hypothesis tests are common and familiar methods used for statistical inference, each carrying a statement of statistical confidence with its results. However, the meaning of statistical confidence is often misunderstood by practitioners.
The Merriam-Webster online dictionary defines confidence as “faith or belief that one will act in a right, proper, or effective way.” When statistical methods are used to draw conclusions based on sample data, our confidence lies in the statistical methods acting “in a right, proper, or effective way” such that the conclusions drawn are correct. Statistical confidence is typically represented as a percent with notation 100·(1−α)%, where 1−α represents a long-run probability statement about the statistical method itself. That is to say, confidence refers to how likely the method is to draw the correct conclusion about a population or process if it were repeated many times under identical conditions, using different sample data sets. Common values for α are 0.01, 0.05, and 0.10, resulting in levels of confidence equal to 99%, 95%, and 90%, respectively. For example, 95% confidence means that the probability that the statistical method has correctly captured the truth is 95%. The assumed risk of drawing the wrong conclusion is represented by α: given a 95% level of confidence, there is a 5% risk that the statistical method did not capture the truth and the wrong conclusion is drawn.
In the last installment of Data Points, we showed how statistical intervals can be used to account for uncertainty in a point estimate based on a single sample of data and with a given level of statistical confidence.1 To illustrate the meaning of statistical confidence, consider the statistical interval example of estimating the true proportion of orange M&Ms in an individual bag. When x observations of interest occur in a sample of size n, the population proportion can be estimated as p̂ = x/n.2 Given a single individual bag of M&Ms, the proportion of orange M&Ms can be estimated by dividing the number of orange M&Ms by the total number in the bag. But what if data were collected from another bag (i.e., another sample)? Would the number of orange M&Ms be the same as in the first bag? Would the total number in the bag be the same? Though not typically done in practice, if we were to sample multiple times, under the same conditions, the resulting sample proportions would be different, since the data collected from each sample (i.e., each bag) would be different. This illustrates variation due to sampling. In addition, there may be other sources of uncertainty at play, such as inherent variation in the M&M manufacturing process. To account for this variation, a confidence interval for a population proportion, p, can be used and is calculated as:
p̂ ± z_(α/2)·√(p̂(1−p̂)/n), where z_(α/2) is the upper 100·(1−α/2) percentage point from the standard normal distribution, such that P(Z > z_(α/2)) = α/2, and assuming n·p̂ and n·(1−p̂) are both at least 5.3
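For readers who want to try the calculation themselves, here is a minimal Python sketch of the formula above. The function name proportion_ci is ours, and nothing in it comes from Table 1; it simply implements the normal-approximation interval as written.

```python
import math

def proportion_ci(x, n, z=1.96):
    """Confidence interval for a population proportion p.

    x -- number of observations of interest (e.g., orange M&Ms)
    n -- sample size (e.g., total M&Ms in the bag)
    z -- upper 100*(1 - alpha/2) percentage point of the standard
         normal distribution (1.96 for 95% confidence, alpha = 0.05)

    The normal approximation behind this formula is reasonable only
    when n*p_hat and n*(1 - p_hat) are both at least 5.
    """
    p_hat = x / n                                   # point estimate of p
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width
```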
As an experiment, actual data on individual bags of M&Ms were independently collected by 21 experimenters. Each experimenter was given a single bag of M&Ms and told to estimate the proportion of orange M&Ms based on what was in their bag (i.e., their independent, random sample from the population of M&Ms). Table 1 shows the data collected by each experimenter along with the calculated proportion of orange M&Ms and 95% confidence interval (α = 0.05 and z_(α/2) = 1.96 in this case).
Consider the experimenter who had bag 1. Based on the results, the experimenter estimates that the true proportion of orange M&Ms in a bag is between 3.9% and 21.5% with 95% confidence. In the context of statistical intervals, 95% confidence refers to the probability that the interval has correctly captured the truth, meaning there is a 5% risk that the statistical interval did not capture the truth and the wrong conclusion is drawn.
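As a check on the arithmetic, the sketch above reproduces an interval like bag 1's. The counts below are hypothetical (Table 1 is not reproduced here); they were chosen only because they yield the reported 3.9% to 21.5% interval.

```python
# Hypothetical counts, not the actual bag 1 data from Table 1:
# 7 orange M&Ms out of 55 total reproduces the reported interval.
low, high = proportion_ci(7, 55)
print(f"{low:.1%} to {high:.1%}")   # prints: 3.9% to 21.5%
```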
Figure 1 plots the confidence interval calculated by each experimenter along with the true proportion of orange M&Ms, which was quoted as 20% at the time the data in Table 1 were collected.
Note that the statistical interval calculated by the experimenter who had bag 1 includes 20%. The statistical interval method used in this case correctly captured the truth, and a correct conclusion is drawn – that is, the true proportion of orange is in fact between 3.9% and 21.5%. For the experimenter who had bag 6, the statistical interval method did not capture the truth, and the conclusion drawn is an underestimate of the true proportion. But the chance of drawing the wrong conclusion is approximately 5%, as illustrated by the fact that 1 out of the 21 statistical intervals, or approximately 4.76%, failed to capture the true proportion of orange. This illustrates that in a single experiment, you can be 95% confident that the correct conclusion about the population or process will be drawn based on the statistical inference made from a single sample of data.
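The long-run meaning of 95% confidence can also be demonstrated by simulation. The sketch below, which reuses the proportion_ci function above, repeatedly draws bags from a process with a known true proportion and counts how often the interval captures it. The bag size of 55 and the number of trials are arbitrary choices of ours; note that the simple normal-approximation interval used here is known to cover somewhat less often than its nominal 95% at small sample sizes.

```python
import random

def simulate_coverage(p_true=0.20, n=55, trials=10_000, seed=1):
    """Fraction of 95% intervals that capture the true proportion."""
    rng = random.Random(seed)
    captured = 0
    for _ in range(trials):
        # Draw one "bag": n M&Ms, each orange with probability p_true.
        x = sum(rng.random() < p_true for _ in range(n))
        low, high = proportion_ci(x, n)
        captured += low <= p_true <= high
    return captured / trials

# Prints roughly 0.92 to 0.93: close to, but a bit below, the nominal
# 0.95, because the normal approximation is rough at this bag size.
print(simulate_coverage())
```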
In summary, statistical confidence is a probability statement about the long-run effectiveness of the statistical method in drawing the correct conclusion. So how confident can we be that we get 20% orange M&Ms in our bag? Ask the Mars Wrigley Confectionery division of Mars, Incorporated – not a statistician. ■
References
1. Luko, S. and Brown, J. “Revisiting Statistical Intervals.” Standardization News (Jan./Feb. 2024): 48-50.
2. Luko, S. and Neubauer, D.V. “Statistical Intervals Part 1: The Confidence Interval.” Standardization News (July/Aug. 2011): 18-20.
3. Ibid.
Author Information
Stephen Luko is a retired statistician with 40 years of industrial experience, holding the title of Fellow at Collins Aerospace, ASTM International, and the American Society for Quality. He is the current chair of the subcommittee on reliability (E11.40) and past chair of the committee on quality and statistics (E11).
Jennifer Brown is a statistician with 16 years of experience in the aerospace industry. She is chair of the subcommittee on terminology (E11.70) and a member of the subcommittee on specialized NDT methods (E07.10).
John Carson, Ph.D., is senior statistician for Neptune and Co. and coordinator of Data Points. He is a member of the committees on quality and statistics (E11), petroleum products, liquid fuels, and lubricants (D02), air quality (D22), and more.