A Primer on the Poisson Distribution and Applications in Quality Control and Risk Analysis

In the latest Data Points column, Stephen N. Luko discusses one of the most important distributions in all of statistics.

The Poisson distribution is one of the most important distributions in all of statistics. It is a discrete distribution that counts certain types of events in time or over some other observational region such as area, volume, or distance. The distribution was named for the French mathematician and physicist Siméon Denis Poisson (1781-1840), who described it in an early work concerning research on the probability of criminal and civil verdicts.1 To derive the distribution, Poisson used a method based on the limiting form of a binomial distribution when p approaches 0 while the number of trials, n, grows so that the mean np stays fixed. By and large, this is the method that most textbooks use today.2 Other researchers, following Poisson, have shown that it applies to diverse phenomena. Some of this history has recently been summarized by Hanley and Bhatnagar.3
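The limiting argument can be checked numerically. The sketch below is only an illustration, not Poisson's own derivation: it holds a hypothetical mean of 2 fixed, sets p equal to the mean divided by n, and reports the largest gap between the binomial and Poisson probabilities as n grows. The function names and the chosen mean are assumptions made for this example.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, mu):
    """P(X = x) for a Poisson distribution with mean mu."""
    return exp(-mu) * mu**x / factorial(x)


def max_gap(n, mu=2.0, x_max=10):
    """Largest absolute difference between the two PMFs over x = 0..x_max."""
    p = mu / n  # p shrinks as n grows, so that n * p stays equal to mu
    return max(
        abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu))
        for x in range(x_max + 1)
    )


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(f"n = {n:>5}: largest gap = {max_gap(n):.6f}")
```

The gap shrinks steadily as n increases, which is the behavior the limiting form predicts.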

Characterization

The Poisson distribution counts random events on a fixed observational interval such as time, area, volume, or length. Any number of events can occur within the interval (0≤X<∞). This case is distinct from a binomial distribution, where there is a fixed number, n, of trials, and we count both successes and failures.4 The Poisson only counts event occurrences. For example, we can count missing parts in a complex assembly unit in a manufacturing process, the occurrence of pits in a metal surface area of one square yard of a material, or accidents in a busy intersection during rush hour. In each case, if an event does not occur, it cannot be counted.

The most important point about the observational interval is that it must be homogeneous. This means, first, that event occurrences in any two non-overlapping sub-regions of the interval are independent. Second, the probability of any number of events in any two non-overlapping sub-intervals of the same size within the observation region should be identical. Thus, if the interval we observe is the time between 7:00 and 8:00 a.m., the probability of an event between 7:00 and 7:05 is identical to the probability of an event between 7:45 and 7:50. This assumption should be thought through carefully in any application, as non-homogeneity will alter the probabilistic behavior. The one item that completely determines the probabilistic behavior on the observational interval is the mean or expected number of events within the interval. The homogeneity description further implies that the mean is proportional to the length or size of the interval as long as homogeneous conditions continue to prevail. Note that the mean need not be an integer; for example, the mean might be 3.25 or 0.367 events.

For a fixed observational interval, denote the mean as µ>0. The probability mass function for the Poisson distribution, which governs the number of events, X, within the interval, is given as:

$$P(X = x) = \frac{e^{-\mu}\,\mu^{x}}{x!}, \qquad x = 0, 1, 2, \ldots \tag{1}$$

An alternate characterization is to use a rate constant, λ, and an auxiliary parameter t. The mean on the interval of size t becomes µ=λt. The rate parameter λ carries the units of events per unit time (or area, volume, length, etc.).

The units of λ and t must be consistent: if λ is expressed in events per square inch, then t must be expressed in square inches. For example, if λ were equal to 0.036 events per square inch, then in a homogeneous region of t=144 square inches, the mean would be calculated as µ=λt=0.036(144)=5.184 events. Suppose we wanted instead to study a region of 40 square inches. The mean would change proportionately to (40/144)(5.184)=1.44 events on the new 40 square inch interval. The form of the Poisson distribution that uses λ and t is often found in risk analysis applications.
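The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration using the standard form of the Poisson probability mass function, exp(-µ)µ^x/x!, with µ=λt; the rate and region sizes come from the example in the text, while the function and variable names are assumptions made for this sketch.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P(X = x) for a Poisson distribution with mean mu."""
    return exp(-mu) * mu**x / factorial(x)


def poisson_cdf(x, mu):
    """P(X <= x), summed directly from the probability mass function."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))


rate = 0.036         # events per square inch (value used in the text)
mu_144 = rate * 144  # mean on a 144 square inch region
mu_40 = rate * 40    # mean on a 40 square inch region

# Probabilities on the smaller region
p_zero_40 = poisson_pmf(0, mu_40)       # no events at all
p_at_most_2_40 = poisson_cdf(2, mu_40)  # two or fewer events

if __name__ == "__main__":
    print(f"mean on 144 sq in: {mu_144:.3f}")
    print(f"mean on  40 sq in: {mu_40:.3f}")
    print(f"P(X = 0) on 40 sq in:  {p_zero_40:.4f}")
    print(f"P(X <= 2) on 40 sq in: {p_at_most_2_40:.4f}")
```

Because the mean scales with the size of the region, only the rate and the region size are needed to obtain any probability of interest, provided the homogeneity assumption holds.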

Applications

When control charts were introduced in the 1920s, researchers quickly found that attribute-type data as well as variable data might be used with control charts. The p-chart, which relies on the binomial distribution, is well known. What is probably less well known, and likely underutilized, is the c-chart, which plots Poisson-type events in a process. There are numerous types of such random events in manufacturing, business processes, and elsewhere. A few examples include surface-area blemishes, pits or scratches in metal stock, missing components in a complex assembly, equipment breakdowns in an eight-hour shift, errors on drawings or in text, calls to a help desk for PC service, quarterly lost-time accidents in a large facility, website hits in a day, jobs that have to be redone, attrition in a large company, and many others. To create a c-chart for a constant observation region or sample size, one only needs the mean number of events in an initial sample. For the Poisson, sigma is calculated as the square root of the mean. Thus, if the mean number of material blemishes in a group of product units is 4, sigma is 2, making an upper “3-sigma” control limit of 4+3(2)=10.
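The control limit calculation can be sketched as follows. This is a minimal illustration rather than a full charting procedure: the lower limit, which the text does not discuss, is assumed here to be floored at zero because a count cannot be negative, and the sample counts and function names are hypothetical.

```python
from math import sqrt


def c_chart_limits(c_bar, k=3.0):
    """Return (lower limit, center line, upper limit) for a c-chart.

    c_bar is the mean count from an initial sample; sigma is sqrt(c_bar).
    The lower limit is floored at zero (an assumption of this sketch).
    """
    sigma = sqrt(c_bar)
    upper = c_bar + k * sigma
    lower = max(0.0, c_bar - k * sigma)
    return lower, c_bar, upper


def out_of_control(counts, c_bar, k=3.0):
    """Indices of counts falling outside the k-sigma limits."""
    lower, _, upper = c_chart_limits(c_bar, k)
    return [i for i, c in enumerate(counts) if c > upper or c < lower]


# Mean of 4 blemishes, as in the text
lcl, center, ucl = c_chart_limits(4.0)

# Hypothetical blemish counts from seven later samples
counts = [3, 5, 2, 11, 4, 0, 6]
flagged = out_of_control(counts, 4.0)

if __name__ == "__main__":
    print(f"LCL = {lcl}, center = {center}, UCL = {ucl}")
    print(f"samples outside the limits: {flagged}")
```

With a mean of 4, the upper limit is 10, so only the sample with 11 blemishes is flagged.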

The Poisson distribution is often used to model “rare” events such as those found in risk analysis. Such events are typically observed over time intervals, and in these cases, there is a relationship between the number of events and the time between events. The distribution of time between events when observing a homogeneous Poisson process is referred to as the exponential distribution. The Poisson counts events in a time interval, and the exponential measures the time between events. The exponential is completely determined by a single parameter, θ, the mean time between events. Again, a rate constant, λ=1/θ, is often used in its characterization.

Using the rate parametrization, the density function, f(t), and cumulative distribution function, F(t), for the exponential are:

$$f(t) = \lambda e^{-\lambda t}, \qquad F(t) = 1 - e^{-\lambda t}, \qquad t \ge 0$$

The function R(t)=1-F(t)=exp(-λt) is also used extensively in practice and is called the reliability at time t. Reliability may be taken to mean survival probability at time t. In (1), if we use µ=λt as the Poisson parametrization and x=0, meaning 0 events on the interval t, we obtain exp(-λt), and we see that F(t), above, is its complement, that is, the probability that at least one event will occur in time t. In other words, the probability that the first event occurs within a time t is equal to the Poisson probability of at least one event in time t. This establishes the link between the Poisson and exponential distributions. The exponential distribution has the curious “memoryless” property: given that an event has not happened by a time t, the probability that an event will occur in an additional time s is identical to the probability that an initial event would occur in time s. Thus, the process is unaffected by the elapse of time t without an event. This is truly at the heart of how random events behave.
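The link between the two distributions can be verified numerically. The sketch below assumes the standard forms of the Poisson probability mass function and the exponential cumulative distribution function, 1-exp(-λt); the rate of 0.5 events per hour and the 3-hour window are hypothetical values chosen only for illustration.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P(X = x) for a Poisson distribution with mean mu."""
    return exp(-mu) * mu**x / factorial(x)


def exponential_cdf(t, rate):
    """F(t): probability that the first event occurs by time t."""
    return 1.0 - exp(-rate * t)


def reliability(t, rate):
    """R(t) = 1 - F(t): probability of no event by time t."""
    return exp(-rate * t)


rate = 0.5  # hypothetical rate, events per hour
t = 3.0     # hypothetical observation time, hours

p_no_events = poisson_pmf(0, rate * t)  # Poisson probability of zero events
p_at_least_one = 1.0 - p_no_events      # Poisson probability of one or more

if __name__ == "__main__":
    print(f"Poisson P(X = 0)    = {p_no_events:.6f}")
    print(f"Reliability R(t)    = {reliability(t, rate):.6f}")
    print(f"Poisson P(X >= 1)   = {p_at_least_one:.6f}")
    print(f"Exponential F(t)    = {exponential_cdf(t, rate):.6f}")
```

The first two printed values agree with each other, as do the last two, which is the Poisson-exponential link in numerical form.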

In product quality, if a defect or failure mode is a random type of event, then given the product’s use for some duration without failure, the product is considered as good as a new item with respect to the said random-failure mode. In the language of reliability this further means, where random failure modes are concerned, R(s+t)=R(s)R(t). Further information on the exponential side can be found in a past Data Points column.5 
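The "as good as new" statement can be illustrated with a short calculation. This sketch assumes the exponential reliability function R(t)=exp(-λt) and computes the conditional probability of surviving an additional time s given survival to time t; the failure rate and durations are hypothetical, as are the function names.

```python
from math import exp


def reliability(t, rate):
    """R(t): probability of surviving past time t under a constant rate."""
    return exp(-rate * t)


def conditional_survival(s, t, rate):
    """P(T > s + t | T > t): survive s more, given survival to time t."""
    return reliability(s + t, rate) / reliability(t, rate)


rate = 0.002  # hypothetical failure rate, failures per hour
t = 500.0     # hours already survived without failure (hypothetical)
s = 100.0     # additional hours of interest (hypothetical)

new_item = reliability(s, rate)                # a new item surviving s hours
used_item = conditional_survival(s, t, rate)   # a used item surviving s more

if __name__ == "__main__":
    print(f"R(s) for a new item:            {new_item:.6f}")
    print(f"P(T > s + t | T > t), used item: {used_item:.6f}")
    print(f"R(s + t):                       {reliability(s + t, rate):.6f}")
    print(f"R(s) * R(t):                    {new_item * reliability(t, rate):.6f}")
```

The used item and the new item have the same survival probability over the next s hours, and R(s+t) matches the product R(s)R(t). This holds only for random failure modes with a constant rate; wear-out modes would not behave this way.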

References

1 Stigler, S. The History of Statistics: The Measurement of Uncertainty Before 1900. Cambridge, MA: Harvard University Press, 1986: 182.

2 E.g. Hogg, R. V. and Tanis, E. A. Probability and Statistical Inference. 7th edition. Saddle River, NJ: Prentice Hall, 2006.

3 Hanley, J. A. and Bhatnagar, S. “The ‘Poisson’ Distribution: History, Reenactments, Adaptations.” The American Statistician, Vol. 76, No. 4 (2022): 363-371.

4 Brown, J. and Dalton, C. “Quantifying Probability of Detection (POD) Using the Binomial Distribution.” Standardization News (Jan/Feb 2023): 50-51.

5 Luko, S. N. “What is Reliability: Key Concepts and Terminology.” Standardization News (Jan/Feb 2018): 28-29.

Stephen N. Luko, former fellow, Collins Aerospace Corporation, Windsor Locks, Connecticut, is a past chair of the committee on quality and statistics (E11), the current chair of the subcommittee on reliability (E11.40), and a fellow of ASTM International.

John Carson, Ph.D., of P&J Carson Consulting LLC, Findlay, Ohio, is the Data Points column coordinator. He is chair of the subcommittee on statistical quality control (E11.30), part of the committee on quality and statistics (E11), and a member of the committee on environmental assessment, risk management, and corrective action (E50).
