Statistical Process Control Charts and Managing Multiplicity
Q. Why am I getting so many out-of-control signals?
A. There are several possibilities that may produce more out-of-control (OOC) signals than can reasonably be managed:
- Case 1 - The process is OOC in a manner that systematically impacts multiple product properties at once (multiplicity by correlation).
- Case 2 - High product manufacturing and/or data frequency implies many OOC signals (test and/or data generation multiplicity). See Figure 1.
- Case 3 - There are multiple control chart parameters that collectively have increased the risk that one or more parameters of the in-control process may be assessed to be OOC (parameter multiplicity).
- Case 4 - The control charting statistical methodology assumes normality when the in-control data are nonnormal (inappropriate statistics).
Users of statistical techniques assume that statisticians have already done all their homework for them. After all, they’ve provided well-defined rules for constructing control charts of various types and well-defined rules for their interpretation. What could go wrong? But statisticians have not done your risk management homework. They typically warn you about distributional assumptions which, when sufficiently violated, can lead to the excessive alarming of Case 4. What they don’t typically warn about are Cases 1, 2, and 3, the scenarios where the control charting procedure is applied frequently (multiplicity) in some testing context. And monitoring is, in fact, repeated testing.
This is partly because statisticians often view themselves only as risk managers of the algorithm they provide, rather than as risk managers of how those algorithms are broadly applied in practice. Many other statistical procedures have risk management issues in particular applications, sometimes even when used by a statistician, but this article focuses on control charting.
Case 1 occurs when some or many of the control-charted parameters are correlated due to the inherent nature of the manufacturing, sampling, or measurement process. Such correlated signals are likely to have arisen from the same root cause or the same measurement artifact. Correlation can also be an artifact of the measurement or measurement-sampling system, as when multiple charted physical properties are determined simultaneously by a single instrument.
Cases 2 and 3 will be the primary focus of this discussion. In Case 2, imagine a single measurement for monitoring system performance that is automatically measured every 4 minutes. Further assume that a Shewhart Individuals Control Chart, with the standard 3-sigma limits, is being used to monitor the process. In a day there will be 15x24 = 360 measurements, each with a probability of approximately 1/370 of giving an OOC signal. This frequency will provide, by chance alone, an average of nearly one OOC alarm signal per day (360/370) when the system is in control.
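To make the arithmetic concrete, the short Python sketch below (SciPy is assumed to be available) reproduces the roughly 1-in-370 per-point false-alarm probability of 3-sigma limits and the resulting expected number of false signals per in-control day.

```python
from scipy.stats import norm

# Two-sided false-alarm probability for a single point on a Shewhart
# individuals chart with 3-sigma limits (in-control, normally distributed data)
p_false = 2 * norm.sf(3)           # ~0.0027, i.e., roughly 1 in 370

n_per_day = (60 // 4) * 24         # one reading every 4 minutes -> 15 x 24 = 360

expected_alarms = n_per_day * p_false
print(f"per-point false-alarm probability = {p_false:.5f} (about 1/{1/p_false:.0f})")
print(f"expected false alarms per in-control day = {expected_alarms:.2f}")
```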
Figure 1 presents one day’s worth of simulated in-control data from our hypothetical system, charted with 3-sigma limits. Note that three OOC signals are observed. There is about a 7.5% chance of observing three or more OOC alarm signals in the described in-control scenario, and only about a 38% chance that no OOC signals would be observed in a day. If any version of the Western Electric ruleset were used in conjunction with the Shewhart control chart, the number of alarm signals per day would increase substantially by chance alone.
Figure 1: Alarm Impact of In-Control High Frequency Data
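The 38% and 7.5% figures quoted above can be checked with a short binomial calculation; the sketch below assumes the same 360 independent in-control points per day.

```python
from scipy.stats import binom, norm

p_false = 2 * norm.sf(3)   # per-point false-alarm probability, about 1/370
n = 360                    # in-control points per day

print(f"P(no alarms in a day)       = {binom.pmf(0, n, p_false):.2f}")   # about 0.38
print(f"P(3 or more alarms in a day) = {binom.sf(2, n, p_false):.3f}")   # about 0.075
```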
In Case 3, imagine you are measuring and control charting 40 parameters and producing 9 batches daily (9x40 = 360 measurements daily). This is roughly analogous in terms of risk to the prior Case 2 scenario as long as the 40 parameters are not highly intercorrelated.
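Under the same independence assumption, a quick calculation (a sketch, not a prescription) shows why Case 3 carries essentially the same daily false-alarm load as Case 2, and how often at least one of the 40 charts alarms on any single batch.

```python
from scipy.stats import norm

p_false = 2 * norm.sf(3)             # per-chart, per-batch false-alarm probability
n_params, n_batches = 40, 9

# Treats the 40 charted parameters as effectively independent
# (the "not highly intercorrelated" condition described above).
expected_daily = n_params * n_batches * p_false
p_any_one_batch = 1 - (1 - p_false) ** n_params

print(f"expected false alarms per day    = {expected_daily:.2f}")   # ~0.97, as in Case 2
print(f"P(at least one alarm on a batch) = {p_any_one_batch:.2f}")  # ~0.10
```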
A short article of this type can’t fully describe the best ways to fix this systemic risk management issue (Cases 2 and 3). It can be useful to think of the problem as a process improvement or process sustenance resource allocation issue. If a large number of signals do not seem to be related to an assignable cause, pick a sigma multiplier larger than 3 for declaring the process OOC. Select the multiplier to give a manageable rate of false OOC signals, but not one so large as to create an unacceptable risk of missing a real process failure. This will reduce the rate of OOC signaling and allow larger process deviations to be prioritized for process improvement.
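One way to operationalize "pick a sigma multiplier larger than 3" is to back the multiplier out of a tolerable false-alarm budget, assuming normal in-control data. The sketch below uses an illustrative budget of one false signal per week at 360 points per day; the budget itself is a hypothetical choice, and the failure-mode caveats discussed next still apply.

```python
from scipy.stats import norm

def sigma_multiplier(target_false_alarms, n_points):
    """Symmetric sigma multiplier that yields the target expected number of
    false OOC signals over n_points of in-control, normally distributed data."""
    p_per_point = target_false_alarms / n_points
    return norm.isf(p_per_point / 2)      # two-sided control limits

# Hypothetical budget: about one false signal per week at 360 points per day
k = sigma_multiplier(target_false_alarms=1, n_points=360 * 7)
print(f"suggested sigma multiplier ~ {k:.2f}")    # roughly 3.5
```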
If no process improvement results from these efforts, and there is no process degradation, one might consider an even larger sigma multiplier for the alarm level. However, it should never be set so large as to put the alarm level near a known mode of failure. Keep in mind that failures can occur both in your production process and in your customer’s, so the assessment of failure modes and risk must also extend to your customer’s processes. If process improvement is being achieved over time, consider reducing the multiplier back toward 3 from the expanded value described earlier.
The prior paragraphs describe a pragmatic approach to achieving what one hopes to gain from control charting combined with root-cause analysis. It is intellectually controversial in that it deviates from current common practice. However, current common practice naively fails in the face of multiplicity.
In the multiple-parameter case, the impact of excess alarming due to multiplicity can also be lessened through dimensionality-reduction algorithms, which yield multiparameter charts. Dimensionality reduction can, however, mask changes in individual variables that still require some level of monitoring. A hybrid limit approach is therefore suggested whenever data dimensionality is reduced by any methodology: in a hybrid approach, charts are also maintained on the individual variables, but with limits wider than 3 sigma.
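As an illustration only, here is one way such a hybrid scheme might be sketched: a Hotelling’s T²-style multivariate statistic stands in for the dimensionality-reduced chart, while individual charts are retained with widened limits. The chi-square limit, the 3.5-sigma individual multiplier, and the function names are assumptions made for illustration, not a prescribed implementation.

```python
import numpy as np
from scipy.stats import chi2

def hybrid_monitor(reference, alpha_multivariate=0.0027, k_individual=3.5):
    """Sketch of a hybrid scheme: one multivariate (Hotelling T^2-style)
    statistic with a chi-square limit, plus individual charts retained
    with wider-than-3-sigma limits.
    reference: (n_obs, n_params) array of in-control observations."""
    mean = reference.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(reference, rowvar=False))
    sd = reference.std(axis=0, ddof=1)
    t2_limit = chi2.isf(alpha_multivariate, df=reference.shape[1])  # large-sample limit

    def check(x):
        d = x - mean
        multivariate_ooc = float(d @ cov_inv @ d) > t2_limit
        individual_ooc = np.abs(d) > k_individual * sd   # widened per-variable limits
        return multivariate_ooc, individual_ooc

    return check

# Hypothetical usage: 500 in-control reference observations on 40 parameters
rng = np.random.default_rng(0)
check = hybrid_monitor(rng.normal(size=(500, 40)))
mv_flag, indiv_flags = check(rng.normal(size=40))
print(mv_flag, int(indiv_flags.sum()))
```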
Excessive alarms from a control charting application should never be ignored; rather, they should be addressed appropriately. If multiplicity is strong enough, the situation needs to be addressed so that resources are not wasted chasing statistical phantoms. Managing a modern data-rich quality system is sometimes still erroneously perceived to be formulaic in nature. Multiplicity is never to be ignored. ■
Thomas Bzik, StatsOnTheGo, Inc., is a member and former chair of the committee on quality and statistics (E11). For his service, he received Awards of Appreciation from E11 and the committee on air quality (D22).
John Carson, Ph.D., is senior statistician for Neptune and Co. and coordinator of Data Points. He is a member of the committees on quality and statistics (E11), petroleum products, liquid fuels, and lubricants (D02), air quality (D22), and more.