The key to Walter Shewhart's choice of three-sigma limits lies in the title
of his first book, Economic Control of Quality of Manufactured Product,
where he emphasizes the economics of decisions. For example, Shewhart writes:
"As indicated the method of attack is to establish limits of variability
such that, when [a value] is found outside these limits, looking for an
assignable cause is worthwhile."
Here Shewhart makes a fundamental distinction: some processes are predictable
while others are not. He shows that by examining the data produced by a
process, we can determine the predictability of a process. If the data show
that a process has been predictable in the past, it's reasonable to expect
that it will remain predictable in the future. When a process is predictable,
it's said to display common-cause, or chance-cause, variation. When a process
is unpredictable, it's said to display assignable-cause variation. Therefore,
the ability to distinguish between a predictable process and an unpredictable
one depends upon your ability to distinguish between common-cause and assignable-cause
variation.
What's the difference? Shewhart writes that a predictable process can be
thought of as the outcome of "a large number of chance causes in which
no cause produces a predominating effect." When a cause does produce
a predominating effect, it becomes an "assignable" cause. Thus,
if we denote the predominating effect of any assignable cause as a signal,
then the collective effects of the many common causes can be likened to
background noise, and the job of separating the two types of variations
is similar to separating signals from noise.
In separating signals from noise, you can make two mistakes. The first mistake
occurs when you interpret noise as a signal (i.e., attribute common-cause
variation to an assignable cause). The second mistake occurs when you miss
a signal (i.e., when you attribute assignable-cause variation to common causes).
Both mistakes are costly. The trick is to avoid the losses caused by these
mistakes. You can avoid making the first mistake entirely if you treat all
variation as noise. But, in doing this, your losses from the second mistake will
increase. In a similar manner, you can avoid making the second mistake entirely
if you treat every value as a signal. But, in doing this, your losses
from the first mistake will increase.
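The trade-off between the two mistakes can be made concrete with a small calculation. The sketch below is an illustration only, not Shewhart's own derivation. It assumes that common-cause noise is normally distributed and that an assignable cause shifts the process average by a fixed number of standard deviations (the hypothetical `shift` parameter). Under those assumptions it reports, for limits placed k standard deviations from the average, how often a single noise value would be read as a signal and how often a single shifted value would be missed.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def false_alarm_rate(k):
    """First mistake: chance that pure noise falls outside +/- k sigma.

    Assumes normally distributed noise (an assumption of this sketch).
    """
    return 2.0 * (1.0 - phi(k))


def miss_rate(k, shift):
    """Second mistake: chance that a value from a process whose average has
    moved by `shift` sigma still falls inside +/- k sigma.
    """
    return phi(k - shift) - phi(-k - shift)


if __name__ == "__main__":
    shift = 3.0  # hypothetical size of the assignable-cause effect, in sigma
    print(" k   false alarm   miss")
    for k in (1.0, 2.0, 3.0, 4.0):
        print(f"{k:2.0f}   {false_alarm_rate(k):11.4f}   {miss_rate(k, shift):.4f}")
```

Narrower limits push the miss rate down and the false-alarm rate up; wider limits do the opposite. Neither rate can be driven to zero without inflating the other.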
In our world, when using historical data, it's impossible to avoid both
mistakes completely. So, given that both mistakes will be made occasionally,
what can we do? Shewhart realized it's possible to regulate the frequencies
of both mistakes to minimize economic loss. Subsequently, he developed a
control chart with three-sigma limits. Three-sigma limits filter out nearly
all probable noise (the common-cause variation) and isolate the potential
signals (the assignable-cause variation).
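As an illustration of how three-sigma limits are applied to data, the sketch below computes limits for a series of individual values and flags new values that fall outside them. This column does not say how sigma should be estimated; the example assumes the average moving range method commonly used on charts for individual values, where the scaling factor 2.66 is 3 divided by the bias-correction constant 1.128. The function names and the sample data are hypothetical.

```python
from statistics import mean


def three_sigma_limits(baseline):
    """Return (lower, center, upper) limits for a series of individual values.

    Sigma is estimated from the average moving range, an assumption of this
    sketch rather than something specified in the text.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two values to form a moving range")
    center = mean(baseline)
    # Moving ranges: absolute differences between successive values.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    # 2.66 = 3 / 1.128, so this half-width is three estimated sigma units.
    half_width = 2.66 * mean(moving_ranges)
    return center - half_width, center, center + half_width


def find_signals(baseline, new_values):
    """Return (index, value) pairs from new_values that fall outside the
    limits computed from the baseline."""
    lower, _, upper = three_sigma_limits(baseline)
    return [(i, v) for i, v in enumerate(new_values) if v < lower or v > upper]


if __name__ == "__main__":
    baseline = [10, 12, 11, 13, 10, 12, 11, 13, 10, 12]  # made-up data
    lower, center, upper = three_sigma_limits(baseline)
    print(f"limits: {lower:.2f} to {upper:.2f}, center {center:.2f}")
    print("potential signals:", find_signals(baseline, [11, 14, 19, 9]))
```

Values inside the limits are treated as noise and left alone; only a value outside them is taken as worth the cost of looking for an assignable cause.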
How is it possible that three-sigma limits filter out virtually all probable
noise? While there are certain mathematical inequalities that guarantee
most data sets will have at least 95 percent of the values within three
standard deviations of the average, a better rule of practice is the Empirical
Rule, which states that about 99 percent to 100 percent of the data will
be located within three standard deviations, either above or below the average.
Figure 1 displays six theoretical distributions to illustrate the Empirical
Rule's appropriateness. It shows the area within three standard deviations
of the mean. No matter how skewed or "heavy tailed" the distribution
may be, virtually all of the area under the distribution curve will fall
within three standard deviation units of the mean. When applied to homogeneous
data sets, the Empirical Rule suggests that no matter how the data "behave,"
virtually all of the data will fall within three standard deviation units
of the average. Because data that display statistical control are, by definition,
reasonably homogeneous, the Empirical Rule explains why the control chart
will yield very few instances of noise interpreted as a signal.
Figure 1 also shows that three-sigma limits will indeed filter out nearly
all common-cause variation displayed by predictable processes.
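The Empirical Rule can be checked directly for a few standard distribution shapes. The sketch below computes, from closed-form distribution functions, the share of each distribution that lies within three standard deviations of its mean. The four shapes used here (normal, uniform, exponential and Laplace) are chosen for illustration and are not necessarily the six shown in Figure 1.

```python
import math


def normal_coverage(k=3.0):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


def uniform_coverage(k=3.0):
    """Uniform on [0, 1]: mean 1/2, standard deviation 1/sqrt(12)."""
    sd = 1.0 / math.sqrt(12.0)
    lo = max(0.0, 0.5 - k * sd)
    hi = min(1.0, 0.5 + k * sd)
    return hi - lo


def exponential_coverage(k=3.0):
    """Exponential with rate 1 (strongly skewed): mean 1, standard deviation 1."""
    lo = max(0.0, 1.0 - k)
    hi = 1.0 + k
    return math.exp(-lo) - math.exp(-hi)


def laplace_coverage(k=3.0):
    """Laplace with scale 1 (heavy tailed): mean 0, standard deviation sqrt(2)."""
    return 1.0 - math.exp(-k * math.sqrt(2.0))


if __name__ == "__main__":
    for name, fn in [
        ("normal", normal_coverage),
        ("uniform", uniform_coverage),
        ("exponential", exponential_coverage),
        ("Laplace", laplace_coverage),
    ]:
        print(f"{name:12s} {100.0 * fn():.2f}% within three standard deviations")
```

Under these formulas the normal shape gives roughly 99.7 percent, the uniform 100 percent, and the skewed exponential and heavy-tailed Laplace shapes a little over 98 percent, all well above the 95 percent floor given by the inequalities.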
Three-sigma limits allow you to detect the process changes that are large
enough to be economically important, while filtering out almost all common-cause
variation. These limits allow you to strike a balance between the losses
associated with interpreting noise as a signal and attributing assignable-cause
variation to common causes.
About the author
Donald J. Wheeler is an internationally known consulting statistician
and the author of Understanding Variation: The Key to Managing Chaos;
Advanced Topics in Statistical Process Control; and Understanding Statistical
Process Control, Second Edition. © 1996 SPC Press Inc. Telephone (423) 584-5005.