SPC Toolkit
by Donald J. Wheeler
The problem is not in knowing how to manipulate numbers
but rather in not knowing how to interpret them.
Analyzing Data
From the beginning of our education, we have all learned that "two
plus two is equal to four." The very definiteness of this phrase summarizes
the unequivocal nature of arithmetic. This phrase is used to characterize
that which is inevitable, solid and beyond argument. It is the first item
in our educational catechism, which is beyond dispute.
This bit of arithmetic has been elevated to a cliché for the following
reasons. During the years when we were learning our sums and our multiplication
tables, we were also learning to spell and to write. This means that we
had to learn about irregular spellings. We had to learn to use irregular
verbs. And we had to learn to cope with many of the idiosyncrasies of language.
In contrast to this, we learned that there are no irregular spellings in
arithmetic. Whether you multiply three by two or multiply two by three,
the result is always six. Addition, subtraction, multiplication and division
contain no irony; they contain no hyperbole. The multiplication tables contain
no sarcasm.
As a result, we receive a subliminal message: Numbers are concrete, regular
and precise, but words are inconstant, vague and changing. The contrast
between the regularity (and for some, the sterility) of mathematics and
the complexity (and richness) of language leaves us all with an inherent
belief that numbers possess some native objectivity that words do not possess.
Hence, when we want to indicate a solid and dependable truth, we are prone
to recall the first rule in the mathematical catechism: Two plus two is
equal to four.
Because of this subliminal belief, we feel that we have some sort of control
over those things we can measure. If we can express it in numbers, then
we have made it objective, and we therefore know that with which we are
dealing. Moreover, due to all the uncertainty we routinely must deal with,
this ability to quantify things is so reassuring, so comforting, that we
gladly embrace measurements as being solid, real and easy to understand.
Hence, today we have gone beyond measuring the physical world. We have gone
beyond the accounting of wealth. Now we are trying to measure everything.
If we can quantify it, then we can deal with it "scientifically."
So now we "measure" attitudes, we measure satisfaction, and we
measure performance. And once we have measured these things, we feel that
we know them objectively, definitively and concretely.
But, having obtained these measurements, how do you analyze them? Do the
normal rules of arithmetic apply?
Unfortunately, none of our mathematical education has prepared us to
properly analyze such measurements. Our very first lessons taught us that
two numbers which are not the same are different. So when the numbers differ,
we conclude that the things being measured are also different. That this
is not so is a fact that seems to have escaped the attention of almost everyone.
And when we think the things are different, we tend to rank them and publish
a list. For example, a recent article in my local newspaper reported that
Nashville and Knoxville were, respectively, the 25th and 27th "most
violent cities in the country." This ranking was based on the number
of crimes against persons reported to the FBI by the local law enforcement
agencies. But just what is entailed in such numbers? Is purse snatching
a burglary (a crime against property) or a robbery (a crime against a person)?
Is domestic violence reported as an assault or as disturbing the peace?
These and other crimes are reported differently in different cities.
Finally, even if the crimes were categorized and reported the same way,
would the crime rates make the proper comparison? The incorporated portion
of Nashville includes all of Davidson County and consists of urban, suburban
and rural areas. In contrast, only half the population of greater Knoxville
lives within the city limits; the rest live in the unincorporated portions
of Knox County. Therefore Knoxville contains a much higher proportion of
urban environments than does Nashville. If crime rates are higher in an
urban setting, then dividing the number of reported crimes by the city's
population will artificially inflate Knoxville's rate compared to that of
Nashville.
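The effect of the denominator can be sketched with a small calculation. The figures below are entirely hypothetical and are not data for Nashville or Knoxville; the names `URBAN_RATE`, `OTHER_RATE` and `city_rate` are invented for this illustration. Both cities are assumed to have identical crime rates in their urban areas and identical rates elsewhere, so the only difference is where the city limits are drawn.

```python
# Hypothetical rates, assumed identical in both regions.
URBAN_RATE = 12.0  # crimes per 1,000 urban residents
OTHER_RATE = 4.0   # crimes per 1,000 suburban and rural residents


def city_rate(urban_pop, other_pop):
    """Crimes per 1,000 residents living inside the city limits."""
    crimes = urban_pop * URBAN_RATE / 1000 + other_pop * OTHER_RATE / 1000
    return 1000 * crimes / (urban_pop + other_pop)


# City A: the city limits take in urban, suburban and rural areas.
city_a = city_rate(urban_pop=200_000, other_pop=300_000)

# City B: the city limits take in only the urban core.
city_b = city_rate(urban_pop=200_000, other_pop=0)

print(f"City A: {city_a:.1f} per 1,000")
print(f"City B: {city_b:.1f} per 1,000")
```

Under these assumptions City B reports the higher rate even though a resident of either urban core faces exactly the same risk. The ranking reflects the boundaries, not the violence.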
Considerations such as these can raise more than a reasonable doubt about
the appropriateness of most of the published rankings we hear about every
day. Many comparisons made by those who compile lists are virtually meaningless.
The only thing that is worse than the compilation of such rankings is the
use of these rankings for business decisions.
The problem here is not a problem of arithmetic. It is not a problem of
not knowing how to manipulate numbers but rather of not knowing how to interpret
them. All the arithmetic, all the algebra, all the geometry, all the trigonometry
and all the calculus you have ever had was taught in the world of pure numbers.
This world is one where lines have no width, planes have no thickness and
points have no dimensions at all. While things work out very nicely in this
world of pure numbers, we do not live there.
Numbers are not exact in the world in which we live. They always contain
variation. As noted above, there is variation in the way numbers are generated.
There is variation in the way numbers are collected. There is variation
in the way numbers are analyzed. And finally, even if none of the above
existed, there would still be variation in the measurement process itself.
Thus, without some understanding of all this variation, it is impossible
to interpret the numbers of this world.
If a manufacturer applies two film coatings to a surface, and if each coating
is two microns thick, will the combined thickness of the two coatings be
exactly four microns? If we measure with sufficient care and precision,
the combined thickness is virtually certain to be some other value than
four microns. Thus, when we add one thing that is characterized by the value
2.0 to another thing characterized by the value 2.0, we end up with something
which is only equal to four on the average.
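A short simulation can make the coating example concrete. It is only a sketch: the spread of 0.05 microns per coating and the use of a normal distribution are assumptions chosen for illustration, not values from any real coating process.

```python
import random
import statistics


def simulate_combined_thickness(n_parts=10_000, nominal=2.0, sigma=0.05, seed=1):
    """Simulate the combined thickness of two coatings, each nominally 2.0 microns.

    sigma is a hypothetical standard deviation for a single coating (microns),
    and the normal distribution is assumed only for the sake of the example.
    """
    rng = random.Random(seed)
    return [rng.gauss(nominal, sigma) + rng.gauss(nominal, sigma)
            for _ in range(n_parts)]


totals = simulate_combined_thickness()
average = statistics.mean(totals)
exactly_four = sum(1 for t in totals if t == 4.0)

print(f"average combined thickness: {average:.3f} microns")
print(f"smallest: {min(totals):.3f}, largest: {max(totals):.3f}")
print(f"parts measuring exactly 4.0: {exactly_four} of {len(totals)}")
```

The individual totals scatter on both sides of four, while their average settles very close to it.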
What we see here is not a breakdown in the rules of arithmetic but a shift
in what we are doing with numbers. Rather than working with pure numbers,
we are now using numbers to characterize something in this world. When we
do this, we encounter the problem of variation. In every measurement, and
in every count, there is some element of variation. This variation is connected
both to the process of obtaining the number and to the variation in the
characteristic being quantified. This variation tends to "fuzz"
the numbers and undermine all simple attempts to analyze and interpret the
numbers.
So how, then, should we proceed? How can we use numbers? When we work with
numbers in this world, we must first make allowances for the variation that
is inherent in those numbers. This is exactly what Shewhart's charts do: they
filter out the routine variation so that we can spot any exceptional values
which may be present. (One way of doing this was described in this column
last month.) This filtering, this separation of all numbers into "probable
noise" and "potential signals," is at the very heart of making
sense of data. While it is not good to miss a signal, it is equally bad
to interpret noise as if it were a signal. The real trick is to strike an
economic balance between these two mistakes, and this is exactly what Shewhart's
charts do. They filter out virtually all of the probable noise, so that
anything left over may be considered a potential signal.
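The filtering idea can be sketched in a few lines. The column does not say here which of Shewhart's charts is meant, so this sketch assumes the chart for individual values and a moving range (the XmR chart), whose limits are the average plus and minus 2.66 times the average moving range. The data are hypothetical, and the function names `xmr_limits` and `potential_signals` are invented for the example.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower limit, central line, upper limit) for an individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two values to compute a moving range")
    # Moving ranges: absolute differences between successive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    # 2.66 is the usual scaling factor (3 / 1.128) for two-point moving ranges.
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def potential_signals(values):
    """Return (index, value) pairs that fall outside the limits."""
    lower, _, upper = xmr_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical monthly values, invented for illustration.
data = [39, 41, 41, 41, 43, 44, 41, 42, 40, 41, 44, 40, 52, 41, 42]

lower, center, upper = xmr_limits(data)
print(f"limits: {lower:.2f} to {upper:.2f}, central line {center:.2f}")
print("potential signals:", potential_signals(data))
```

Values inside the limits are treated as probable noise and left alone; only values outside them are worth investigating as potential signals.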
Whether or not you acknowledge variation, it is present in all of the numbers
with which you deal each day.
If you choose to learn about variation, it will change the way you interpret
all data. You will still detect those signals that are of economic importance,
but you will not be derailed by noise.
If you choose to ignore variation, then for you, two plus two will still
be equal to four, and you will continue to be misled by noise. You will
also tend to reveal your choice by the way you talk and by the mistakes
you make when you interpret data.
Two plus two is only equal to four on the average. The sooner you understand
this, the sooner you can begin to use numbers effectively.
About the author . . .
Donald J. Wheeler is an internationally known consulting statistician
and the author of Understanding Variation: The Key to Managing Chaos and
Understanding Statistical Process Control, Second Edition.
© 1996 SPC Press I