SPC Toolkit
by Donald J. Wheeler
Teens Who Smoke
"Teen Use Turns Upward" read the headline for a graph appearing
in USA Today on June 21, 1994. The data in the graph were attributed
to the Institute for Social Research at the University of Michigan and were
labeled as the "percentage of high school seniors who smoke daily."
The portion of this graph covering the past 10 years is shown in Figure
1.
Each point is the value found in an annual survey. The 1993 value of 19
percent was higher than the 1992 value of 17.3 percent. This was interpreted
to mean that more teenagers are using tobacco now than in the past. But
are they?
Before we can make sense of numbers like these, we need to know something
about the limitations of data from surveys. First, values like these are
subject to variation. Two identical surveys carried out at the same time
will rarely yield identical results. Among the sources of variation for
survey data are differences in who is interviewed, how they are interviewed,
how they respond and how their responses are reported. Second, when a survey
is used from year to year, there is also the problem of different personnel
being used to conduct the survey and differences in how the questions are
perceived in different years.
Therefore, no matter what is being measured, and no matter how carefully
it is measured, the statistics will always vary. Even if nothing changes,
we can expect the value to go up about half the time, and we can also expect
the value to go down about half the time.
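The claim that an unchanged quantity still goes up about half the time and down about half the time can be checked with a small simulation. The sketch below is only an illustration: the true rate, the sample size and the number of repeat surveys are assumed values rather than figures from the Michigan study, and it models only the variation from who happens to be interviewed, not the other sources named above.

```python
import random

# Hypothetical setup: none of these numbers come from the survey itself.
TRUE_RATE = 0.186      # assumed constant share of daily smokers
SAMPLE_SIZE = 1000     # assumed number of respondents per survey
N_SURVEYS = 2000       # number of simulated repeat surveys

rng = random.Random(1994)  # fixed seed so the run is repeatable


def run_survey():
    """Return the observed percentage of smokers in one simulated survey."""
    smokers = sum(rng.random() < TRUE_RATE for _ in range(SAMPLE_SIZE))
    return 100.0 * smokers / SAMPLE_SIZE


results = [run_survey() for _ in range(N_SURVEYS)]

# Compare each simulated survey with the one before it.
pairs = list(zip(results, results[1:]))
ups = sum(later > earlier for earlier, later in pairs)
downs = sum(later < earlier for earlier, later in pairs)

share_up = ups / len(pairs)
share_down = downs / len(pairs)
print(f"went up:   {share_up:.1%}")
print(f"went down: {share_down:.1%}")
```

Nothing changes in this simulated population, yet the reported value moves from one survey to the next almost every time. The two shares fall slightly short of 100 percent together because a few consecutive surveys tie.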
So how do we ever detect a change using survey data? If we interpret each
and every change in the percentage who smoke daily as a year-to-year difference,
how do we know that we are not being misled by the study-to-study variation?
If we admit that there is study-to-study variation, how then do we ever
know when there has been a change from one year to another?
The answer is that we must first filter out the study-to-study variation,
and then look for year-to-year differences. The simplest way to do this
is with a control chart.
We begin with the yearly values. For the data on high school seniors who
smoked daily, the annual percentages reported for 1984 through 1993 were,
respectively: 18.8, 19.6, 18.7, 18.6, 18.1, 18.9, 19.2, 18.2, 17.3 and 19.0.
The average of these 10 values is 18.64 percent.
This average is used as a central line, and the 10 values are plotted as
a time series as shown in Figure 2. This graph is the beginning of the control
chart for individual values (also known as an X-chart).
Because the variation between one year's value and the next will always
include the study-to-study variation, we use the year-to-year variation
as our guide to how much uncertainty is inherent in the reported results.
These year-to-year changes are measured by the differences between successive
values (these differences are called moving ranges). The nine moving ranges
for these data are: 0.8, 0.9, 0.1, 0.5, 0.8, 0.3, 1.0, 0.9, 1.7. The average
moving range is 0.778 percent. We use this average moving range to compute
limits for the previous graph.
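For readers who want to check the arithmetic, the average and the moving ranges can be reproduced in a few lines. This is a minimal sketch using the ten values listed in the text; the variable names are illustrative only.

```python
# Annual percentages of high school seniors who smoke daily, 1984-1993,
# as listed in the text.
values = [18.8, 19.6, 18.7, 18.6, 18.1, 18.9, 19.2, 18.2, 17.3, 19.0]

# Central line for the X-chart: the plain average of the yearly values.
average = sum(values) / len(values)

# Moving ranges: absolute differences between successive yearly values.
moving_ranges = [abs(later - earlier) for earlier, later in zip(values, values[1:])]
average_moving_range = sum(moving_ranges) / len(moving_ranges)

print(f"average = {average:.2f}")
print("moving ranges =", [round(mr, 1) for mr in moving_ranges])
print(f"average moving range = {average_moving_range:.3f}")
```

Ten values yield nine moving ranges, because each range needs a pair of successive years.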
The limits for our X-chart are commonly known as natural process limits.
They are placed symmetrically on either side of the central line. The distance
from the central line to these limits is found by multiplying the average
moving range by 2.660. This value of 2.660 is a constant that converts the
raw statistic into the appropriate measure of dispersion.
For these data, this distance is:
2.660 x 0.778% = 2.07%
Thus, the upper natural process limit is:
18.64% + 2.07% = 20.71%
The lower natural process limit is:
18.64% - 2.07% = 16.57%
These limits make allowance for routine variation. They are added to the
graph to obtain the X-chart, shown in Figure 3.
Before a yearly value can be said to represent a change in the use of tobacco
by teenagers, it will have to either exceed the upper limit or fall below
the lower limit. Since none of these values fall outside these limits, any
statement about changes in the percentage of teens who smoke is questionable.
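The limit calculation and the comparison just described can be sketched as follows. The function name and layout are illustrative assumptions; the constant 2.660 and the data are taken from the text. The code carries the unrounded average moving range through the calculation, which gives the same limits to two decimal places.

```python
YEARS = list(range(1984, 1994))
VALUES = [18.8, 19.6, 18.7, 18.6, 18.1, 18.9, 19.2, 18.2, 17.3, 19.0]

X_CHART_CONSTANT = 2.660  # scaling constant given in the text


def natural_process_limits(values):
    """Return (lower limit, central line, upper limit) for an X-chart."""
    center = sum(values) / len(values)
    ranges = [abs(later - earlier) for earlier, later in zip(values, values[1:])]
    average_moving_range = sum(ranges) / len(ranges)
    distance = X_CHART_CONSTANT * average_moving_range
    return center - distance, center, center + distance


lower, center, upper = natural_process_limits(VALUES)

# A yearly value counts as a signal only if it falls outside the limits.
signals = [(year, value) for year, value in zip(YEARS, VALUES)
           if value > upper or value < lower]

print(f"limits: {lower:.2f} to {upper:.2f}, central line {center:.2f}")
print("years outside the limits:", signals)
```

For these data the list of signals comes back empty, which is the point of the paragraph above.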
But wait: the change between the last two values, where the percentage jumped
from 17.3 percent to 19 percent, represents the biggest change during the
past 10 years. Surely this should mean something.
To see if this is the case, we can place the moving ranges on a control
chart. The average moving range of 0.778 will be the central line, and the
upper limit will be found by multiplying the average moving range by the
constant value of 3.27. This results in an upper limit for the moving ranges
of 2.54 (see Figure 4).
The last value on this moving range chart shows the "jump" between
1992 and 1993. This moving range of 1.7 percent does not fall above the
upper limit of the moving range chart. Thus, once again, the "jump"
from 17.3 percent to 19 percent does not qualify as a clear-cut signal.
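The same check can be written out for the moving range chart. This is a minimal sketch under the same assumptions as the text: the constant 3.27 comes from the article, while the variable names and the choice to label each moving range by the later of its two years are illustrative.

```python
YEARS = list(range(1984, 1994))
VALUES = [18.8, 19.6, 18.7, 18.6, 18.1, 18.9, 19.2, 18.2, 17.3, 19.0]

MR_CHART_CONSTANT = 3.27  # scaling constant given in the text

# Moving ranges between successive years; the average is the central line.
moving_ranges = [abs(later - earlier) for earlier, later in zip(VALUES, VALUES[1:])]
average_moving_range = sum(moving_ranges) / len(moving_ranges)

# Upper limit for the moving range chart.
upper_range_limit = MR_CHART_CONSTANT * average_moving_range

# Label each moving range by the later of its two years and flag any
# that rise above the upper limit.
flagged = [(year, round(mr, 1)) for year, mr in zip(YEARS[1:], moving_ranges)
           if mr > upper_range_limit]

largest = max(moving_ranges)
print(f"upper range limit = {upper_range_limit:.2f}")
print(f"largest moving range = {largest:.1f}")
print("moving ranges above the limit:", flagged)
```

The largest moving range, the 1.7 between 1992 and 1993, stays below the limit, so nothing is flagged.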
So, what can we say about the percentage of teenagers who smoke daily? Just
this: There is no evidence that the percentage of teenagers who smoke has
increased. Neither is there any evidence that this percentage has decreased
in the past 10 years. The only headline for these data that has any integrity
is, "No Change in Teen Use of Tobacco." Anything else is propaganda.
So how do you avoid being persuaded by propaganda? Start by realizing that
while all data contain noise, only some data contain signals. If you don't
know how to separate the probable noise from the potential signals, you
are susceptible to being misled by the noise in the data. Others may use
data to mislead you, or you may even mislead yourself. Shewhart's charts
are the simplest way to separate signals from noise.
By the way, did you read the article about how the trade deficit soared
last April? Oh, well, that's another story. Or is it?
About the author . . .
Donald J. Wheeler is an internationally known consulting statistician
and the author of Understanding Variation: The Key to Managing Chaos
and Understanding Statistical Process Control, Second Edition. ©
1996 SPC Press Inc. Telephone (423) 584-5005.