Author clarification--3/5/2015:
It appears that I somewhat overstated my case in this article. I had forgotten that there are some families of distributions where we can estimate the shape of a probability model using the statistics for location and/or dispersion. Because of this, these families get used a lot when fitting models to data, which is rather like the drunk looking for his car keys under the street light, not because that is where he lost them, but simply because that is where the light is.
Shewhart explored many ways of detecting process changes. Along the way he considered the analysis of variance, the use of bivariate plots of location and dispersion, and the idea of probability limits. In the end he settled on a generic approach: symmetric three-sigma limits based on a within-subgroup measure of dispersion. This article updates and expands Shewhart’s arguments regarding probability limits.
…
Comments
Modelling "when to take action"
Dr. Wheeler,
Thanks for a very well-written and informative article. I have often wondered why only the normal distribution model is used in control charting (or process behavior charting). This indeed looks like a pragmatic approach embraced by engineers and not statisticians. It seems that we often want to over-complicate things, but often, from a practical perspective, it's not necessary. BTW, I attended Bill Scherkenbach's workshop and he often says what you said in the article, "the only reason to collect data is to take action". I would contend that there are other reasons, such as determining when action is needed and what kind of action is needed. Charting can definitely help with that too.
Very clear article--for those with an open mind
If my memory serves me well, I remember a DEN (Deming Electronic Network) post by Don many years ago explaining how Shewhart had turned the traditional statistical model on its head to set the process limits. Like his latest article, it was very enlightening. Deming said in 1980 that it may well take 50 years for people (read statisticians) to "get" Shewhart. I think he was off by a lot!
Rich
Great point, Rich!
Your point is very well-taken, Rich. I think Deming was so optimistic because he expected that--with all the momentum at the time--universities would begin to teach courses that included analytic studies and SPC. It just hasn't happened. Statistics is still firmly entrenched in the school of mathematics, and business statistics courses are mostly about tests of hypotheses, how many transistors will be defective if I select three from a box containing 17 good ones and 8 bad ones, and (in the more advanced ones) risk analysis and correlation. How, then, do we advance these ideas?
I have often thought that--while there are still some alive--we should get some of the most-recognized people in this milieu to form an "Analytic Studies" association, with its own peer-reviewed journal.
When you MUST use probability limits
"But characterizing the location and dispersion is not enough to specify a particular probability model" is entirely correct, You can, for example, estimate the shape and scale parameters for a gamma distribution from its average and standard deviation, but it is better to use the maximum likelihood estimate, which is how StatGraphics and Minitab do it. It is also how I learned to do it in a graduate course on reliability statistics.
This is of course not a criticism of Deming or Shewhart; to say they were "wrong" for not doing this would be like asking why Joseph Lister (the father of antiseptic surgery) did not also use fiber-optic surgery rather than open incisions, or asking why Edward Jenner did not develop a polio vaccine to go with his smallpox vaccine. The technology was simply not available at the time, and these two doctors (like Deming and Shewhart) did better than any of their contemporaries with the technology that was available. Now, however, we have the technology, so we should leverage it where appropriate, and Dr. Wheeler's article points out correctly that it is not universally appropriate.
If you haven't a clue as to what the real distribution is, you can't fit a valid model. I share Dr. Wheeler's dislike of just using the variance, skewness, and/or kurtosis to fit some kind of artificial model, and I don't really trust the Johnson distributions--the model may fit, but you don't really know what you are getting. Remember that tests for goodness of fit never PROVE you have a normal distribution, or whatever distribution you are trying to use; all they can do is prove beyond a reasonable doubt (the Type I risk) that the distribution is a poor fit.
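As an illustration of why non-rejection proves nothing, the hypothetical simulation below (assuming Python with scipy) draws small samples from a clearly skewed gamma parent and counts how often a normality test fails to reject them.

```python
# Sketch: a goodness-of-fit test can reject a model, but failing to reject
# is not proof of fit. Small samples from a skewed gamma parent frequently
# "pass" a test for normality. Parameters are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
trials, passes = 1000, 0
for _ in range(trials):
    sample = rng.gamma(shape=2.0, scale=1.0, size=20)  # clearly non-normal parent
    _, p_value = stats.shapiro(sample)
    if p_value > 0.05:        # fail to reject normality at the 5% level
        passes += 1

print(f"{passes} of {trials} skewed samples were not rejected as 'normal'")
```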
Also, if you have a bimodal distribution, or other situation in which assignable causes are operating, you can't set up an SPC chart, or quote a process performance index, because you already know you do not have a stable process. That is, if I have a bimodal process, I don't need a point outside a control limit to tell me there is an assignable cause. If the data are riddled with outliers, some, if not all, of them will also be outside the control limits but, again, I already know there is a problem. Assignable causes are present that must be removed before I can say anything about the process parameters, or the capability.
If, on the other hand, you do know the underlying distribution, e.g., from experience--it is known in the electronics industry, for example, that failure times follow the exponential distribution when the hazard rate is constant, and there is good evidence that impurities follow the gamma distribution--you MUST use that distribution to estimate the process performance index, at least as far as the AIAG's SPC manual is concerned. The consequence of using the normality assumption for an SPC chart might be a false alarm risk 10 or 20 times the expected 0.00135 at one of the control limits, so the worst that will happen is that you chase false alarms. The consequence of using it to estimate the nonconforming fraction, however, is being off by orders of magnitude--as in, "Your centered 'Six Sigma' process is sending us 500 defects per million opportunities!" (versus one part per billion at each specification limit, if there are indeed two limits).
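As a rough numerical illustration of both consequences, the sketch below uses an exponential process with mean 1 as a stand-in for a heavily skewed distribution (an assumption made purely for illustration, coded in Python with scipy) and compares its exact tail areas with what the normal model predicts.

```python
# Sketch: how far the normal assumption can be off for a skewed process,
# using an exponential (constant-hazard) distribution with mean 1.
from scipy import stats

expo = stats.expon(scale=1.0)            # mean = 1, sigma = 1
mu, sigma = expo.mean(), expo.std()

# False-alarm risk beyond the upper three-sigma limit on a chart for individuals
p_chart = expo.sf(mu + 3 * sigma)
print(f"P(X > mu + 3*sigma) = {p_chart:.4f} "
      f"(about {p_chart / 0.00135:.0f} times the normal-theory 0.00135)")

# Nonconforming fraction at a spec limit placed six sigma above the mean
p_actual = expo.sf(mu + 6 * sigma)
p_normal = stats.norm.sf(6)              # what the normal model would predict
print(f"exponential tail: {p_actual:.2e} (~{p_actual * 1e6:.0f} ppm)")
print(f"normal tail:      {p_normal:.2e} (~{p_normal * 1e9:.0f} ppb)")
```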
The EPA also seems very focused on the use of the gamma distribution to model environmental data. http://www.epa.gov/osp/hstl/tsc/ProUCL_v3.0_fact.pdf
"ProUCL also computes the UCLs of the unknown population mean based upon the positively skewed gamma distribution, which is often better suited to model environmental data sets than the lognormal distribution. For positively skewed data sets, the default use of a lognormal distribution often results in impractically large UCLs, especially when the data sets are small." ProUCL may in fact be free, but I couldn't install it on my last computer due to an older operating system.
The AIAG manual also discusses transformations, but I do not trust them--especially when we talk about ppm or parts per billion (Six Sigma, no process shift) nonconforming fractions. The transformations actually work better when quality becomes worse, just like the normal approximations for defects and nonconformances. I did a semilog plot of the nonconforming fraction versus the process performance index for the gamma distribution and the cube root transformation (which is recommended for the gamma distribution), and the two lines diverged visibly in the low ppm and high ppb regions. In the parts per thousand region, though, they were essentially indistinguishable. Of course, a parts-per-thousand nonconforming fraction also means the process is not capable.
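Assuming the cube-root transformation meant here is the Wilson-Hilferty normal approximation (the comment does not name it), a short sketch in Python with scipy, using illustrative parameters only, shows the same pattern: close agreement around parts per thousand, with the exact and approximate tails drifting apart farther out.

```python
# Sketch: exact gamma tail areas versus the cube-root (Wilson-Hilferty)
# normal approximation at widening upper spec limits. Illustrative only.
from scipy import stats

k, theta = 2.0, 1.0                        # assumed gamma shape and scale
g = stats.gamma(a=k, scale=theta)

for x in [6.0, 9.0, 12.0, 16.0, 20.0]:     # upper spec limits, moving outward
    exact = g.sf(x)
    # Wilson-Hilferty: (X/(k*theta))**(1/3) is approximately
    # Normal(1 - 1/(9k), 1/(9k))
    z = ((x / (k * theta)) ** (1.0 / 3.0) - (1 - 1 / (9 * k))) / (1 / (9 * k)) ** 0.5
    approx = stats.norm.sf(z)
    print(f"x = {x:5.1f}   exact = {exact:.3e}   cube-root approx = {approx:.3e}")
```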
Since you MUST use the actual distribution (again stipulating that it can be identified from past experience and/or the nature of the process, and also that it is then not rejected by tests for goodness of fit) to estimate the nonconforming fraction, why not use it to set SPC control limits as well?
Really?
In the fourth paragraph of your commentary you argue that we cannot create a process behavior chart unless our process is already being operated predictably. If that were true, we would not be writing about this topic. (I call this Myth Four.) The objective of a process behavior chart is not estimation, but rather the characterization of the process behavior. We can make this characterization with imperfect limits, and the computations are sufficiently robust that we can get usable limits from imperfect data. In the words of Shewhart in his rebuttal of E. S. Pearson on exactly this point: "We are not concerned with the functional form of the universe [i.e. the probability model], but merely with the assumption that a universe exists [i.e. that the process is being operated predictably]."
I once thought I knew what SPC was all about. I had read Grant. I had taught the course. Then my mentor told me that I needed to read Shewhart. I did, and discovered there was more to it than I had thought. Then about a year later when I thought I really knew what SPC was all about, my mentor told me I needed to reread Shewhart all over again. I ended up rewriting my class notes five times in five years as I kept discovering that SPC was much more profound than it appears at first. This is why I am convinced the greatest obstacle to understanding SPC is an education in statistics. Other statisticians have written me that they have had the same experience.
My experience
Don,
I suppose I should describe my own experience, noting especially "the greatest obstacle to understanding SPC is an education in statistics." My original education was in chemical engineering, where I learned the obvious desirability of feedback process control. These controls are automated and, because they work on continuous processes, are generally better than SPC.
Then I worked for IBM, which made discrete computer parts rather than products that flow and pour. I got a night school MBA initially, and then also an M.S. in applied statistics. When I learned about SPC, I was delighted. "I can use this on discrete processes the way feedback process controls work for continuous processes!" I eagerly took what I had learned in the classroom to the factory, only to discover that the data were like nothing in the textbook examples. They didn't follow a bell curve, and too many points were outside the control limits. I did, however, have to take two courses in statistical theory--I never liked theory, but the presentation of distributions other than the normal caught my attention very quickly. It was clear that they could be used to model continuous data that were not bell-shaped. I also recall running up against the rational subgroup issue at IBM, and realizing that you have to sort the within-batch variation from the (usually larger) between-batch variation. When I went to Harris Semiconductor in 1993, I found that the people there were already addressing this issue.
My understanding is that the normality assumption is generally robust enough for practical purposes--and even a gamma distribution with a sufficiently large shape parameter will, I think, begin to look like a bell curve due to the central limit theorem. So 3-sigma limits are admittedly viable even for many non-normal situations as far as SPC is concerned, and even more so when a sample average is involved.
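A sketch of that robustness argument (illustrative only, assuming Python with scipy): the average of n observations from a gamma with shape k is itself gamma with shape n*k, so the exceedance risk beyond the upper three-sigma limit can be tabulated as the effective shape grows.

```python
# Sketch: upper three-sigma exceedance probability for a gamma process.
# The average of n observations from gamma(k, theta) is gamma(n*k, theta/n),
# so only the product n*k matters for this ratio. Values are illustrative.
from scipy import stats

for m in [1, 2, 4, 8, 16, 64]:             # m = shape k times subgroup size n
    g = stats.gamma(a=m)                    # the scale cancels out of the ratio
    p = g.sf(g.mean() + 3 * g.std())
    print(f"effective shape {m:3d}: risk beyond the upper 3-sigma limit = {p:.5f}")
print("normal theory:          0.00135")
```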
I don't see any way around the use of the actual distribution (again, if it can be identified) to get the process performance index, though. Remember that, for SPC, we are talking about a nominal false alarm risk of 0.00135. Non-normality may increase this risk by a factor of 5, 10, or maybe 20. The worst that will happen is that we chase more false alarms than we expect. For the nonconforming fraction, on the other hand, we are dealing with ppm or, ideally, ppb, and this is the region in which the effects of non-normality really begin to make themselves felt--as in 1000, 10,000, or even 100,000 times as many nonconformances as the normal distribution predicts. That is, we can have a nominally Six Sigma process (Ppk = 2.0) that is in fact not even capable (Ppk < 1.33) in terms of the nonconforming fraction.
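To put rough numbers on that last claim, here is a hypothetical example (Python with scipy, using a gamma with shape 2 as the skewed process): the normal-theory Ppk is 2.0 by construction, while the capability implied by the actual tail area falls below 1.33.

```python
# Sketch: a nominal "Six Sigma" Ppk from the normal model versus the
# capability implied by the actual tail of a gamma(2) process (illustrative).
from scipy import stats

g = stats.gamma(a=2.0)                      # assumed skewed process, unit scale
mu, sigma = g.mean(), g.std()
usl = mu + 6 * sigma                        # upper spec placed six sigma out

ppk_nominal = (usl - mu) / (3 * sigma)      # = 2.0 by construction
tail = g.sf(usl)                            # actual nonconforming fraction
ppk_equivalent = stats.norm.isf(tail) / 3   # Ppk giving this tail under normality

print(f"nominal Ppk:    {ppk_nominal:.2f}")
print(f"actual tail:    {tail:.2e} (~{tail * 1e6:.0f} ppm)")
print(f"equivalent Ppk: {ppk_equivalent:.2f} (below the 1.33 'capable' threshold)")
```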
Distributions everywhere and not a drop....
If I recall my Stat 101 courses from 25 years ago, the two dozen or so most-used distributions ALL have certain assumptions which define the conditions under which each distribution applies. I have yet to find the most-used distributions collected in one place with ALL their assumptions listed and explained. I accept that some distribution assumptions would change with each parameter change. That explanation would be interesting.
Statisticians always want to assume Normal. Why? Why not Gamma, as Levinson writes? Shewhart stuck in my mind because he did not assume normal. I have read that some control charts assume Binomial and some assume Exponential. What are the listed assumptions? What does their usage buy you as opposed to using no distribution at all?
Assumptions
The p, np, c, and u charts all assume that nonconformances and defects follow the binomial and Poisson distributions respectively. This is a scientifically justifiable assumption. E.g., the chance of one item being nonconforming is reflected by the Bernoulli distribution, as I recall, and the binomial is simply the sum of numerous independent Bernoulli trials. In addition, the hypergeometric reduces to the binomial as the population becomes infinite. That means that, if your process is stable, these distributions will indeed reflect common-cause attribute data.
The normal approximations for these distributions (the basis for the traditional control chart limits), by the way, improve as quality gets worse: the usual guideline is no fewer than four, and preferably five or six, expected nonconformances or defects in each sample.
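A brief sketch of that guideline (assuming Python with scipy and an illustrative long-run fraction of 0.01): the exact binomial risk beyond the normal-theory upper p-chart limit moves closer to the nominal 0.00135 as the expected count per sample grows.

```python
# Sketch: p-chart limits come from the binomial model, and the exact
# false-alarm risk beyond the upper limit improves as n*p grows.
# The fraction nonconforming and sample sizes are illustrative.
import math
from scipy import stats

p_bar = 0.01                                # assumed long-run nonconforming fraction
for n in [50, 600]:                         # expected counts of 0.5 and 6 per sample
    ucl = p_bar + 3 * math.sqrt(p_bar * (1 - p_bar) / n)
    count_limit = math.floor(n * ucl)       # largest count still inside the limit
    risk = stats.binom.sf(count_limit, n, p_bar)   # P(count falls beyond the limit)
    print(f"n = {n:4d}, n*p = {n * p_bar:4.1f}: UCL = {ucl:.4f}, "
          f"exact risk above UCL = {risk:.4f}")
```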