A normal distribution is far more common in statistical textbooks than it is in real-world processes, and untold grief results from the unquestioning assumption that all manufacturing processes follow the bell curve. The grief consists specifically of:
1. Out-of-control signals that send production operators to look for assignable causes that aren't there. This convinces the shop floor that statistical process control (SPC) is a time-wasting and futile exercise.
2. Statements of process capability that bear little relation to the process's actual nonconforming fraction.
This article will give an example of how to recognize such situations along with a brief overview of how to make SPC work for them.
No, it’s not out of control
Consider a simulated process in which the critical-to-quality characteristic is a trace impurity level in parts per million. The traditional Shewhart charts in Figure 1 (subgroups of two) suggest that it is out of control.
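A minimal sketch of this kind of simulation follows; the article's actual parameters are not reproduced here, so a gamma-distributed impurity (assumed shape 2, scale 5 ppm) stands in, and only the n=2 chart factors (A2 = 1.880, D4 = 3.267) are standard values.

```python
# A sketch, not the article's actual simulation: gamma-distributed
# impurity data (assumed shape=2, scale=5 ppm) charted with the usual
# normal-theory xbar/R limits for subgroups of n=2.
import numpy as np

gen = np.random.default_rng(1)
data = gen.gamma(shape=2.0, scale=5.0, size=(100, 2))  # 100 subgroups of n=2

xbar = data.mean(axis=1)                       # subgroup averages
ranges = data.max(axis=1) - data.min(axis=1)   # subgroup ranges

# Standard Shewhart factors for n=2: A2 = 1.880, D3 = 0, D4 = 3.267
xbarbar, rbar = xbar.mean(), ranges.mean()
ucl_x, lcl_x = xbarbar + 1.880 * rbar, xbarbar - 1.880 * rbar
ucl_r = 3.267 * rbar

alarms = ((xbar > ucl_x) | (xbar < lcl_x)).sum() + (ranges > ucl_r).sum()
print(f"Out-of-limit points from an in-control but skewed process: {alarms}")
```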
…
Comments
Let the dueling articles begin
Just the other week we had Davis Balestracci's and Don Wheeler's assertions that normality is not required for SPC (which is backed up by reading Dr. Shewhart's original works), and now we are back to asserting that normality is required. How about some consistency in editing this magazine?
A variety of views
SPREVETTE, Thanks for your comments. When it comes to how to deal with normality, there are a LOT of viewpoints. Our job is not to choose sides in this debate (and it is a debate) but simply to present views from both sides and let the readers decide who presents the strongest argument.
Thanks
The article is rubbish. Six Sigma trash.
Read Wheeler to understand the basics.
Let the dueling articles continue
I feel no need to call anyone names. I will simply quote Shewhart:
we see, however, that even when the distribution is known, the distribution function f(theta,n) for a given statistic theta is seldom known in sufficient detail to make it possible to choose theta1 and theta2 [UCL and LCL] to cut off equal tails. Even more important is the fact that we seldom care to specify f accurately enough to make possible the setting of such limits.
For these reasons, we usually choose a symmetrical range characterized by the limits [avg] +/- t sigma symmetrically spaced in reference to [avg]. Tchebycheff's theorem tells us that the probability P that an observed value of theta will lie within these limits so long as the quality standard is maintained satisfies the inequality P > 1 - 1/t^2.
We are still faced with the choice of t. Experience indicates that t=3 seems to be an acceptable economic value.
====
This is on page 277 of the 50th anniversary edition of Economic Control of Quality of Manufactured Product.
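For readers working through the arithmetic, the Tchebycheff (Chebyshev) bound Shewhart invokes gives only a weak, distribution-free guarantee at t = 3:

```latex
P\left(|\theta - \bar{\theta}| \le t\sigma\right) \ge 1 - \frac{1}{t^{2}},
\qquad t = 3 \;\Rightarrow\; P \ge 1 - \tfrac{1}{9} \approx 0.889
```

That is, the distribution-free bound permits up to roughly 11% of points beyond 3-sigma limits, versus 0.27% for an exactly normal process; the quote's point is that t = 3 was chosen on economic grounds, not from normal-theory tail areas.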
Leptokurtophobia?
Normality is indeed not required for SPC if one sets control limits whose basis is the underlying non-normal distribution. Normality (whether in the underlying distribution or through the Central Limit Theorem) is however required if one uses control limits that assume a normal distribution. Normality is definitely required if one calculates a process capability or process performance index in the traditional manner and then expects it to reflect the nonconforming fraction accurately.
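As a concrete sketch of "limits whose basis is the underlying non-normal distribution" (the gamma model and stand-in data below are illustrative assumptions): fit the model, then place the limits at the 0.00135 and 0.99865 quantiles, the same tail areas that 3-sigma limits carry under normality.

```python
# Sketch: distribution-based control limits from a fitted gamma model.
# The gamma choice and the stand-in data are assumptions for illustration.
import numpy as np
from scipy import stats

data = np.random.default_rng(2).gamma(2.0, 5.0, size=200)  # stand-in data

shape, loc, scale = stats.gamma.fit(data, floc=0)   # fix location at zero
lcl = stats.gamma.ppf(0.00135, shape, loc=loc, scale=scale)
ucl = stats.gamma.ppf(0.99865, shape, loc=loc, scale=scale)
print(f"Distribution-based limits: LCL = {lcl:.2f} ppm, UCL = {ucl:.2f} ppm")
```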
I looked up my copy of the Western Electric Handbook, which I understand was influenced by Shewhart. It acknowledges the existence of nonnormal distributions and suggests that limits other than 3 sigma might be appropriate for them. It does not go into detail, but the book was written in 1956, when one would have had to find a mainframe computer to do what is now routine on a personal computer.

In addition, all SPC references say that a minimum expected count of 4-6 nonconformances or defects is necessary for the traditional attribute (p, np, c, and u) charts to work properly. This is because the control limits rely on the normal approximations to the binomial and Poisson distributions, so this requirement is an explicit statement that normality IS required (in this case achieved by something very similar to the CLT) for SPC in these cases. In other words, when the normal approximation DOES work for attributes, the process is generating on average four or more undesirable events per sample. It is trivially easy, though, to use spreadsheet functions to apply the exact Poisson or binomial distribution for attributes.
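To make the spreadsheet remark concrete, here is the same exact-distribution idea in Python; the assumed mean of 1.5 defects per unit is deliberately below the 4-6 threshold at which the normal approximation becomes reasonable.

```python
# Sketch: exact c-chart limits from the Poisson distribution itself,
# for an assumed long-run mean of 1.5 defects per unit.
from scipy import stats

mean_defects = 1.5  # expected count below 4, so the normal approximation fails
lcl = stats.poisson.ppf(0.00135, mean_defects)   # exact lower limit (here 0)
ucl = stats.poisson.ppf(0.99865, mean_defects)   # exact upper limit
print(f"Exact c-chart limits: LCL = {lcl:.0f}, UCL = {ucl:.0f}")
# For comparison, 1.5 + 3*sqrt(1.5) ~= 5.2 and 1.5 - 3*sqrt(1.5) is negative.
```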
It would probably be instructive to do some case studies with reader-supplied (non-confidential and non-proprietary) data to compare the performances of charts that rely on the normality assumption and those that use the underlying non-normal distribution. It is especially important to compare process capability or process performance estimates.
Dr. Wheeler Fails to Address Normality Assumption Pitfalls
There are several misinformed followers of Dr. Wheeler who are rabid haters of this article and Six Sigma. It is mainly due to their severe lack of education in statistics and, specifically, a shallow or closed mind. William Levinson is not the only individual who is aware of the fallacies of assuming all data is normally distributed; he knows that fitting a normal distribution to skewed data can have bad repercussions. Fitting a normal distribution to non-normal individuals data is the wisdom of a fool. World-renowned professors like Dr. Douglas Montgomery, Dr. George Runger, Dr. Breyfogle, and others have written several textbooks and professional articles indicating that false alarms occur when computing control limits for non-normal individuals SPC charts. In addition, these men have written articles showing that false values for Cp and Cpk can result from using the assumption of normality for non-normal data. Likewise, faulty calculations of Yield and the Probability of Noncompliance (PNC) can occur by fitting a normal distribution to non-normal data. All these apply to INDIVIDUALS type variable data with NO SUBGROUPS. Individuals type SPC charts are sensitive to non-normality.

Dr. Wheeler avoids addressing this topic: the effects of assuming normality on non-normal data. I've read parts of a book Dr. Wheeler wrote on individuals type data and he miserably fails to address the pitfalls of assuming normality in individuals data. His student followers should have the cojones to challenge Dr. Wheeler on this and demand that he explain himself on this issue IN SUFFICIENT DETAIL (and not hide). There have been previous articles written in Quality Digest by Dr. Breyfogle who provided examples to show proof. I have written several examples proving potential problems occur when creating individuals SPC charts under the assumption of fitting non-normal data with a normal distribution.
David A Herrera
Dueling Articles
I am neither a rabbit/rabid hater/hatter [four possible puns there! ;-)] of the article nor of Six Sigma. However, when I saw the preview to this article a day or two ago, I did sigh heavily, knowing what the article would contain. The only real problem I have with Six Sigma is that sometimes the statistical approach that Six Sigma teaches makes the problem more complex than it really needs to be. "The simplest analysis that gives you the insight needed is the best analysis."
As for cojones, I used mine to actually read (and re-read) and try to understand Dr. Shewhart's "Economic Control Of Quality Of Manufactured Product" and "Statistical Method From The Viewpoint Of Quality Control". In my humble opinion, these works should be required reading for anyone who would presume to consider themselves knowledgeable on the subject of control charts. One thing that is clear, even to me: Control charts in general and control limits in particular are NOT predicated on the assumption of the normal distribution of the data. Dr. Shewhart clearly set the control limits (natural process limits) to minimize the total costs of alpha and beta errors. Inherently, this means that sometimes there are false signals.
As for the statement that Dr. Wheeler has not explained himself and is hiding, perhaps one should peruse his book "Normality And The Process Behavior Chart". In this work, 1143 distributions from 9 families of distributions with varying parameters were studied and clearly shown to be compatible with Process Behavior Charts without transformation.
There ARE certain types of control charts that do require the data to conform to a particular probability distribution. For example, C and U charts are applicable to Poisson distributed data. That being said, the I-mR chart handles this data just fine as well.
Finally, a word about negative control limits... With little taxing of the grey matter, one can conclude that negative control limits make sense for some data sets and not for others. When a negative limit makes no sense, the lower control limit is ZERO! Generally, the software will allow you to make an adjustment to the chart to show zero as the LCL.
Whose Ignorance?
Your remarks about Dr. Wheeler only go to show your ignorance of his work. If you were even remotely familiar with his books, you would know that he has written much on the topic of normality regarding SPC charts. He has even written a whole book on this subject. It is no surprise that Dr. Wheeler was a favorite of Dr. Deming's, since he has deep knowledge and understanding of Shewhart's work, something you obviously lack.
Rich DeRoeck
Wheeler on Leptokurtophobia
I have read what Dr. Wheeler has written on this subject at http://www.qualitydigest.com/inside/quality-insider-column/do-you-have-…. The first page shows two distributions for which the 3-sigma upper control limit delivers false alarm risks of 1.4% to 1.8%, roughly ten times the expected 0.135%. This is for individuals; the risk for a sample average would be lower due to the Central Limit Theorem, but there are applications in which the rational subgroup is in fact 1 and cannot be increased.

Page 4 then demonstrates the futility of a normalizing transformation for a bimodal distribution; this proves only that the process does not meet the prerequisite of being in control before application of SPC. Assignable causes are obviously present, so they must be removed before any control charts, whether of transformed or untransformed data, will be useful. If this process were in control, though, a beta distribution might work, because this is the model for activity completion times in project management; the normal approximation works in PERT because the sum of a large number of beta variables approaches normality. The beta distribution might therefore work for transit times as well, or the in-control process's beta distribution might approach normality sufficiently to allow use of traditional Shewhart charts.

If the process is in control, though, transformations are appropriate and effective, and they are used all the time in Design of Experiments. (Note that DOE requires tests for normality of the residuals; if they aren't normal, one must either use a transformation or fall back on a nonparametric method.) Finally, "Do You Have Leptokurtophobia" does not address the issue of process performance indices. The Automotive Industry Action Group's SPC manual offers the approach I describe as one option: fit the distribution and calculate the nonconforming fraction at each specification limit.
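A minimal sketch of the AIAG option just described, under an assumed gamma model and an assumed 50 ppm upper specification limit: fit the distribution, take the nonconforming tail area at the specification limit, and (if an index is wanted) convert it through the normal quantile.

```python
# Sketch: nonconforming fraction from a fitted distribution, per the
# AIAG option described above. Model, data, and the 50 ppm USL are
# assumptions for illustration.
import numpy as np
from scipy import stats

data = np.random.default_rng(3).gamma(2.0, 5.0, size=200)  # stand-in data
usl = 50.0  # hypothetical upper specification limit, ppm

shape, loc, scale = stats.gamma.fit(data, floc=0)
p_above = stats.gamma.sf(usl, shape, loc=loc, scale=scale)  # tail beyond USL

# Optional: express the tail area as a normal-equivalent capability index
ppk_equiv = stats.norm.ppf(1.0 - p_above) / 3.0
print(f"Nonconforming fraction: {p_above:.2e}; equivalent Ppk: {ppk_equiv:.2f}")
```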
Flaws in Fitting Normal Distributions to Non-Normal Data
Gentlemen: Several recent replies here agree that fitting a normal distribution to non-normal INDIVIDUALS type data has bad repercussions; the exception is the hardcore disciples of Dr. Wheeler. If a probability chart clearly indicates a bad fit of a normal distribution to non-normal data, you will get false alarms and erroneous Yields (and PNC), Cp, and Cpk. Several here on this post understand that, but Wheeler fans do not. I am NOT talking about attribute data consisting of C or U charts or discrete data. I am NOT talking about SPC charts with subgroups greater than 1. I am NOT suggesting transforming the data into some form either (like a Box-Cox transformation). I AM SAYING that the best way to compute Yield, PNC, Cp, Cpk, and control charts is to FIT the BEST probability distribution to model the data, be it normal or non-normal, using Deming's 3-sigma limits.

It appears that die-hard Wheeler fans are like liberals: they only look at statistics through their "liberal" lenses and refuse to look through any other lenses that may contain truth. It's called closed-mindedness. Yet Dr. Wheeler himself has not addressed the problems that world-renowned statisticians like Dr. Montgomery, Dr. Runger, Dr. Breyfogle, and many others have described with fitting normal distributions to non-normal individuals-type data. I don't care if Dr. Wheeler was a favorite student of Shewhart or Mahatma Gandhi; I challenge Dr. Wheeler to explain himself on this SPECIFIC ISSUE. To the individual who said Dr. Wheeler studied 1100+ distributions: show me that reference, the article, or the book he wrote about them all. Are there really 1100+ distributions in statistical literature??? Sounds like an exaggeration. The response should come from Dr. Wheeler, not opinionated Six Sigma haters or those who hate statistics.

In case anyone forgot, we are in the 21st century, not the 20th century. Statistical knowledge evolves and improves. Many individuals like Shewhart created important bases of knowledge in the 20th century, but new and better statistical knowledge evolves; we have better computers and better technology that drive us forward to solve engineering and manufacturing problems, not drive us backwards into the 1930s.
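One way to act on "fit the BEST probability distribution" is to screen several candidate models and keep the one whose fitted CDF tracks the data most closely. The candidates, the stand-in data, and the use of the Kolmogorov-Smirnov statistic as a rough ranking criterion are illustrative assumptions, not Mr. Herrera's actual procedure.

```python
# Sketch: rank candidate distributions by how closely their fitted CDFs
# follow the data. The K-S statistic is used only as a rough screen;
# fitting and testing on the same data biases a formal p-value.
import numpy as np
from scipy import stats

data = np.random.default_rng(4).lognormal(mean=1.0, sigma=0.5, size=200)

candidates = {"normal": stats.norm, "gamma": stats.gamma, "lognormal": stats.lognorm}
scores = {}
for name, dist in candidates.items():
    params = dist.fit(data)
    scores[name] = stats.kstest(data, dist.cdf, args=params).statistic

best = min(scores, key=scores.get)
print(f"Best-fitting candidate by K-S statistic: {best}")
```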
David A. Herrera
Book Reference
Normality and the Process Behavior Chart
Having a normally shaped histogram is not a prerequisite for placing your data on a process behavior chart. Neither is it inevitable that a predictable process will display a mound-shaped histogram.
This book provides the first careful and complete examination of the relationship between the normal distribution and the process behavior chart. It clears up much of the confusion surrounding this subject, and it can help you overcome the superstitions that have hampered the effective use of process behavior charts.
Topics include:
You can also visit his website (www.spcpress.com) to read the many papers he has written about this (and other) subjects regarding SPC. Now, assuming YOU are not one of those "narrow-minded liberals" and you read these articles with an open mind, you just might see a different perspective.
"Normal distribution? I never saw one."
To assert that a process behavior chart should have a false alarm rate of 0.135% is, in itself, evidence of not understanding Shewhart's wonderful tool. Again, the assumption of a normal distribution has NOTHING to do with a process behavior chart. The choice of 3-sigma limits by Shewhart was NOT based on probabilities. In fact, Deming once said (paraphrased), "Probabilities should not be applied to control charts. No, no, that will never do." In 1991, I attended one of Deming's 4-Day seminars in which he stated, "Normal distribution? I never saw one." At the time, I thought he was crazy! Since then, I have learned much, thanks to many writers who have presented Deming's wisdom in a manner in which mortals like me can comprehend (to some degree).
BTW, the "Goodness-of-fit test" was called "Lack-of-fit test" (the proper name) many years ago when I was first learning statistics. You really cannot correctly claim a process is generating outputs which definitively fit any theoretical distribution. You can only say with certainty what theoretical distributions that data does not conform to. Many data sets look "normal" , but are not. Many data sets "fit" more than one distribution. Which is correct? Who knows. Keep collecting data. Eventually, the data will indicate a lack of fit. Then what?
Another Deming quote:
Comment from reader
This comment was posted by Quality Digest for reader John Flaig.
In practice, the chart is likely to be an individuals (X) chart because the critical quality characteristic is often from a product like a chemical or materials batch, or a measurement such as particle counts in semiconductor manufacturing equipment, for which the rational subgroup is one. (Multiple measurements in such cases are merely replicates of the same individual and do not reflect between-batch or between-setup variation.) n=2 was used primarily to allow illustration of the effect of non-normality on the range chart.
Mr. Levinson's statement has a lot of things wrong with it:
1. Not all processes where an x-chart is appropriate have a high degree of autocorrelation, contrary to what I think he is implying.
2. In any case, observations from such processes are NOT replicates, contrary to what he says.
3. The logic for using the two-point moving range has nothing to do with illustrating the effect of non-normality on the range chart. It is an estimator of the within-subgroup homogeneous-cause variation, not the between-batch variation as the author states (a sketch of this computation appears below). I suggest the author look up a 1943 paper by Dr. John von Neumann, as it will clarify his multiple misunderstandings.
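For reference, a minimal sketch (with stand-in data) of the I-mR computation at issue: the average two-point moving range, divided by d2 = 1.128, estimates the short-term within variation, which is where the familiar 2.66 factor comes from.

```python
# Sketch of I-mR limits: sigma is estimated from the average two-point
# moving range (mR-bar / d2, with d2 = 1.128 for n = 2). Data are stand-ins.
import numpy as np

x = np.array([12.1, 11.8, 12.6, 12.0, 13.1, 12.4, 11.9, 12.7])

mr_bar = np.abs(np.diff(x)).mean()   # average two-point moving range
center = x.mean()

ucl = center + 2.66 * mr_bar         # 3 / 1.128 ~= 2.66
lcl = center - 2.66 * mr_bar         # clamp to zero when negatives are impossible
print(f"Individuals chart: LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```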
John J. Flaig, Ph.D.
Fellow of the American Society for Quality