In my column of Jan. 7, “The Right and Wrong Ways of Computing Limits,” I looked at the problems in computing limits for average charts. This column will consider the right and wrong ways of computing limits for charts for individual values. As before, a data set will be given that you can use to evaluate your own software so that, hopefully, you can select an option that results in the correct limits. But first, some history might be of interest.
The chart for individual values
On page 314 of Economic Control of Quality of Manufactured Product (Van Nostrand, 1931 original publication; American Society for Quality Control, 1980 commemorative reissue), Walter A. Shewhart wrote:
…
Comments
Great article
Another informative and eloquent article from the master, which stands in marked contrast to some of the six sigma and hypothesis-testing-based rubbish on control charts published in this journal recently.
Autocorrelated data
Great article,
I am convinced that control charting is the most effective way to monitor data and process control; however, I am facing problems tracking my autocorrelated data from a chemical system that is influenced by external assignable causes. It is impossible to eliminate these external causes because they are linked to periodic, unpredictable production and raw material variations.
I am sure that the moving ranges of my successive analysis values are not correctly reflecting routine variation, because the small successive differences are induced by assignable sources (which are part of the longer-term routine variation). So when I construct an XmR chart on these data, the points are swinging out of the limits all the time. Trying to use a moving average chart yields too little sensitivity and is no solution either.
My question is: What is the right way to subgroup autocorrelated data in cases like the one described above, so as to obtain an effective estimate of routine variation for computing the process control limits?
Sincere thanks for your input!
Frank
Autocorrelated Data
I have written a chapter on dealing with autocorrelated data. It is in my Advanced Topics in Statistical Process Control book. Send me an e-mail and I will respond more completely. Know this: autocorrelation is simply another way a process has of telling you that it is changing.
Hope this will help.
Sigma
I do think that the "problems" with using the statistical estimate of the standard deviation need to be revisited. Some things to consider:
1. The example provided by Dr. Wheeler has a run of 7 points in a row below the average, which is considered by many authors (but not Dr. Wheeler, who uses 8 points) to be a signal, and which would cause those data not to all be put into the same baseline.
2. In 16 years of operational SPC, I have yet to run into an example from real world data where the moving range and the statistical standard deviation provide two different interpretations of what is and what is not an outlier.
3. With modern spreadsheets, the statistical standard deviation is easier to calculate than the moving range conversion (see the sketch after this list). That is the opposite of the situation in pre-computer days, when the moving range was developed.
4. Dr. Shewhart himself, in Economic Control of Quality of Manufactured Product evaluates several methods for determining the spread of the data, and he rates the statistical standard deviation as best (better than the range).
5. Dr. Shewhart and other authors do invoke the Chebychev inequality as the theoretical basis for SPC. The Chebychev inequality is non-parametric and uses the statistical estimate of the standard deviation. The moving range formula (2.66 times the average moving range) has the Normal distribution built into it as an assumption.
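To make the computations in points 3 and 5 concrete, here is a minimal Python sketch of both calculations. The data values are placeholders rather than Shewhart's, and this is not anyone's production software; it simply computes X chart limits first from the average moving range and then from the global standard deviation.

```python
# Minimal sketch with placeholder data: X chart limits computed two ways,
# from the average moving range (2.66 * mR-bar) and from the global
# standard deviation (3 * s).
import statistics

x = [10.2, 10.5, 9.8, 10.1, 10.4, 12.9, 13.1, 12.8, 13.0, 12.7]  # placeholder values

# Moving ranges are the absolute differences between successive values
moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
mr_bar = statistics.mean(moving_ranges)

x_bar = statistics.mean(x)

# Limits from the average moving range (a within-data measure of dispersion)
lcl_mr = x_bar - 2.66 * mr_bar
ucl_mr = x_bar + 2.66 * mr_bar

# Limits from the global (sample) standard deviation
s = statistics.stdev(x)
lcl_sd = x_bar - 3 * s
ucl_sd = x_bar + 3 * s

print(f"average moving range limits:      {lcl_mr:.2f} to {ucl_mr:.2f}")
print(f"global standard deviation limits: {lcl_sd:.2f} to {ucl_sd:.2f}")
```

On reasonably homogeneous data the two sets of limits come out close to each other; when the data contain shifts, they diverge, which is the point at issue in this exchange.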
Charts Done Right....
Most authors that I've been exposed to use 8 consecutive points to indicate a special cause (a run). I believe L.S. Nelson uses 9 points. I guess it depends on how exploratory or conservative you want your analysis to be. Since there is usually another data point around the bend, why not wait?
I believe the issue here is using a method of calculating an estimate of the standard deviation that minimizes the effect of signals even when the baseline data contain signals, which is what Wheeler describes as "getting good limits from bad data." We're not looking for "perfect" limits, only useful limits that allow us to take the appropriate action.
reply to Sprevette
1. We do not have to prequalify our data prior to placing them on a chart. These are simply the first 16 values out of Shewhart's larger data set. They are sufficient to show that the process is changing.
2. It only takes one counterexample to shoot down such a theory. The example provided is a real world data set where the two computations give different results.
3. Yes, lazy bones can compute the standard deviation easily. Unfortunately, this computation makes a strong assumption of homogeneity, which is essentially what we want to examine.
4. Dr. Shewhart did compare the root mean square deviation with the range and correctly noted that there is a difference in their efficiency when the subgroup size gets large. He also noted that the differences are minimal when n is less than 10. In 1935 he was instrumental in writing ASTM Supplement B, which gave the scaling factors for using the range, so he was okay with this approach.
5. If you read Shewhart very carefully you will find that he rejects the Chebychev approach as not being sufficiently general to do what he wanted to do.
In general, the choice of statistic is not the issue, but rather the computation of within-subgroup measures of dispersion versus the computation of global measures of dispersion. You simply cannot change the underlying mathematics, no matter how many times you may read things to the contrary.
Hope this will help.
Application of Nelson's Rule
I entered your data into a software program, and it calculates a different average from the one on your R chart (its 257 vs. your 356). The author claims this is because Nelson's rule automatically removes any data point more than 3.5 times the mean from that calculation, so it removes the final difference before calculating the R chart mean. Your use of the median instead of the mean in the second set of charts has a similar effect, though not as extreme. Is there a justification for ignoring outliers when calculating the mean?
Thanks.
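Regarding the median-based computation mentioned in the comment above, here is a minimal sketch with placeholder data (it is not the commenter's software). When the median moving range is used in place of the average moving range, the customary scaling factor is 3.145 rather than 2.66, and a single very large moving range has far less influence on the limits.

```python
# Minimal sketch with placeholder data: X chart limits computed from the
# median moving range, which is less affected by one extreme moving range
# than the average moving range is.
import statistics

x = [10.2, 10.5, 9.8, 10.1, 10.4, 10.0, 10.3, 25.0, 10.2, 10.1]  # placeholder values, one extreme point

moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
median_mr = statistics.median(moving_ranges)

x_bar = statistics.mean(x)
lcl = x_bar - 3.145 * median_mr  # 3.145 is the scaling factor for the median moving range
ucl = x_bar + 3.145 * median_mr
print(f"median moving range limits: {lcl:.2f} to {ucl:.2f}")
```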
deflation of control limits by using global dispersion measure
In an attempt to advocate calculating the control limits of XmR charts as the average moving range times 2.660, rather than using a global dispersion measure (the global standard deviation), I evaluated sets of our real-world data (quality characteristics measured as one value per batch) and compared the limits from both methods of calculation. What I found was that the limits were inflated in some cases but deflated in others. I checked my calculations and couldn't find an error so far, and I could not find the cause of the deflated limits. Can deflation of the limits happen? Is it also "only" due to inhomogeneity of the data, or is there a more specific reason behind it? From visual examination of the run charts I suspected short periods of decreased dispersion in the data, but I couldn't find that systematically or pin it down mathematically.
All my arguments now seem to be obsolete, because I claimed that the limits could only be inflated and must now admit that deflation of the limits is also possible. In fact, it might be seen as an optional choice to favor whichever method of calculation gives wider limits in the majority of cases.
I would be really thankful for hints on this issue!
Response for HMINDLER
I would be glad to discuss this directly with you. Your results might be due to your approach. If the data come from a random number generator, then what you report may well be true. But when comparing limit computations using nonhomogeneous data, the use of the average moving range will result in tighter limits than any approach using a global measure of dispersion.
You can contact me at djwheeler@spcpress.com
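As a hypothetical illustration of the point about nonhomogeneous data, the following sketch uses a made-up series with a sustained shift in location halfway through (these are not data from the article or from the thread).

```python
# Hypothetical series with a sustained shift in location. The global standard
# deviation absorbs the shift and inflates the limits, while the average
# moving range reflects mainly point-to-point variation.
import statistics

x = [10, 11, 10, 12, 11, 10, 11, 12, 18, 19, 18, 20, 19, 18, 19, 20]

moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
mr_bar = statistics.mean(moving_ranges)
s = statistics.stdev(x)
x_bar = statistics.mean(x)

print("limits from average moving range:     ",
      round(x_bar - 2.66 * mr_bar, 2), "to", round(x_bar + 2.66 * mr_bar, 2))
print("limits from global standard deviation:",
      round(x_bar - 3 * s, 2), "to", round(x_bar + 3 * s, 2))
```

For a series like this, the limits based on the average moving range are several times narrower than those based on the global standard deviation, which is what allows the chart to expose the shift.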
Distribution of values
Must the values be distributed according to a normal distribution in order to use these limits, or are the limits valid even for Weibull or other distributions?
As Dr. Wheeler documents in
As Dr. Wheeler documents in some of his writings, the XmR methodology (a.k.a. "process behavior charts") is valid regardless of the underlying distribution of the data (or even if the data do not fit any distribution well), which makes it extremely useful and applicable for real-world data.