Whenever the original data pile up against a barrier or a boundary value, the histogram tends to be skewed and non-normal in shape. Last month in part one we found that this doesn’t appreciably affect the performance of process behavior charts for location. This month we look at how skewed data affect the charts for dispersion.
In practice, we get appreciable skewness only when the distance between the average and the boundary condition is less than two standard deviations. A careful inspection of the six distributions shown in figure 1 will show that this corresponds to those situations where the skewness is in the neighborhood of 0.90 or larger. When the skewness is smaller than this, the departure from normality is minimal, as may be seen with distribution number 15 in figure 1.
The usual formulas for finding limits for a range chart involve the scaling factors known as D3 and D4.
Lower Range Limit = D3 × Average Range
Upper Range Limit = D4 × Average Range
These scaling factors depend on the bias correction factors, d2 and d3.
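For readers who want to see the arithmetic, here is a minimal sketch in Python. The relationships D4 = 1 + 3(d3/d2) and D3 = 1 − 3(d3/d2), truncated at zero, are the standard ones; the d2 and d3 values shown are the tabulated bias-correction constants for subgroups of size 4, and the average range is an illustrative number, not a value from the article.

```python
# Standard range-chart scaling factors for subgroups of size n = 4
d2, d3 = 2.059, 0.880          # tabulated bias-correction factors for n = 4

D4 = 1 + 3 * d3 / d2           # upper-limit factor -> 2.282
D3 = max(0, 1 - 3 * d3 / d2)   # lower-limit factor -> 0 for n <= 6

average_range = 9.7            # illustrative value only
print(D3 * average_range, D4 * average_range)
```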
…
Comments
R charts with exact control limits
If the underlying distribution is known (e.g. as tested with goodness of fit tests), it is possible to set control limits for the R chart with known false alarm risks. Skewness and kurtosis are essentially useless for this, and Dr. Wheeler's article reinforces this perception. (I have personally not used skewness or kurtosis for anything since learning about them in night school.) The correct approach is to fit the distribution parameters via a maximum likelihood method. Minitab and StatGraphics will do this for a wide variety of distributions such as the Weibull and gamma distributions.
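For readers without Minitab or StatGraphics, a minimal sketch of the same maximum-likelihood fit in Python (the data here are simulated placeholders, not real process data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.weibull(1.8, size=250) * 10.0   # placeholder for real process data

# Maximum-likelihood fit of a two-parameter Weibull (location pinned at zero),
# the same job Minitab or StatGraphics would do here
shape, loc, scale = stats.weibull_min.fit(data, floc=0)
print(shape, scale)
```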
Wilks, S.S. 1948. "Order Statistics." Bulletin of the American Mathematical Society 54, Part 1: 6-50, then provides an equation for the distribution of the range of the distribution in question. You can then calculate, for example, the upper 0.99865 quantile of the range, which gives the same false alarm risk as a 3-sigma Shewhart chart. This IS computationally challenging for anything but an exponential distribution, though, because StatGraphics and Minitab don't do it. My book on SPC for non-normal distributions shows how to do the job with numerical integration in Visual Basic for Applications, and I was able to reproduce tabulated quantiles of ranges from a normal distribution.
These ranges are NOT normally distributed. For a sample of 4, the upper 0.99865 quantile of the range from a normal distribution is 5.20, not 4.698 (the tabulated D2 factor), times the standard deviation. As the sample size increases, though, the distribution of the range becomes more and more normal.
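That integration is straightforward to reproduce. Here is a minimal sketch in Python (not the book's VBA), using the standard formula for the cumulative distribution of the range of n independent observations, F_R(r) = n ∫ f(x) [F(x + r) − F(x)]^(n−1) dx; for the normal distribution with n = 4 it recovers the 5.20 figure quoted above.

```python
from scipy import integrate, optimize
from scipy.stats import norm

def range_cdf(r, n=4):
    # P(R <= r) = n * integral of f(x) * [F(x+r) - F(x)]**(n-1) dx
    integrand = lambda x: n * norm.pdf(x) * (norm.cdf(x + r) - norm.cdf(x)) ** (n - 1)
    value, _ = integrate.quad(integrand, -8, 8)
    return value

# Upper 0.99865 quantile of the range for n = 4, in standard-deviation units
q = optimize.brentq(lambda r: range_cdf(r) - 0.99865, 0.5, 10)
print(q)   # approximately 5.20
```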
As stated in the article, "The objective is not to compute the right number, nor to find the best estimate of the right value, nor to find limits that correspond to a specific alpha-level." The normal approximation is good enough in many situations, e.g. if the false alarm risk is really 0.00150 rather than 0.00135, this is not going to make a real difference on the shop floor. If we use D2 = 4.698 rather than 5.20 for a sample of 4, the higher false alarm risk is not likely to be a real (practical) problem.
If, on the other hand, the false alarm risk is 0.027 rather than 0.00135 (20 times the expected risk), as happens with the range of a sample of 4 from an exponential distribution, the production workers are going to wonder why they are chasing so many false alarms. This, at best, wastes their time (muda). Matters become worse if the false alarms result in overadjustment.
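The 0.027 figure is easy to check because the exponential case has a closed form: the range of n independent exponentials has CDF (1 − e^(−r/θ))^(n−1). A sketch, assuming the false alarm rate comes from applying the normal-theory upper range limit D2 × σ (D2 = 4.698 for n = 4) to exponential data:

```python
import math

n, theta = 4, 1.0                      # exponential with mean = std dev = theta
ucl = 4.698 * theta                    # normal-theory 3-sigma range limit, D2 * sigma

# P(R > UCL) from the closed-form CDF of the exponential range
alpha = 1 - (1 - math.exp(-ucl / theta)) ** (n - 1)
print(alpha)                           # approximately 0.027, vs. the nominal 0.00135
```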
Again, my results depend on fitting the actual distribution, which is likely to be known based on the nature of the process, as opposed to artificial models that rely on skewness and kurtosis, or other approaches such as the Johnson distributions that, while they might provide a good fit for the data, don't have a real relationship to the underlying process. As an example, impurities (undesirable random arrivals) are likely to follow a gamma distribution. This can be confirmed with goodness of fit tests after one fits the distribution to the data.
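As a sketch of that confirmation step (simulated impurity data standing in for real measurements; note that a Kolmogorov-Smirnov test run against fitted parameters is somewhat optimistic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.gamma(shape=2.0, scale=1.5, size=200)   # placeholder impurity data

# Fit the gamma by maximum likelihood, location pinned at zero
a, loc, scale = stats.gamma.fit(data, floc=0)

# Goodness-of-fit check against the fitted distribution
result = stats.kstest(data, 'gamma', args=(a, loc, scale))
print(a, scale, result.pvalue)
```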
All bets are, of course, off if the process is not predictable (under control) because, regardless of what distribution is correct for the data, the parameters are going to be based on bimodal (or even worse) data.
Normality and the Process Behaviour Chart
I would suggest that Dr Wheeler's little book "Normality and the Process Behaviour Chart" is far better value than spending money on products like Minitab in a vain attempt to fit probability distributions and normalize data.
Thanks ADB
Thanks for the recommendation.
It is mandatory to use the actual distribution for Ppk
http://www.qualitydigest.com/inside/six-sigma-column/making-decisions-n… discusses the central limit theorem. The truth is that, if you have a big enough sample, your sample averages will behave as if they come from a normal distribution regardless of the underlying distribution--even one as egregiously non-normal as the exponential distribution. Ranges might be another matter, but the distribution of ranges also becomes more normal as the sample size increases.
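A quick simulation makes the point; the skewness of averages of exponential data shrinks roughly as 2/√n:

```python
import numpy as np

rng = np.random.default_rng(3)
for n in (1, 4, 25):
    means = rng.exponential(1.0, size=(100_000, n)).mean(axis=1)
    z = (means - means.mean()) / means.std()
    print(n, (z ** 3).mean())   # sample skewness: about 2.0, 1.0, 0.4
```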
You must, however, use the actual underlying distribution to calculate the nonconforming fraction, and therefore the Ppk you quote to your customers. The Automotive Industry Action Group's SPC manual provides two ways to do this, but both require calculation of the quantiles from the underlying distribution. My preference is simply to calculate the fraction outside the specification limit, e.g. 1 − F(USL), where F is the cumulative distribution.
The other method cited by the AIAG is Pp = (USL − LSL)/(Q(0.99865) − Q(0.00135)), where Q is the quantile function of the underlying distribution. If this distribution is normal, of course, the formula becomes the familiar (USL − LSL)/(6 sigma). (The problem is that this approach doesn't work for PPU and PPL, and therefore not for Ppk.) If it is not normal, you must fit the underlying distribution to get Q(0.99865) and Q(0.00135).
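Both calculations take only a few lines once the distribution is fitted. A sketch with a hypothetical fitted gamma model and made-up specification limits:

```python
from scipy import stats

model = stats.gamma(a=2.0, scale=1.5)   # hypothetical fitted distribution
LSL, USL = 0.2, 12.0                    # made-up specification limits

# Method 1: nonconforming fraction straight from the fitted CDF
p_out = model.cdf(LSL) + model.sf(USL)

# Method 2: quantile-based Pp, the analogue of (USL - LSL)/(6 sigma)
Pp = (USL - LSL) / (model.ppf(0.99865) - model.ppf(0.00135))
print(p_out, Pp)
```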
Since you MUST fit the underlying distribution to your data to determine Ppk, you have already done the work necessary to set control limits for the actual distribution, so there is no practical reason not to do it that way.
Another myth: the "actual" distribution
ADB's recommendation is a great one; frankly, if you haven't read that book, you probably shouldn't enter into discussions on normality and the process behavior chart.
A more fundamental point, though, is the problem of assuming that there is an "actual distribution" and that you can use it to estimate the fraction nonconforming with any precision. While it is sometimes a useful exercise to estimate that fraction, it should always be presented with a lot of caveats; with every additional decimal place you move further from sound and reasonable conjecture and closer to that cliff over which you fall into the land of pure and unadulterated fantasy.
To Tony's point, I have a package for modeling and simulation with a very powerful curve-fitting engine. I can assess the goodness of fit of hundreds of distributions, comparing four different fit tests, resulting in parameter estimates out to 8 decimal places. While this is impressive (at least to me) and often useful in simulations, I wouldn't consider using it for SPC. I do, however, always try to use data that exhibit some reasonable evidence of a state of statistical control before I try to fit a distribution for further use as a modeling and simulation assumption. There can be no assumption of any distribution without a state of statistical control.
What Don has done with these articles is to provide us with a very reasonable and practical approach to adjusting the action limits when we know we have data that naturally appear skewed.
Non-normality of data and all WE Rules
Both Part 1 and Part 2 of Dr. Wheeler's series explain that non-normality of the data is not critical for the primary purpose of control charting: deciding when a process is out of control so that action can be taken on the behavior of the process.
However, "Action" is also taken not only when a data point exceeds the 3 sigma limits (Detection Rule 1) but also when one of a Run Tests (Western Electric Rules) fails. To complete the picture, I would like Dr. Wheeler to also address how departure from normality affects the validity of the other Detection Rules 2, 3 and 4, which are also used to evaluate the begavior of a process.
Effect on Western Electric Rules
The Zone C test (8 consecutive points above or below the center line) relies on the assumption that the center line is the median, i.e., that each point has a 50:50 chance of falling on either side when the process is in control. This holds when the distribution is normal, because the mean and median coincide. It is emphatically not true when the distribution is skewed.
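A quick check of how far off the 50:50 assumption can be, using the exponential distribution as an extreme case on an individuals chart, where the center line is the mean:

```python
import math

p = 1 - math.exp(-1)                         # exponential: P(X < mean) = 1 - 1/e, about 0.632
for p_side in (0.5, p):
    run8 = p_side ** 8 + (1 - p_side) ** 8   # 8 in a row on either side of the center line
    print(p_side, run8)                      # 0.0078 for the normal case, about 0.026 here
```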
The risks for Zones A and B also are calculated under the assumption of normality. I would expect the false alarm risks to be significantly higher when the distribution is skewed, although of course the problem won't be as bad for sample averages due to the central limit theorem.
My book on SPC for Real-World Applications uses as an example a gamma distribution for which the chance of exceeding the 3-sigma UCL is 0.01034, or more than seven times as great as what we expect (0.00135). This is not a show-stopper if you are not worried about false alarms, but it is definitely an issue when you calculate Ppk. This can be off by orders of magnitude in terms of the nonconforming fraction. You can, in fact, have a centered "Six Sigma" process that gives you 93 DPMO, or 93,000 times the expected one per billion.
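The book's example is not reproduced here, but for what it's worth, a gamma distribution with shape parameter 4 gives almost exactly that exceedance (the scale parameter drops out of the calculation):

```python
from scipy.stats import gamma

a = 4                                   # assumed shape; scale is irrelevant here
mean, sigma = a, a ** 0.5               # gamma(a) has mean a and variance a
print(gamma(a).sf(mean + 3 * sigma))    # approximately 0.01034
```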
WE rules on XmR charts of non-normal individual values
William!
Thank you for your response!
If the WE rules are based on the normal distribution, does this imply that the statement "Normality is not a prerequisite for a process behavior chart" is valid only for WE Rule 1 (a data point outside the 3-sigma limits)? How would you then treat the other WE rules in control charts of individual values (XmR)?
X chart as worst case
Since the central limit theorem doesn't help for charts for individuals, the WE rules will work very poorly for non-normal systems. The ideal would be to set the zone limits at the quantiles of the actual distribution that correspond to the 1 and 2 sigma limits for a normal distribution.
If we had some actual data, I could show how this works.
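In lieu of real data, here is a sketch with a hypothetical fitted gamma standing in for "the actual distribution": map each normal-theory zone boundary, including the 3-sigma control limits themselves, to the quantile with the same tail probability.

```python
from scipy.stats import gamma, norm

model = gamma(a=2.0, scale=1.5)   # hypothetical fitted distribution

# Replace the k-sigma zone boundaries with equal-probability quantiles
for k in (-3, -2, -1, 1, 2, 3):
    print(k, round(model.ppf(norm.cdf(k)), 3))
```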