Some commonly held ideas about skewed probability models are incorrect. These incorrect ideas are one source of complexity and confusion in the analysis of data. By examining the basic properties of skewed distributions, this article can help you achieve greater clarity of thought and may even simplify your next data analysis.
How would you characterize a skewed distribution? When asked this question, most will answer, “A skewed distribution is one that has a heavy, elongated tail.” This idea is expressed by saying that a distribution becomes more heavy-tailed as its skewness and kurtosis increase. To examine these ideas, we shall use a popular family of skewed distributions, the Weibulls.
The Weibull family of distributions
Weibull distributions are widely used in reliability theory and are generally found in most statistical software packages. This makes these distributions easy to use without having to work with complicated equations. The following equations are included here in the interest of clarity. The Weibull distributions depend upon two parameters: alpha, α, and beta, β. The cumulative distribution function for the Weibull family has the form:
…
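For readers who want to explore these distributions directly, here is a minimal sketch in Python, assuming the common parametrization in which α is the scale parameter and β is the shape parameter, so that F(x) = 1 − exp[−(x/α)^β]. In scipy.stats.weibull_min, the shape argument c plays the role of β and scale plays the role of α; the particular β values below are chosen only for illustration.

```python
# Minimal sketch (illustrative, not from the article): how skewness and the
# upper-tail area of a Weibull change with the shape parameter beta.
# Assumes the common parametrization F(x) = 1 - exp(-(x/alpha)**beta).
from scipy.stats import weibull_min

alpha = 1.0                                  # scale parameter
for beta in (0.8, 1.0, 2.0, 3.6):            # shape parameter
    dist = weibull_min(c=beta, scale=alpha)
    skew = float(dist.stats(moments='s'))            # skewness
    tail = dist.sf(dist.mean() + 3 * dist.std())     # area beyond mean + 3 sd
    print(f"beta={beta}: skewness={skew:.2f}, P(X > mean + 3 sd)={tail:.5f}")
```

Running this shows how the skewness and the area more than three standard deviations above the mean change together as β varies.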
Comments
This brings up the issue of frequency in the context of risk
"Moreover, to avoid missing real signals within the experimental data, it is traditional to filter out only 95 percent of the probable noise." This will actually help me illustrate the issue of frequency, or exposure, in the context of risk-based thinking for ISO 9001:2015.
Frequency of exposure does not come from standard FMEA practice, in which we consider only the individual probability of occurrence, but from the Army's risk management process (ATP 5-19), a public-domain document that easily meets the requirements of ISO 31000, which is itself rather expensive to purchase.
"Probability is assessed as frequent if a harmful occurrence is known to happen continuously, regularly, or inevitably because of exposure. Exposure is the frequency and length of time personnel and equipment are subjected to a hazard or hazards. For example, given about 500 exposures, without proper controls, a harmful event will occur. Increased exposure—during a certain activity or over iterations of the activity—increases risk. An example of frequent occurrence is a heat injury during a battalionphysical training run, with a category 5 heat index and nonacclimated Soldiers."
This is something traditional FMEA does NOT consider.
In the case of DOE, we have a traditional 5% chance of wrongly rejecting the null hypothesis, but we are exposed to this risk only once because the experiment is a one-time event. In SPC, however, we are exposed to our false alarm risk every time we take a sample. A 5% alpha risk that is acceptable for a one-time experiment is definitely not acceptable for process management. Even a 2% risk will give us, on average, one false alarm per 50 samples, and a 0.27% risk gives us, on average, 2.7 false alarms per 1,000 samples (more if we throw in the Western Electric zone tests). The frequency of exposure issue makes a 5% Type I risk acceptable for most DOE applications, but not for SPC, where we are exposed to the risk hundreds or thousands of times.
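A quick sketch of that arithmetic (the sample counts are illustrative; the point is simply that expected false alarms scale with exposure):

```python
# Sketch: the expected number of false alarms grows with the number of times
# we are exposed to the alpha risk. Assumes independent in-control samples.
for alpha in (0.05, 0.02, 0.0027):           # 5%, 2%, and 3-sigma (0.27%) risks
    arl = 1 / alpha                          # average samples between false alarms
    for n_samples in (1, 50, 1000):
        expected = alpha * n_samples         # expected false alarms over the exposure
        print(f"alpha={alpha:.4f}  ARL={arl:6.0f}  "
              f"samples={n_samples:5d}  expected false alarms={expected:.2f}")
```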
As for actually fitting a Weibull (or gamma) distribution, I would not use the average and standard deviation even though one can estimate the parameters this way. The maximum likelihood method, which is used by Minitab and StatGraphics, is much better. In addition, with regard to the long tails of the Weibull distribution, you can get a 95% confidence limit on the nonconforming fraction, which makes the calculations meaningful despite the uncertainty in the data. We have the same issue, by the way, with process performance indices for normal distributions, in which our "Six Sigma" process could be as little as four sigma if we don't have enough measurements. One can similarly get lower confidence limits for PP (chi square distribution for the confidence limits for the process standard deviation), PPU and PPL (noncentral t distribution, foundation for the tolerance interval), and PPk (somewhat harder).
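A minimal sketch of the maximum-likelihood approach in Python (the data, spec limit, and the parametric bootstrap used for the upper confidence limit below are illustrative assumptions; commercial packages such as Minitab compute their confidence limits analytically rather than by bootstrap):

```python
# Sketch: ML fit of a Weibull to data and a 95% upper confidence limit on the
# nonconforming fraction via a parametric bootstrap. Illustrative only; the
# sample data and upper spec limit are made up.
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(1)
data = weibull_min.rvs(c=1.8, scale=10.0, size=100, random_state=rng)  # fake data
usl = 25.0                                       # hypothetical upper spec limit

# Maximum-likelihood fit (location fixed at zero, as is usual for lifetimes)
shape, loc, scale = weibull_min.fit(data, floc=0)
p_nc = weibull_min.sf(usl, shape, loc, scale)    # estimated nonconforming fraction

# Parametric bootstrap for a 95% upper confidence limit on that fraction
boot = []
for _ in range(1000):
    resample = weibull_min.rvs(shape, loc, scale, size=len(data), random_state=rng)
    b_shape, b_loc, b_scale = weibull_min.fit(resample, floc=0)
    boot.append(weibull_min.sf(usl, b_shape, b_loc, b_scale))
ucl_95 = np.percentile(boot, 95)

print(f"point estimate = {p_nc:.5f}, 95% upper confidence limit = {ucl_95:.5f}")
```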
The bottom line is that, if we have enough data, we can get meaningful confidence limits on the nonconforming fraction from a normal or non-normal distribution. If we don't have enough data, we cannot get meaningful estimates of PP or PPk from any distribution.
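For the normal-distribution side of this point, here is a sketch of the chi-square-based lower confidence bound on Pp mentioned above (the point estimate and sample sizes are made-up numbers for illustration):

```python
# Sketch: lower 95% confidence bound on Pp, using the chi-square distribution
# of (n-1)s^2/sigma^2. The Pp point estimate and sample sizes are illustrative.
from scipy.stats import chi2

def pp_lower_bound(pp_hat, n, conf=0.95):
    """Lower confidence bound for Pp = (USL - LSL) / (6 sigma)."""
    return pp_hat * (chi2.ppf(1 - conf, n - 1) / (n - 1)) ** 0.5

pp_hat = 2.0   # a nominal "Six Sigma" point estimate
for n in (10, 30, 100):
    print(f"n={n:<4} lower 95% bound on Pp = {pp_lower_bound(pp_hat, n):.2f}")
# With only 10 measurements the bound drops to roughly 1.2, well short of the
# six-sigma level the point estimate suggests.
```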
"exposure" and risk
While I agree with the premise that even a small rate of occurrence across a large number of opportunities results in a large number of events, and that our tolerance for defects has narrowed over the years, I'm not sure the analogy applies to SPC. One can make the argument about the risk of a false alarm for a single sample, but SPC concerns itself with time-series data, and the use of additional rules and the persistence of a shift will address the possibility of a 'single' false alarm. This provides us far more protection than any increase in the precision with which a distributional model fits real-world data.
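A back-of-the-envelope sketch of the persistence argument (the "two in a row" figure below is purely illustrative, not a standard chart rule):

```python
# Sketch: a single 3-sigma false alarm has probability about 0.0027 per
# subgroup, but noise alone rarely persists, whereas a sustained shift keeps
# producing out-of-limits points until it is investigated.
p_single = 0.0027                  # false alarm probability per subgroup
p_two_in_a_row = p_single ** 2     # ~7.3e-6: chance noise alone persists twice
print(p_single, p_two_in_a_row)
```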
Real World
Meanwhile, out in the land of business, where it seems impossible to make the basics simple enough ... Client: "... we are only interested in the results, not the Six Sigma math." Consultant: "I take them through Shewhart's control charts, +/- 3 standard deviations, and why the +/- 1.5 sigma process shift allowance is such nonsense, and it just gets blank looks."
Six Sigma math
I can't see how a 1.5 sigma shift could last very long, noting that, for an x-bar chart with a sample size of 4, the average run length for detecting it is only 2. That is, the UCL is actually 3/SQRT(4), or 1.5 sigma, above the center line, so you have a 50:50 chance of a point falling outside it on any given sample.
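A quick check of that arithmetic in Python (a sketch, assuming a sustained upward shift and ignoring the negligible chance of a point below the LCL):

```python
# Sketch: average run length of an x-bar chart (n = 4) against a sustained
# 1.5 sigma shift in the process mean.
from scipy.stats import norm

sigma = 1.0                      # standard deviation of individual values
n = 4                            # subgroup size
sigma_xbar = sigma / n ** 0.5    # 0.5 sigma
ucl = 3 * sigma_xbar             # UCL sits 1.5 sigma above the center line
shift = 1.5 * sigma              # sustained shift in the process mean

p_detect = norm.sf(ucl, loc=shift, scale=sigma_xbar)   # = 0.5
arl = 1 / p_detect                                     # = 2 subgroups
print(f"P(point beyond UCL) = {p_detect:.2f}, ARL = {arl:.1f}")
```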
1.5 sigma shift and the Yeti
I don't know why this is being discussed here. Dr. Wheeler did not mention this in his article...