The symptoms of leptokurtophobia are (1) routinely asking if your data are normally distributed and (2) transforming your data to make them appear to be less leptokurtic and more “mound shaped.” If you have exhibited either of these symptoms then you need to read this article.
The origins of leptokurtophobia go back to the surge in statistical process control (SPC) training in the 1980s. Before this surge only two universities in the United States were teaching SPC, and only a handful of instructors had any experience with SPC. As a result many of the SPC instructors of the 1980s were, of necessity, neophytes, and many things that were taught at that time can only be classified as superstitious nonsense. One of these erroneous ideas was that you must have normally distributed data before you can put your data on a process behavior chart (also known as a control chart).
…
Comments
Leptokurtophobia
I, too, see a huge obsession among others (including colleagues) with checking for normality first and foremost, or with looking for transformations. If n is sufficiently large, the central limit theorem (CLT) kicks in and it doesn't matter. If the application is SPC, then as Wheeler suggests, it generally doesn't matter unless n is 1 or 2 and the skew is quite significant. To me, what is most important is looking at the data to validate homogeneity (as Wheeler also suggests); that will burn you more often than anything else. My approach is generally first to look at probability plots, not to validate normality but to look for telltale signs of outliers or several modes. After that, I create SPC and/or time series plots to validate homogeneity. It's been almost two decades since I really cared about normality. I will remember the name of the disease!
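The point about subgroup size can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential distribution stands in for "skewed data", and the subgroup sizes, number of draws, and the helper name `sample_skewness` are illustrative choices, not anything taken from the article.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(1)   # fixed seed so the run is repeatable
draws = 20_000                   # assumed number of simulated subgroups

skew_by_n = {}
for n in (1, 2, 5, 25):
    # Assumption: exponential data as a stand-in for a heavily skewed process.
    subgroup_means = rng.exponential(size=(draws, n)).mean(axis=1)
    skew_by_n[n] = sample_skewness(subgroup_means)
    print(f"n = {n:2d}: skewness of subgroup averages = {skew_by_n[n]:.2f}")
```

For this particular distribution the skewness of the subgroup averages shrinks roughly in proportion to 1/sqrt(n), which is why the shape of the individual values matters most when n is 1 or 2.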
Lepto...by any other name
Here's a related article from several years ago, with a different spelling for the same concept (and a more technical response): http://www.jmp.com/about/newsletters/jmpercable/pdf/15_summer_2004.pdf
Great Article - is there an error?
I think I found an error when I ran the numbers. I ran the 141 observations for the hot metal transit times and got 59.9 for the average; putting these into the individuals chart in Minitab yields 3 points out of control, not 11 points.
But I love the article! Great lesson here!
Absolutely superb article
I have been following Dr. Wheeler for years, since the early days of SPC Ink. I am forever in his debt for 'saving' me from learning the superstitions that are so prevalent in our academia regarding process behavior charts...I must have 20 SPC Ink articles in a ragged old blue binder (sitting in front of me now), in addition to at least three of his books, on this amazing subject.
His recent publications on Six Sigma, in which he systematically reveals the fallacy of the gigantic leap of faith from which the whole concept originated, saved me once again from spending hard-to-come-by money on a 'belt'. Instead, I focused on the higher-ROI PPI program that he and Ed Zunich so carefully designed.
Many of you reading this latest article probably have no idea how markedly important and useful it really is.
Dr. Wheeler, thanks for this valuable gift - for those who understand it as such, anyway!
- forever, A Student
False alarm rates are not comparable for skewed distributions
The idea that the false alarm rates are comparable for the various distributions shown in figure 1 is not correct. Anyone who has ever made an Individuals control chart with highly skewed data knows that the false alarm rate can be quite high. The reason is that the data in the figure have been standardized based on the overall standard deviation, while an Individuals control chart typically uses the moving range as an estimate of variation to determine control limits.
With highly skewed data such as that shown in the bottom two distributions, the data are more bunched up and the average moving range is smaller than that of the normal distribution. Therefore, although both may have the same overall standard deviation, the control limits for the skewed distribution will be tighter, and we should expect a larger false alarm rate. A quick simulation using 10,000 data points that are normal and 10,000 that are skewed, both standardized to a mean of 0 and an overall standard deviation of 1, shows a false alarm rate of 0.0021 (0.21%) for the normal data and 0.026 (2.6%) for the skewed data. I don't think most practitioners want a 12-fold increase in their false alarm rate, especially given the resources that frequently go into finding the "special cause" of an out-of-control point.
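A simulation along these lines can be sketched as follows. The comment does not say which skewed distribution, seed, or software was used, so the exponential distribution, the seed, and the helper names `standardize` and `individuals_false_alarm_rate` are all assumptions made for illustration. The limits use the usual individuals-chart formula: the mean plus or minus 2.66 times the average two-point moving range.

```python
import numpy as np

def standardize(x):
    """Rescale to mean 0 and overall (global) standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def individuals_false_alarm_rate(x):
    """Fraction of points outside individuals-chart limits computed
    from the average moving range of successive values."""
    x = np.asarray(x, dtype=float)
    mr_bar = np.mean(np.abs(np.diff(x)))   # average two-point moving range
    half_width = 2.66 * mr_bar             # 2.66 = 3 / 1.128
    lower = x.mean() - half_width
    upper = x.mean() + half_width
    return float(np.mean((x < lower) | (x > upper)))

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n = 10_000

normal = standardize(rng.normal(size=n))
# Assumption: exponential data as the skewed case; the comment does not name one.
skewed = standardize(rng.exponential(size=n))

rate_normal = individuals_false_alarm_rate(normal)
rate_skewed = individuals_false_alarm_rate(skewed)
print(f"normal: {rate_normal:.4f}   skewed: {rate_skewed:.4f}")
```

Both series are in control by construction, so every point outside the limits counts as a false alarm. The exact rates depend on the seed and on the skewed distribution chosen.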
While I agree transformations may be overused or frequently used in cases in which they are unnecessary, I also have to side with those who have a good transformation and choose to use it when establishing control with an Individuals control chart on skewed data.