There are two key aspects of the normal distribution that make it the central probability model in statistics. However, students seldom hear about them, and as a result they make many unnecessary mistakes. Read on to learn what it means when we say the normal distribution has maximum uncertainty.
The normal distribution has long been known to be the distribution with maximum entropy, but like many mathematical facts in statistics, this one does not translate directly into properties that are easy to understand. Entropy, a concept from information theory, is a measure of the uncertainty of a probability model (those who are interested can find the definition of continuous entropy on Wikipedia). Maximum entropy is therefore equivalent to maximum uncertainty. But just what does this mean?
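To put a number on that claim, here is a minimal sketch (assuming SciPy, whose `entropy` method returns differential entropy in nats; the comparison is an illustration, not part of the original article): among several distributions standardized to unit variance, the normal comes out with the largest entropy.

```python
# Differential entropy (in nats) of three unit-variance distributions.
# The normal's is the largest: the maximum-entropy property in action.
import numpy as np
from scipy import stats

candidates = {
    "normal":  stats.norm(0, 1),
    "uniform": stats.uniform(loc=-np.sqrt(3), scale=2 * np.sqrt(3)),  # variance 1
    "t, 6 df": stats.t(df=6, scale=np.sqrt(4 / 6)),                   # rescaled to variance 1
}

for name, dist in candidates.items():
    print(f"{name:>8}: variance = {dist.var():.3f}, entropy = {dist.entropy():.4f} nats")
```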
…
Comments
Postscript
Re the postscript
Your statement in the postscript is too strong; exceptions are easy to find. You have a kind of caveat in the original article. I think the original caveat is too weak (there's an infinite number of exceptions, so to make any claim about the relative preponderance of distributions that meet or fail the claim we'd need a probability distribution over the space of distributions considered), but that aside, it's certainly good to have noted that it's not always true; you just can't drop the caveat in the postscript. [It might be instructive to show some of the exceptions. Among continuous symmetric unimodal distributions, the largest proportion outside k standard deviations from the mean is 4/(9k^2). For k = 1.70 that's about 15.4%; it's interesting that the normal does get up as high as it does for k in that region.]
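For what it's worth, the 4/(9k^2) figure can be checked directly; a minimal sketch, assuming the bound in question is the Vysochanskij-Petunin inequality for unimodal distributions and using SciPy for the normal tail:

```python
# A quick check of the figures above, assuming the bound cited is the
# Vysochanskij-Petunin inequality for unimodal distributions:
#   P(|X - mu| >= k*sigma) <= 4 / (9*k^2)
from scipy import stats

for k in [1.70, 2.0, 3.0]:
    vp_bound = 4 / (9 * k**2)            # worst case over unimodal distributions
    normal_tail = 2 * stats.norm.sf(k)   # what the normal actually puts outside +/- k sigma
    print(f"k = {k}: unimodal worst case = {vp_bound:.1%}, normal tail = {normal_tail:.1%}")
```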
Great article!
At the risk of sounding like a teenager: O M G!! Fabulous article. It has really gotten me thinking about all of the stuff I learned in statistics and raised a lot of questions about the (standard) uses of other distributions. For example, should we ever use a Student's t test? Or a chi-square test? I think I know what you would say about some of them, and it is pretty much a repeat of what you have said here regarding the use of process performance charts, but I would really love to see more discussion of the implications of this concept. In fact, I am now wondering whether, looking at the entire field of statistics, including the analysis of designed experiments (which has become such a large part of the Six Sigma methodology), we aren't making the wrong assumptions more often than not. Perhaps this is too esoteric a discussion for the Quality Digest audience, but it is definitely of interest to statistical practitioners everywhere. Am I completely overthinking this, or could the implications of this totally revamp the application of statistical methods?
T-tests, etc.
T distribution
Now, I'm somewhat notorious for missing horribly obvious things, but I thought that the t distribution basically started at 1 df with egregiously heavy tails, and that the more degrees of freedom you add, the closer it approximates the normal distribution. How is it that your 6 df t distribution has smaller tails than your normal distribution? When I run the calculations for a 6 df t distribution, I get a tail area of 14.9% at plus or minus 1.656 standard deviations. Am I missing something again?
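One plausible resolution, sketched below (this assumes the article measures distance in the t distribution's own standard deviations rather than in raw t-values): a t with 6 df has standard deviation sqrt(6/4), roughly 1.22, so 1.656 of its standard deviations corresponds to a t-value of about 2.03, where the two-sided tail area is indeed smaller than the normal's at 1.656.

```python
# A sketch of the distinction between t-values and standard-deviation units
# for a t distribution with 6 df (whose SD is sqrt(6/4), not 1).
import numpy as np
from scipy import stats

df, k = 6, 1.656
sd = np.sqrt(df / (df - 2))            # SD of a t with 6 df: about 1.225

print(2 * stats.t.sf(k, df))           # ~14.9%: area beyond a t-VALUE of 1.656
print(2 * stats.t.sf(k * sd, df))      # ~8.9%:  area beyond 1.656 of the t's own SDs
print(2 * stats.norm.sf(k))            # ~9.8%:  normal area beyond 1.656 SDs
```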
Heavy tailed t-dists.
Overwhelming evidence
I always find Don's papers a fantastic read. His papers and excellent books should provide overwhelming evidence that Shewhart's approach was right. Yet teaching to the contrary continues in Six Sigma courses, in a fashion that Deming described as "seeing every day the devastating effects of incompetent teaching and faulty application" (p. 131, Out of the Crisis). Despite the good statistics from Don, Deming, and Shewhart, the Asch Effect prevails: almost the entire industry follows the ridiculous Six Sigma path, often even with an awareness of its fallacies. (Solomon Asch and Conformity Studies: http://psychology.about.com/od/classicpsychologystudies/p/conformity.htm)
Tail heaviness
Hi Don, I don't agree, and I don't believe it is generally agreed, that you can define "tail heaviness" as "probability outside a central range." Tail heaviness is commonly thought of as the potential to generate extreme observations. A counterexample, where the probability concentration outside the central range goes to zero yet the distribution is heavier- and heavier-tailed, in the sense of having the potential to produce extreme outliers, is given here: https://math.stackexchange.com/questions/167656/fat-tail-large-kurtosis…
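A sketch of the kind of counterexample being described (an illustration of the idea only, not necessarily the construction at the link): a symmetric three-point distribution with unit variance whose probability outside any fixed central range shrinks to zero while its kurtosis, and the size of the rare outliers it can produce, grow without bound.

```python
# X = 0 with probability 1 - p, and X = +/- 1/sqrt(p) with probability p/2 each.
# As p -> 0 the tail probability outside any fixed range vanishes, yet the
# kurtosis (= 1/p, since the variance is pinned at 1) explodes.
import numpy as np

for p in [0.1, 0.01, 0.001]:
    outlier = 1 / np.sqrt(p)
    variance = p * outlier**2          # always exactly 1
    kurtosis = p * outlier**4          # equals 1/p
    print(f"p = {p}: outliers at +/-{outlier:.1f}, variance = {variance:.0f}, "
          f"kurtosis = {kurtosis:.0f}, P(|X| > 3) = {p}")
```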