A conversation the other day turned to how, or why, someone would use the mean of a dataset described by a Weibull distribution.
The Weibull distribution is great at describing a dataset whose hazard rate decreases or increases over time. When we use the distribution, we also do not need to determine the mean time between failures (MTBF), which is not all that useful, of course.
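For anyone who wants to see that hazard-rate behavior, here is a minimal sketch in R. It assumes the standard definition of the hazard rate as the density divided by the survival function. The helper name weibull_hazard, the time points, and the shape values of 0.5 and 7 are illustrative choices only, not taken from any real dataset.

```r
# Hazard rate h(t) = f(t) / S(t), built from base R's Weibull functions.
# weibull_hazard is a hypothetical helper name used only for this sketch.
weibull_hazard <- function(t, shape, scale) {
  dweibull(t, shape = shape, scale = scale) /
    pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)
}

# Illustrative time points (assumed units: hours)
t <- c(200, 400, 600, 800)

# Shape below 1: the hazard rate falls as time passes
h_decreasing <- weibull_hazard(t, shape = 0.5, scale = 1000)

# Shape above 1: the hazard rate rises as time passes
h_increasing <- weibull_hazard(t, shape = 7, scale = 1000)

print(h_decreasing)
print(h_increasing)
```

A shape of exactly 1 would give a constant hazard rate, which is the exponential case where a single MTBF figure is usually quoted.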
Walking up the stairs today, I wondered if the arithmetic mean of the time-to-failure data, commonly used to estimate MTBF, is the same as the mean of the Weibull distribution. Doesn’t everyone think about such things?
So I thought I’d check: set up some data with an increasing failure rate, then calculate the arithmetic mean and the Weibull distribution mean.
The data set
I opened R and, using the random number-generating function rweibull, created 50 data points from a Weibull distribution with a shape (β) of 7 and a scale (η) of 1,000.
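A minimal sketch of that setup in R might look like the following. The seed and the variable names are assumptions added so the run is repeatable, so the values will not match the original draw. The Weibull mean here uses the standard formula η × Γ(1 + 1/β) with the generating parameters, not parameters fitted to the sample.

```r
set.seed(42)  # assumed seed, only so the sketch is repeatable

shape_beta <- 7     # shape parameter
scale_eta  <- 1000  # scale parameter

# 50 simulated times to failure
ttf <- rweibull(50, shape = shape_beta, scale = scale_eta)

# Arithmetic mean of the sample, the usual MTBF estimate
arithmetic_mean <- mean(ttf)

# Mean of the Weibull distribution: eta * gamma(1 + 1/beta)
weibull_mean <- scale_eta * gamma(1 + 1 / shape_beta)

cat("Arithmetic mean:", arithmetic_mean, "\n")
cat("Weibull mean:   ", weibull_mean, "\n")
```

Calling hist(ttf) afterward would draw a histogram of the simulated sample.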
Here’s a histogram of the data.
…
Comments
Your MTBF article
Choosing beta = 7 makes the distribution look more Gaussian. Suppose you were gathering failure data on operations to compare with what vendors claim. In the case of servers, the vendor might claim to use an exponential distribution with a given MTBF. There is a difference between their number and what you see in the operational environment. Next, suppose that your operational environment has 2,000 servers. How do you propose determining the MTBF?
Using a Weibull distribution seems to make sense to me, and I expect the result to be significantly different from the arithmetic mean for a couple of reasons. My sample size is less than the 50,000 that you experimented with, and my beta is significantly lower than 7.
OK, so what is the value of the article beyond provoking comments?