Failure mode and effects analysis (FMEA) is an engineering tool that has been heavily adapted for use in Six Sigma programs, where it is commonly used to decide which problem to work on. In this usage a risk priority number (RPN) is computed for each of several problems, and the problem with the largest RPN value is selected. The purpose of this column is to explain the inherent problems of RPN values.
Typically a list of several candidate problems will be rated on three scales: severity of failure (S), likelihood of occurrence (O), and difficulty of detection in advance (D). Each problem is assigned a rating of 1 to 10 on each scale, with 10 being the most severe, the most likely to occur, and impossible to detect in advance. These three ratings are then multiplied together to obtain the risk priority number, and these RPN values are then used to rank the problems. The idea is that the problem with the highest RPN value is the one that needs to be worked on first. This approach has been the subject of textbooks and has been used as the basis for several different types of voting and ranking schemes. Unfortunately, there are two major problems with the use of risk priority numbers.
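As a concrete illustration, here is a minimal Python sketch of the conventional calculation (the failure modes and ratings are hypothetical, chosen only for illustration). Notice that three quite different problems can tie at the same RPN, which hints at the difficulties discussed below.

```python
# Conventional RPN calculation: rate each problem 1-10 on severity (S),
# occurrence (O), and detection (D), multiply, and rank by the product.
# All failure modes and ratings below are hypothetical.
problems = {
    "seal leaks":       {"S": 9, "O": 2, "D": 3},   # severe but rare
    "label misprinted": {"S": 2, "O": 9, "D": 3},   # trivial but frequent
    "housing cracks":   {"S": 6, "O": 3, "D": 3},   # middling on both
}

def rpn(r):
    """RPN = severity x occurrence x detection."""
    return r["S"] * r["O"] * r["D"]

# Rank from highest to lowest RPN.
for name, ratings in sorted(problems.items(), key=lambda kv: rpn(kv[1]), reverse=True):
    print(f"{name}: RPN = {rpn(ratings)}")   # all three tie at RPN = 54
```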
…
Comments
An excellent analysis, as
An excellent analysis, as always, Don. However, I think that no amount of revealing the nonsense behind Six Sigma will make industry wake up and pull itself out of the crisis.
My feeling is that rating "likelihood of occurrence" is quite unrealistic. Accidents, for example, are commonly a combination of a series of "impossible" events, which seem to happen all too frequently.
FMEA as the lesser evil
Yes, ADB, the job of rating the likelihood of occurrence is very subjective. This is why I never use FMEAs. However, at the preproduction stage, when data are unavailable, it can be helpful to go through an FMEA to see if anything has been overlooked that needs to be addressed. It is always imperfect, but it beats the alternative of doing nothing.
Donald J. Wheeler, Ph.D.
Fellow American Statistical Association
Fellow American Society for Quality
Change in RPN approach?
Advancing technology means we can improve on "the way we've always done it." As an example, traditional attribute charts are hopelessly obsolete when it is easy enough to compute exact control limits (0.00135 false-alarm risk at each end) on a computer. I would say the same of the R and s charts, because it is easy enough to calculate exact 0.00135-risk control limits (or any other specified risk) instead of relying on the normal approximation.

The same might be possible with FMEA and PFMEA, but it requires quantification of all three risks. For severity, what is the dollar cost of the failure (including intangibles such as enraged customers who never come back and recommend the same to their friends--the airline industry comes to mind immediately)? There are already guidelines for the probability of occurrence and the probability of non-detection. Multiply them to get the RPN: a direct measurement of the estimated (decision-theoretic) cost of the failure in question. When the severity involves a threat to human life, perhaps set the cost at a very high "penalty cost" (similar to that for enforcing constraints in linear programming, in which the solution must be revised to eliminate "M" from the objective function).

All this assumes, of course, that exact quantification is possible, which is far easier in theory than in practice. If not, perhaps state an optimistic, most likely, and pessimistic RPN based on best- and worst-case scenarios.
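To make the proposal concrete, here is a minimal sketch of that expected-cost version of the RPN; all dollar figures and probabilities are hypothetical placeholders, not values from any published guideline.

```python
# Quantified RPN as an expected (decision-theoretic) cost:
# expected cost = dollar severity x P(occurrence) x P(non-detection).
# All figures below are hypothetical placeholders.

def expected_cost(severity_dollars, p_occurrence, p_nondetection):
    """Estimated cost of the failure in question."""
    return severity_dollars * p_occurrence * p_nondetection

# Optimistic, most likely, and pessimistic scenarios for one failure mode,
# per the suggestion to bracket the RPN when exact quantification fails.
scenarios = {
    "optimistic":  (50_000, 0.001, 0.05),
    "most likely": (80_000, 0.005, 0.10),
    "pessimistic": (150_000, 0.020, 0.30),
}
for label, (cost, p_occ, p_nd) in scenarios.items():
    print(f"{label}: expected cost = ${expected_cost(cost, p_occ, p_nd):,.2f}")
```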
the danger in RPNs
Well said. Great point about the problems with doing math on ordinal numbers, and a good suggestion to combine (rather than multiply) the digits to avoid those problems.
Ken B
Donald, do you have a
Donald, do you have a preferred risk analysis tool that you like better than FMEA? What you say makes a lot of sense, but as you know, leaders tend to want to see a ranking of some sort.
An alternative for DWILEN
The SOD code explained in the article will provide a ranking that can be explained and used in a rational manner.
I used only five levels with the SOD code because that will sort the problems into 125 categories, which is more than enough.
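In code, that five-level SOD ordering might be sketched as follows; the problems and ratings are hypothetical.

```python
# SOD code with five levels (1 to 5) per scale: 5 x 5 x 5 = 125 categories.
# Sorting on the concatenated digits ranks problems by severity first, then
# occurrence, then detection -- no arithmetic on ordinal numbers required.
problems = [
    ("seal leaks",       5, 2, 3),   # (name, S, O, D) -- hypothetical
    ("label misprinted", 2, 5, 3),
    ("housing cracks",   4, 3, 3),
]

for name, s, o, d in sorted(problems, key=lambda p: (p[1], p[2], p[3]), reverse=True):
    print(f"SOD {s}{o}{d}: {name}")
```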
Donald J. Wheeler, Ph.D.
Fellow American Statistical Association
Fellow American Society for Quality
Thanks for the article
Thanks for the article, Donald: a mathematical analysis of FMEA. I too have been puzzled at how relatively low severity problems can merit the same RPN as potentially dangerous ones. My solution thus far has been to automatically prioritize high severity failure modes above low severity ones with comparable RPNs. However, now I am planning to try your approach.
High severity ratings
"I too have been puzzled at how relatively low severity problems can merit the same RPN as potentially dangerous ones. " I think books on FMEA say that anything with a high severity rating, especially at the danger to human life or safety level, automatically demands attention even if the occurrence and detection numbers are very low. The RPN is an aid to engineering judgment and not a substitute for it.
Alternatives
Excellent article, as usual. Working in product development, I find that FMEAs are often required by customers, especially the automotive OEMs, and better than nothing for capturing and communicating the ideas and knowledge that team members have about potential failures.
When we use FMEAs, I suggest to my teams that they follow the normal practice with regard to rankings and RPN, but then sort the RPNs into three classes based on the severity ranking, where 9 and 10 (causes injury or death) form one class, 4 to 8 (likely to cause customer dissatisfaction) another, and 1 to 3 (more or less undetectable to the customer) the third. They then sort by RPN within the classes and address the most severe class first. I don't recall where I first heard of using classes to mitigate some of the problems of RPNs, but it seems to work reasonably well and still meets most customer requirements for documentation. I think this would have nearly the same result as your SOD approach, but without the arguments with the customer over whether we should have 5 or 10 levels for each ranking.
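A minimal sketch of that class-based sort, with hypothetical data:

```python
# Three-class sort: group by severity class (9-10, 4-8, 1-3), then rank
# by RPN within each class. All problems and ratings are hypothetical.
def severity_class(s):
    if s >= 9:
        return 0   # causes injury or death -- address first
    if s >= 4:
        return 1   # likely to cause customer dissatisfaction
    return 2       # more or less undetectable to the customer

problems = [
    ("seal leaks",       9, 2, 3),   # (name, S, O, D)
    ("label misprinted", 2, 9, 3),
    ("housing cracks",   6, 3, 3),
]

# Ascending by class, then descending by RPN within the class.
for name, s, o, d in sorted(problems,
                            key=lambda p: (severity_class(p[1]), -(p[1] * p[2] * p[3]))):
    print(f"class {severity_class(s)}, RPN {s * o * d}: {name}")
```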
The biggest problem with FMEAs, as alluded to in a previous comment, is that most failure modes are complex and not easily captured in the simplistic format of the FMEA. Overcoming this limitation while still meeting customer demands to have things in an FMEA format is my biggest challenge.
Sorting then using the RPN values
All RPN numbers are nonsense. The SOD code can also be done with 10 categories (I would suggest using 0 through 9 rather than 1 through 10). When you sort by severity you are already doing the same thing as the first step in using the SOD code. The essence of FMEA is the three ordinal rankings; the RPN value is just a bit of nonsense tacked on for those who couldn't handle 1,000 problem descriptions.
AIAG Manuals
Between the faulty percentages derived from the GRR studies in the MSA manual and the error of multiplication used to calculate RPNs in the FMEA manual, it seems the AIAG books are doing more harm than good.
Does anyone know if the committees that develop these procedures are aware of their shortcomings?
Rich
Agree and Disagree
Yes, the FMEA method is subjective, but so were the metric and standard measurement systems when first developed. Who decided an inch was an inch? A consensus was reached and then agreed upon. For any FMEA to work, the definitions and labels must be defined and used consistently. It can also be used for more than the design phase.

I use it to assign resources and monitor changes. Agreed that identical RPN values may not mean the same thing, so you cannot ignore the variables that were used to calculate them. The intent of the tool is not to look at risk alone. The intent is to go get facts before you give the rating in each category.

What I often see are groups who say that working at 1,000 feet has a greater chance of injury than working at 10 feet. So by risk alone, they constantly assign the resources to 1,000-foot tasks. Guess what? If the process for controlling that risk is in control, then you will have a low frequency rating. While the inherent risk of working at 10 feet is lower, if you have 60 incidents a month, where would you assign your process improvement resources?

By using a living FMEA you can set a baseline specific to the process you are measuring, improve it, and then reexamine the categories. I also only use 1, 3, and 9 as ratings.

So is it subjective? Yes. Can it be used if used consistently with defined parameters? Yes!

There is no measurement (categorical or mathematical) that is 100 percent. That is why there are Type I and Type II errors and risks. Heck, people can't even agree on 0.005, 0.001, or 0.01... does that mean you throw it away? No! Use it for its intended purpose, understand what was used to develop the measurements, and do not abuse it.
Good luck
Christopher Vallee
chris@taproot.com
P.S. Great meeting you, Don, at the ASQ conference this year.
Good thinking if based on firm foundation
Only people in our line of work would likely say that was an enjoyable read, but it certainly was. Such thresholds probably do contribute to a high rate of ineffective FMEAs, but they don't tell the whole story. People are loath to score too low or too high, so while the sample set of possibilities favors the extremely low, the real world favors the 5th to 35th percentile of the range in my experience. That seems to be governed by the team's tendency toward central scoring. A very human influence.

I admire the attempt to steer the ship toward something meaningful, but also challenge the (in my mind) excessive debate over the scoring of FMEAs. If we're going to exert mental energy, then the lion's share of it needs to be in the direction of maximum effectiveness. Is the proper method of scoring improperly specified risks truly effective? In our constant debate over scoring FMEAs, we too often miss the point of the exercise to begin with: to diligently think about our risks and what we're going to do about them.

It's odd that a tool whose origins reach back nearly 70 years is still so misunderstood and so inconsistently employed, but here we stand. And it alarms me that we so often get hung up on the nuances of scoring while missing the meat on the bone: defining the failure effect, failure mode, and root cause, and accurately categorizing controls. Without diligence to those items first, any scoring, and any way of slicing that scoring, will yield nonsense.

Most CQEs are now trained to consider RPNs in a method similar to what you've laid out, but the attention to defining the failure mode, what prevention controls are, what detection controls are, and how the categories do or do not interact seems to be lacking.

Look at some FMEAs you've seen recently with a probing eye. Is the failure mode well defined and connected to the requirements in a way that makes it actionable? Are the root causes relevant to the process at hand? Are the controls appropriately categorized? It can be alarming to see how often we fall short there, but encouraging to know that we're only a simple turn in thinking away from truly honoring the intent of the tool.

Maybe I'm a hopeless optimist, but I'd like to think that if we really do our best to state what our risks are and what causes them, the nonsensical scores will expose themselves, and the real work that needs to be done will get done.
SOD
I agree that SOD can signify the ordering of the three aspects (severity, occurrence, and detection). However, it also has a disadvantage in a large FMEA. If we need to prioritize risks based on SOD, it can be misleading. Example: 799 or 811, which one is the higher risk? In my view, the RPN should be used with a risk matrix (priority matrix) to define the higher risk (www.fmea-analysis.com/news/risk-priority-number or www.iqasystem.com/news/risk-priority-number)