In the January 2010 issue of MoreNews, we discussed “Simpson’s Paradox,” a well-known phenomenon that can distort causal relationships in data sets in the presence of a confounder or covariate. In this article, we will talk about some practical ways to guard against becoming the victim of this insidious effect.
Simpson’s Paradox is the name given to the phenomenon in which the direction of an effect reverses once you take into account a previously ignored (lurking) variable that strongly affects the relationship.
An example of the paradox
Let’s elaborate on the definition with an example. You’re in charge of a study that compares how two weight-loss techniques, diet and exercise, affect weight loss in overweight patients. In total, 240 patients take part in the study: 120 are assigned to a weight-loss diet and the remaining 120 to a supervised exercise regimen.
…
Comments
Yes, but not statistically significant
Confounding is an important concept, so thanks for the article. However, I'm not sure how you concluded that 58% is "significantly different" from 48%. Using a chi-square test to see whether these two proportions differ significantly with 120 subjects in each group gives a p-value of 0.093 (a p-value below 0.05 is conventionally considered significant). If we double the subjects to 240 in each group (keeping the weight-loss proportions the same), the p-value is 0.017. Proportions may look significantly different, but it all depends on the size of the denominator. One way to check whether confounding was an issue is to use a logistic regression model. Running the model with only weight loss as the dependent variable (yes or no, coded 1 or 0) and group as the predictor (diet or exercise, coded 1 or 0), which is similar to the chi-square test, you get a significant p-value of 0.018 (with 240 subjects in each group). When you run the model again with BMI included as an additional variable, the p-value for the group variable is no longer significant: it is 0.069. The model adjusts for the imbalance in BMI between the groups and shows that BMI was a confounder and that the diet and exercise groups actually lost weight at similar rates.
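The commenter's chi-square p-values can be checked with a few lines of code. The sketch below is a minimal, standard-library-only version, and its function and constant names are illustrative. It assumes that 57 of 120 exercisers and 70 of 120 dieters lost weight, counts reconstructed from the stratum rates and group sizes in the following comment (they reproduce the rounded 48% and 58%), and it uses Pearson's statistic without a continuity correction, which matches the quoted 0.093 and 0.017. It does not reproduce the commenter's logistic regression; as a lighter-weight way to adjust for BMI, it instead computes a Mantel-Haenszel odds ratio across the two BMI strata, which is a different technique from the commenter's model.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table:

                     lost weight   did not
        exercise          a           b
        diet              c           d

    Returns (chi-square statistic, two-sided p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, chi2 = z**2, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio(a, b, c, d):
    """Crude odds ratio of losing weight, exercise versus diet."""
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across strata of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Assumed counts (exercise lost, exercise did not, diet lost, diet did not),
# reconstructed from the rates and group sizes quoted in the comments.
POOLED_120 = (57, 63, 70, 50)        # 47.5% vs 58.3%, 120 per group
POOLED_240 = (114, 126, 140, 100)    # same proportions, 240 per group
BMI_STRATA = [
    (35, 5, 60, 20),   # BMI > 40: 35/40 exercisers, 60/80 dieters lost weight
    (22, 58, 10, 30),  # 30 < BMI < 40: 22/80 exercisers, 10/40 dieters lost weight
]

if __name__ == "__main__":
    for label, table in [("120 per group", POOLED_120), ("240 per group", POOLED_240)]:
        chi2, p = pearson_chi_square_2x2(*table)
        print(f"{label}: chi-square = {chi2:.3f}, p = {p:.3f}")
    print(f"crude odds ratio (exercise vs diet): {odds_ratio(*POOLED_120):.2f}")
    print(f"BMI-adjusted Mantel-Haenszel odds ratio: {mantel_haenszel_odds_ratio(BMI_STRATA):.2f}")
```

With these assumed counts, the crude odds ratio for exercise versus diet is below 1 (about 0.65), while the BMI-stratified Mantel-Haenszel estimate is above 1 (about 1.55). Once BMI is held fixed, the direction flips, which is the reversal the article describes. Whether that adjusted difference is statistically significant is a separate question that this sketch does not test.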
How to display Simpson's Paradox graphically?
I would recommend the graph shown by Howard Wainer in Chapter 10, "Two Mind-Bending Statistical Paradoxes," of his book Graphic Discovery. Since the graph can't be pasted here, I will describe how it is constructed.
It is a line graph with Y = percentage of patients who lost weight and X = proportion of the group with BMI > 40 (or, alternatively, X = proportion with 30 < BMI < 40).
First line: "Exercise" with end points: (0%, 27.5%), (100%, 87%)
Second line: "Diet" with end points: (0%, 25%), (100%, 75%)
The graph looks like a two-factor interaction plot from design of experiments (DOE).
Overlaid on these two lines are two points:
(1) On the Exercise line: (33%, 48%). The 48% is the result for the exercise group, of whom 33% (= 40/120) had BMI > 40.
(2) On the Diet line: (67%, 58%). The 58% is the result for the diet group, of whom 67% (= 80/120) had BMI > 40.
Now the mystery is clear. Although the Exercise line lies above the Diet line everywhere, the overall results showed Diet as better, because the diet group contained more patients with BMI > 40. It is also clear that the Exercise result of 48% is just the weighted average of 27.5% and 87%.
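The weighted-average point can be made concrete in a few lines. This is a minimal sketch, with an illustrative function name, that computes points on the two lines of the graph. It assumes 87.5% (35 of 40) for the exercisers with BMI > 40 instead of the rounded 87%, since that is the count out of 40 closest to 87% and it gives exactly 47.5%, which rounds to the reported 48%. The last two lines apply direct standardization, a technique not used in the comment, by evaluating both lines at the same BMI mix (the pooled 120/240 = 50% with BMI > 40).

```python
def overall_rate(share_bmi_over_40, rate_bmi_over_40, rate_bmi_30_to_40):
    """Overall weight-loss rate (%) for a group: the average of the two stratum
    rates weighted by the group's share of patients with BMI > 40. Plotted against
    share_bmi_over_40, this traces the straight line described in the comment."""
    return share_bmi_over_40 * rate_bmi_over_40 + (1 - share_bmi_over_40) * rate_bmi_30_to_40


# Stratum rates in percent (BMI > 40, 30 < BMI < 40); 87.5 is assumed in place of the rounded 87.
EXERCISE_RATES = (87.5, 27.5)
DIET_RATES = (75.0, 25.0)

if __name__ == "__main__":
    # Actual BMI mix of each group: 40/120 exercisers and 80/120 dieters had BMI > 40.
    print(f"exercise, actual mix: {overall_rate(40 / 120, *EXERCISE_RATES):.1f}%")
    print(f"diet, actual mix:     {overall_rate(80 / 120, *DIET_RATES):.1f}%")
    # Direct standardization: give both groups the pooled mix, 50% with BMI > 40.
    print(f"exercise, pooled mix: {overall_rate(0.5, *EXERCISE_RATES):.1f}%")
    print(f"diet, pooled mix:     {overall_rate(0.5, *DIET_RATES):.1f}%")
```

At each group's actual BMI mix, diet comes out ahead (about 58.3% versus 47.5%); at a common mix, exercise does (57.5% versus 50.0%). Because the Exercise line lies above the Diet line at every X, any shared BMI mix would favor exercise.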