Simpson’s Paradox is the name given to the phenomenon in which a trend that appears in each of several groups of data reverses when the groups are aggregated. The paradox is named after E.H. Simpson, who used witty and surprising illustrations to draw attention to some of the problems that arise from connections between proportions, percentages, probabilities and their representations as fractions. Simpson’s seminal paper on the subject was published in 1951.
The following example, based on a discrimination suit brought against the University of California, Berkeley, was used by the Stanford Encyclopedia of Philosophy to illustrate Simpson’s Paradox:
Imagine a hiring process at a university where the Department of History gets 5 male applicants, of whom 1 is hired, and 8 female applicants, of whom 2 are hired. The success rate for men is 20% and for women is 25%, so the data suggest that the History Department favored women over men. In the Geography Department, 8 men apply and 6 are hired, and 5 women apply and 4 are hired, giving a success rate of 75% for men and 80% for women, so this department also appears to favor women over men. However, aggregating the data, the university had 13 men and 13 women apply for jobs, and 7 men and 6 women were hired, so the overall success rate for male applicants (7/13, about 54%) was greater than for female applicants (6/13, about 46%). This paradox occurs due to a lurking variable, in this case the department. It is harder to get a job in History than in Geography: History hired only 3 out of 13 applicants, while Geography hired 10 out of 13. More women than men applied to History (8 versus 5), and more men than women applied to Geography (8 versus 5), so the women's aggregate rate is pulled down by the department that hires fewer people.
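The role of the lurking variable can be checked with a few lines of arithmetic. The sketch below is only an illustration, using the applicant and hiring counts from the example; the names `data`, `aggregate_rate`, `weighted_rate` and `department_rate` are made up for this purpose. It shows that each gender's overall rate is a weighted average of the department rates, with weights given by where that gender applied.

```python
from fractions import Fraction

# (applicants, hired) per department and gender, taken from the example above
data = {
    "History":   {"men": (5, 1), "women": (8, 2)},
    "Geography": {"men": (8, 6), "women": (5, 4)},
}

def aggregate_rate(gender):
    """Overall hire rate for one gender across all departments."""
    applied = sum(dept[gender][0] for dept in data.values())
    hired = sum(dept[gender][1] for dept in data.values())
    return Fraction(hired, applied)

def weighted_rate(gender):
    """The same rate, written as a weighted average of department rates.

    Each department's rate is weighted by the share of this gender's
    applications that went to that department.
    """
    total = sum(dept[gender][0] for dept in data.values())
    return sum(
        Fraction(dept[gender][0], total) * Fraction(dept[gender][1], dept[gender][0])
        for dept in data.values()
    )

def department_rate(name):
    """Hire rate of a department, ignoring gender."""
    applied = sum(n for n, _ in data[name].values())
    hired = sum(h for _, h in data[name].values())
    return Fraction(hired, applied)

for gender in ("men", "women"):
    print(gender, aggregate_rate(gender), weighted_rate(gender))
for name in data:
    print(name, department_rate(name))
```

For men, most of the weight (8/13) falls on Geography, the department with the high hire rate; for women, most of the weight (8/13) falls on History. That difference in weights is enough to flip the aggregate comparison even though women have the higher rate in each department.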
The above scenario follows Simpson’s Reversal of Inequalities, which can be described as follows:
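a/b < A/B,
c/d < C/D, and
(a + c)/(b + d) > (A + C)/(B + D).
In the hiring example, with lowercase letters for men and uppercase letters for women (History first, then Geography), these become:
1/5 < 2/8,
6/8 < 4/5, and
7/13 > 6/13.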
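The three inequalities can also be tested mechanically. The following is a minimal sketch, not a standard library routine: `is_simpson_reversal` is a hypothetical helper name, and its arguments follow the letters used in the inequalities, here taking lowercase as men and uppercase as women. Exact fractions are used so that no rounding affects the comparisons.

```python
from fractions import Fraction

def is_simpson_reversal(a, b, A, B, c, d, C, D):
    """Return True when both group inequalities hold and the pooled one flips."""
    first = Fraction(a, b) < Fraction(A, B)        # a/b < A/B
    second = Fraction(c, d) < Fraction(C, D)       # c/d < C/D
    pooled = Fraction(a + c, b + d) > Fraction(A + C, B + D)
    return first and second and pooled

# Hiring example: History (1/5 vs 2/8), then Geography (6/8 vs 4/5)
print(is_simpson_reversal(1, 5, 2, 8, 6, 8, 4, 5))  # True

# Same number of applicants of each gender in every department:
# the pooled comparison cannot flip, so this prints False
print(is_simpson_reversal(1, 5, 2, 5, 3, 5, 4, 5))  # False
```

The second call illustrates that the reversal depends on unequal group sizes: when b = B and d = D, the two group inequalities force a + c < A + C over a common denominator, so the pooled inequality cannot point the other way.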
Simpson’s paradox is used both to demonstrate the need for statistics education in schools and to demonstrate the limits of statistical methods, showing why causal considerations are necessary to avoid paradoxical conclusions. On its own, Simpson’s reversal is an arithmetic phenomenon in the calculus of proportions. According to Judea Pearl, a University of California professor, the reason this phenomenon has been regarded as paradoxical and has fascinated statisticians, mathematicians and philosophers is that it clashes with the deeply held conviction that such a reversal is impossible. Pearl writes, “it is hard, if not impossible, to explain the surprise part of Simpson’s reversal without postulating that human intuition is governed by causal calculus together with a persistent tendency to attribute causal interpretation to statistical associations.”