What is Simpson's Paradox?

Okay, let's talk about this very interesting topic.

Simply put, Simpson's Paradox is the phenomenon where a trend that shows up in every group of a dataset disappears, or even reverses entirely, once the groups are combined.

Sounds a bit convoluted? Don't worry, a concrete example makes it clear, and the effect is far more common than you might think. It's like a data magic trick: each individual part tells the truth, yet the combined "truth" deceives you.

A Classic Example: Kidney Stone Treatment

Imagine two treatment methods for kidney stones; let's call them Treatment A and Treatment B. We record the outcomes for 350 patients on each and obtain the following aggregate data:

Treatment	Total Patients	Cured Patients	Overall Cure Rate
Treatment A	350	273	78%
Treatment B	350	289	83%

Looking only at this summary table, you would conclude that Treatment B is better: its cure rate (83%) is five percentage points higher than Treatment A's (78%).

But what if we break down the data?

Kidney stones vary in size, and their treatment difficulty differs accordingly. We divide the patients into two groups: "small stones" and "large stones," and then examine the cure rates again.

1. Small Stone Group

Treatment	Total Patients	Cured Patients	Cure Rate
Treatment A	87	81	93%
Treatment B	270	234	87%

See, something surprising has happened! When treating "small stones," which are simpler cases, Treatment A (93%) actually has a higher cure rate than Treatment B (87%).

2. Large Stone Group

Treatment	Total Patients	Cured Patients	Cure Rate
Treatment A	263	192	73%
Treatment B	80	55	69%

When treating "large stones," which are more difficult cases, Treatment A (73%) is still more effective than Treatment B (69%).

Where Does the Paradox Lie? — The Lurking Variable

Now, here's the problem:

Treatment A is better than B when treating small stones.
Treatment A is also better than B when treating large stones.
But why is Treatment B's overall cure rate higher when the data is combined?
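
The three statements above can be checked directly from the counts in the tables. The following is a minimal sketch, not part of the original analysis; the names COUNTS, rates_by_group, and overall_rates are made up for illustration, and the numbers are copied from the small-stone and large-stone tables.

```python
# (cured, total) counts copied from the small-stone and large-stone tables.
COUNTS = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rates_by_group(counts):
    """Cure rate of each treatment within each stone-size group."""
    return {
        treatment: {group: cured / total for group, (cured, total) in groups.items()}
        for treatment, groups in counts.items()
    }


def overall_rates(counts):
    """Cure rate of each treatment after pooling its groups together."""
    pooled = {}
    for treatment, groups in counts.items():
        cured = sum(c for c, _ in groups.values())
        total = sum(n for _, n in groups.values())
        pooled[treatment] = cured / total
    return pooled


by_group = rates_by_group(COUNTS)
overall = overall_rates(COUNTS)

# Within each group, compare A against B.
for group in ("small", "large"):
    a, b = by_group["A"][group], by_group["B"][group]
    print(f"{group} stones: A={a:.1%}  B={b:.1%}  A better: {a > b}")

# After pooling, the comparison flips.
print(f"overall: A={overall['A']:.1%}  B={overall['B']:.1%}  "
      f"A better: {overall['A'] > overall['B']}")
```

Running it should report that A is better in both groups and worse overall, which is exactly the reversal described here.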

This is the core of Simpson's Paradox. The reason is that the patients are distributed very unevenly between the groups, and that imbalance is tied to a "lurking variable."

In this example, the lurking variable is "stone size" (or rather, "case difficulty").

Let's look at the patient distribution in the original data:

Treatment A: Took on a large number of patients with large stones (263 people), cases that inherently have a lower cure rate. It only took on a small number of patients with small stones (87 people).
Treatment B: Primarily treated patients with small stones (270 people), cases that inherently have a higher cure rate. It only took on a small number of patients with large stones (80 people).

Simply put, Treatment A took on mostly "tough cases," while Treatment B mostly handled "easy tasks."

This is like a top basketball team (Treatment A) that plays strong opponents most of the time, while an average team (Treatment B) mostly plays weaker opponents. At the end of the season, if you only look at the overall win rate, the average team may well have the higher percentage. But can you say the average team is stronger than the top team? Obviously not.

Treatment B's high overall success rate is largely because it treated more "easy-to-succeed" patients, not because it was inherently more effective. When the factor of "case difficulty" is hidden, and we only look at the aggregate data, we draw an incorrect conclusion.
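
One way to see this in numbers: a treatment's overall cure rate is a weighted average of its per-group cure rates, where the weights are the shares of its patients in each group. The sketch below is only an illustration under that framing; case_mix and weighted_overall are hypothetical helper names, and the counts are the ones from the tables.

```python
# (cured, total) counts copied from the small-stone and large-stone tables.
COUNTS = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def case_mix(groups):
    """Share of a treatment's patients that falls in each group."""
    total = sum(n for _, n in groups.values())
    return {group: n / total for group, (_, n) in groups.items()}


def weighted_overall(groups):
    """Overall cure rate as sum(share of group * cure rate in group)."""
    mix = case_mix(groups)
    return sum(mix[group] * cured / n for group, (cured, n) in groups.items())


mixes = {treatment: case_mix(groups) for treatment, groups in COUNTS.items()}
weighted = {treatment: weighted_overall(groups) for treatment, groups in COUNTS.items()}

for treatment in COUNTS:
    mix = mixes[treatment]
    print(f"Treatment {treatment}: small share={mix['small']:.1%}  "
          f"large share={mix['large']:.1%}  overall={weighted[treatment]:.1%}")
```

About three quarters of A's weight sits on the harder large-stone group, while about three quarters of B's weight sits on the easier small-stone group. That difference in weights, not the per-group rates, is what pulls B's overall figure above A's.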

What Can We Learn? How to Avoid Falling into the Trap?

The biggest lesson Simpson's Paradox teaches us is: what you see isn't always what's real, especially when looking at statistical data.

Don't blindly trust aggregate data: When you see summarized data, ask yourself: "Can this data be further broken down? Are there different populations or different circumstances at play?"
Look for lurking variables: When analyzing data, use common sense and domain knowledge to consider if there might be an overlooked but crucial factor influencing the results. For example, when analyzing admission rates for different majors, consider applicants' average grades; when analyzing drug efficacy, consider patients' age, severity of condition, etc.
Comparison by groups is key: When comparing two things (e.g., two methods, two groups), ensure you are comparing "like with like." Compare apples with apples and oranges with oranges, rather than mixing them together and just looking at the total.
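
To make "like with like" concrete for the kidney stone data, one option is direct standardization, a technique not named above: apply each treatment's per-group cure rates to the same case mix and compare the results. The sketch below assumes that stone size is the only lurking variable and uses the pooled mix of all 700 patients as the common reference. Both choices are assumptions made for illustration, as are the helper names pooled_mix and standardized_rate.

```python
# (cured, total) counts copied from the small-stone and large-stone tables.
COUNTS = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def pooled_mix(counts):
    """Share of all patients (both treatments together) in each group."""
    totals = {}
    for groups in counts.values():
        for group, (_, n) in groups.items():
            totals[group] = totals.get(group, 0) + n
    grand_total = sum(totals.values())
    return {group: n / grand_total for group, n in totals.items()}


def standardized_rate(groups, mix):
    """Cure rate a treatment would show if its patients followed `mix`."""
    return sum(mix[group] * cured / n for group, (cured, n) in groups.items())


common_mix = pooled_mix(COUNTS)
standardized = {
    treatment: standardized_rate(groups, common_mix)
    for treatment, groups in COUNTS.items()
}

for treatment, rate in standardized.items():
    print(f"Treatment {treatment}: standardized cure rate = {rate:.1%}")
```

With the same mix of small and large stones applied to both, Treatment A comes out at roughly 83% and Treatment B at roughly 78%, so the ranking agrees with the per-group tables. A different reference mix would change the exact figures but not the ordering, since A has the higher rate in both groups.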

In conclusion, Simpson's Paradox reminds us that data analysis is not just about calculation; it's about insight. Next time you encounter a surprising statistical conclusion, pause for a moment and consider whether there might be another story hidden behind the data.