Square visualization of probabilities on two events: (example) Diseasitis

$$ \newcommand{\bP}{\mathbb{P}} $$

From the Diseasitis example:

You are screen­ing a set of pa­tients for a dis­ease, which we’ll call Dise­a­sitis. Based on prior epi­demiol­ogy, you ex­pect that around 20% of the pa­tients in the screen­ing pop­u­la­tion will in fact have Dise­a­sitis. You are test­ing for the pres­ence of the dis­ease us­ing a tongue de­pres­sor con­tain­ing a chem­i­cal strip. Among pa­tients with Dise­a­sitis, 90% turn the tongue de­pres­sor black. How­ever, 30% of the pa­tients with­out Dise­a­sitis will also turn the tongue de­pres­sor black. Among all the pa­tients with black tongue de­pres­sors, how many have Dise­a­sitis?

It seems like, since Dise­a­sitis very strongly pre­dicts that the pa­tient has a black tongue de­pres­sor, it should be the case that the con­di­tional prob­a­bil­ity \(\bP( \text{Diseasitis} \mid \text{black tongue depressor})\) is big. But ac­tu­ally, it turns out that a pa­tient with a black tongue de­pres­sor is more likely than not to be com­pletely Dise­a­sitis-free.

Can we see this fact at a glance? Below, we’ll use the square vi­su­al­iza­tion of prob­a­bil­ities on two events to draw pic­tures and use our vi­sual in­tu­ition.

To in­tro­duce some no­ta­tion: our prior prob­a­bil­ity \(\bP(D)\) that the pa­tient has Dise­a­sitis is \(0.2\). We think that if the pa­tient is sick \((D)\), then it’s 90% likely that the tongue de­pres­sor will turn black \((B)\): we as­sign con­di­tional prob­a­bil­ity \(\bP(B \mid D) = 0.9\). We as­sign con­di­tional prob­a­bil­ity \(\bP(B \mid \neg D) = 0.3\) that the tongue de­pres­sor will be black even if the pa­tient isn’t sick. We want to know \(\bP(D \mid B)\), the pos­te­rior prob­a­bil­ity that the pa­tient has Dise­a­sitis given that we’ve seen a black tongue de­pres­sor.

If we wanted to, we could solve this prob­lem pre­cisely us­ing Bayes’ rule:

$$ \begin{align} \bP(D \mid B) &= \frac{\bP(B \mid D) \bP(D)}{\bP(B)}\\ &= \frac{0.9 \times 0.2}{ \bP(B, D) + \bP(B, \neg D)}\\ &= \frac{0.18}{ \bP(D)\bP(B \mid D) + \bP(\neg D)\bP(B \mid \neg D)}\\ &= \frac{0.18}{ 0.18 + 0.24}\\ &= \frac{0.18}{ 0.42} = \frac{3}{7} \approx 0.43\ . \end{align} $$

So even if we’ve seen a black tongue de­pres­sor, the pa­tient is more likely to be healthy than not: \(\bP(D \mid B) < \bP(\neg D \mid B) \approx 0.57\).
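The same arithmetic can be checked mechanically. The following is a minimal Python sketch, not part of the original problem statement: the function name `posterior` and its parameter names are made up for illustration, and exact fractions are used so the answer comes out as 3/7 rather than a rounded decimal.

```python
from fractions import Fraction


def posterior(prior, p_b_given_d, p_b_given_not_d):
    """Return P(D | B) using Bayes' rule.

    P(B) is expanded as P(B, D) + P(B, not D), as in the derivation above.
    All names here are illustrative, not taken from any library.
    """
    joint_b_d = prior * p_b_given_d                # P(B, D)
    joint_b_not_d = (1 - prior) * p_b_given_not_d  # P(B, not D)
    return joint_b_d / (joint_b_d + joint_b_not_d)


# Numbers from the Diseasitis problem: 20% prior, 90% and 30% black rates.
p_d_given_b = posterior(Fraction(1, 5), Fraction(9, 10), Fraction(3, 10))
print(p_d_given_b)  # 3/7
```

Passing plain floats instead of `Fraction` values also works, at the cost of ordinary floating-point rounding.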

Now, this calculation might be enlightening if you are a real expert at Bayes’ rule. A better calculation for building intuition would probably use the odds ratio form of Bayes’ rule.
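As a sketch of what the odds form looks like for this problem (taking the prior odds to be \(\bP(D) : \bP(\neg D)\) and the likelihood ratio to be \(\bP(B \mid D) : \bP(B \mid \neg D)\)), the posterior odds are the product of the two:

$$ \begin{align} \frac{\bP(D \mid B)}{\bP(\neg D \mid B)} &= \frac{\bP(D)}{\bP(\neg D)} \times \frac{\bP(B \mid D)}{\bP(B \mid \neg D)}\\ &= \frac{0.2}{0.8} \times \frac{0.9}{0.3}\\ &= \frac{1}{4} \times \frac{3}{1} = \frac{3}{4}\ . \end{align} $$

Odds of 3 : 4 for Diseasitis correspond to a probability of \(3 / (3 + 4) = 3/7\), which agrees with the result above.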

But either way, maybe there’s still an intuition saying that, come on, if the tongue depressor is such a strong indicator of Diseasitis that \(\bP(B \mid D) = 0.9\), it must be that \(\bP(D \mid B) = \text{big}\).

Let’s use the square visualization of probabilities to make it visibly obvious that \(\bP(D \mid B) < \bP(\neg D \mid B)\), and to figure out why \(\bP(B \mid D) = \text{big}\) doesn’t imply \(\bP(D \mid B) = \text{big}\).

We start with the probability \(\bP(D)\) (so we’re factoring our probabilities by \(D\) first):

Now let’s break up the red column, where \(D\) is true and the pa­tient has Dise­a­sitis, into a block for the prob­a­bil­ity \(\bP(B \mid D)\) that \(B\) is also true, and a block for the prob­a­bil­ity \(\bP(\neg B \mid D)\) that \(B\) is false.

Among pa­tients with Dise­a­sitis, 90% turn the tongue de­pres­sor black.

That is, in 90% of the out­comes where \(D\) hap­pens, \(B\) also hap­pens. So \(0.9\) of the red column will be dark (\(B\)), and \(0.1\) will be light:

How­ever, 30% of the pa­tients with­out Dise­a­sitis will also turn the tongue de­pres­sor black.

So we break up the blue \(\neg D\) column by \(\bP(B \mid \neg D) = 0.3\) and \(\bP(\neg B \mid \neg D) = 0.7\):
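The two splits described so far can be approximated as a coarse text grid. This is only a sketch under stated assumptions: the 10 by 10 resolution, the choice to draw the \(B\) cells at the top of each column, and the letters `D`/`d` (sick, black or not black) and `H`/`h` (healthy, black or not black) are all illustrative choices, not taken from the original diagrams.

```python
def square_rows(prior, p_b_given_d, p_b_given_not_d, size=10):
    """Return `size` text rows approximating the unit square.

    Columns are split by D first (sick columns on the left). Within each
    column, the top cells are the ones where B happens (assumed layout).
    """
    sick_cols = round(prior * size)               # width of the D column
    sick_dark = round(p_b_given_d * size)         # B cells per sick column
    healthy_dark = round(p_b_given_not_d * size)  # B cells per healthy column
    rows = []
    for r in range(size):
        sick = ("D" if r < sick_dark else "d") * sick_cols
        healthy = ("H" if r < healthy_dark else "h") * (size - sick_cols)
        rows.append(sick + healthy)
    return rows


rows = square_rows(0.2, 0.9, 0.3)
print("\n".join(rows))

# Each cell is 1% of the probability mass, so counting cells gives the
# joint probabilities in percent.
cells = "".join(rows)
print({symbol: cells.count(symbol) for symbol in "DdHh"})
```

With the problem's numbers, the cell counts are 18 for `D`, 2 for `d`, 24 for `H`, and 56 for `h`, that is, \(\bP(B, D) = 0.18\), \(\bP(\neg B, D) = 0.02\), \(\bP(B, \neg D) = 0.24\), and \(\bP(\neg B, \neg D) = 0.56\).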

Now we would like to know the prob­a­bil­ity \(\bP(D \mid B)\) of Dise­a­sitis once we’ve ob­served that the tongue de­pres­sor is black. Let’s break up our di­a­gram by whether or not \(B\) hap­pens:

Con­di­tion­ing on \(B\) is like only look­ing at the part of our dis­tri­bu­tion where \(B\) hap­pens. So the prob­a­bil­ity \(\bP(D \mid B)\) of \(D\) con­di­tioned on \(B\) is the pro­por­tion of that area where \(D\) also hap­pens:

Here we can see why \(\bP(D \mid B)\) isn’t all that big. It’s true that \(\bP(B,D)\) is big rel­a­tive to \(\bP(\neg B,D)\), since we know that \(\bP(B \mid D)\) is big (pa­tients with Dise­a­sitis al­most always have black tongue de­pres­sors):

But this ra­tio doesn’t re­ally mat­ter if we want to know \(\bP(D \mid B)\), the prob­a­bil­ity that a pa­tient with a black tongue de­pres­sor has Dise­a­sitis. What mat­ters is that we also as­sign a rea­son­ably high prob­a­bil­ity \(\bP(B, \neg D)\) to the pa­tient hav­ing a black tongue de­pres­sor but nev­er­the­less not suffer­ing from Dise­a­sitis:
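Putting numbers on the two comparisons (using \(\bP(\neg B, D) = 0.2 \times 0.1 = 0.02\) together with the joint probabilities from the Bayes’ rule calculation) shows how different they are:

$$ \frac{\bP(B, D)}{\bP(\neg B, D)} = \frac{0.18}{0.02} = 9, \qquad \frac{\bP(B, D)}{\bP(B, \neg D)} = \frac{0.18}{0.24} = \frac{3}{4}\ . $$

The first ratio says that within the sick column, black outweighs non-black nine to one. The second says that within the black region, sick patients are outweighed three to four by healthy ones, and it is this second ratio that determines \(\bP(D \mid B)\).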

So even when we see a black tongue de­pres­sor, there’s still a pretty high chance the pa­tient is healthy any­way, and our pos­te­rior prob­a­bil­ity \(\bP(D\mid B)\) is not that high. Re­call our square of prob­a­bil­ities:

When asked about \(\bP(D\mid B)\), we think of the re­ally high prob­a­bil­ity \(\bP(B\mid D) = 0.9\):

Really, we should look at the part of our prob­a­bil­ity mass where \(B\) hap­pens, and see that a size­able por­tion goes to places where \(\neg D\) hap­pens, and the pa­tient is healthy:

Side note

The square visualization is very similar to frequency diagrams, except that we think in terms of probability mass rather than specifically frequencies. The page on frequency diagrams also covers waterfall diagrams, another way to visualize updating probabilities.
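To make the connection to frequencies concrete, here is a small Python sketch that turns the same probabilities into head counts. The population of 100 patients is a hypothetical number picked so that every count is a whole number, and the function name `frequency_counts` is made up for illustration.

```python
from fractions import Fraction


def frequency_counts(n_patients, prior, p_b_given_d, p_b_given_not_d):
    """Split a hypothetical population into the four cells of the square."""
    sick = n_patients * prior
    healthy = n_patients - sick
    return {
        ("D", "B"): sick * p_b_given_d,
        ("D", "not B"): sick * (1 - p_b_given_d),
        ("not D", "B"): healthy * p_b_given_not_d,
        ("not D", "not B"): healthy * (1 - p_b_given_not_d),
    }


counts = frequency_counts(100, Fraction(1, 5), Fraction(9, 10), Fraction(3, 10))
black = counts[("D", "B")] + counts[("not D", "B")]
print(counts[("D", "B")], "of", black, "patients with black depressors are sick")
```

Out of 100 such patients, 18 of the 42 with black tongue depressors have Diseasitis, the same 3/7 proportion as the area of the square where both \(B\) and \(D\) happen relative to the area where \(B\) happens.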
