# Square visualization of probabilities on two events: (example) Diseasitis

$$\newcommand{\bP}{\mathbb{P}}$$

From the Diseasitis example:

> You are screening a set of patients for a disease, which we’ll call Diseasitis. Based on prior epidemiology, you expect that around 20% of the patients in the screening population will in fact have Diseasitis. You are testing for the presence of the disease using a tongue depressor containing a chemical strip. Among patients with Diseasitis, 90% turn the tongue depressor black. However, 30% of the patients without Diseasitis will also turn the tongue depressor black. Among all the patients with black tongue depressors, how many have Diseasitis?

It seems like, since Diseasitis very strongly predicts that the patient has a black tongue depressor, it should be the case that the conditional probability $$\bP( \text{Diseasitis} \mid \text{black tongue depressor})$$ is big. But actually, it turns out that a patient with a black tongue depressor is more likely than not to be completely Diseasitis-free.

Can we see this fact at a glance? Below, we’ll use the square visualization of probabilities on two events to draw pictures and use our visual intuition.

To introduce some notation: our prior probability $$\bP(D)$$ that the patient has Diseasitis is $$0.2$$. We think that if the patient is sick $$(D)$$, then it’s 90% likely that the tongue depressor will turn black $$(B)$$: we assign conditional probability $$\bP(B \mid D) = 0.9$$. We assign conditional probability $$\bP(B \mid \neg D) = 0.3$$ that the tongue depressor will be black even if the patient isn’t sick. We want to know $$\bP(D \mid B)$$, the posterior probability that the patient has Diseasitis given that we’ve seen a black tongue depressor.

If we wanted to, we could solve this problem precisely using Bayes’ rule:

\begin{align} \bP(D \mid B) &= \frac{\bP(B \mid D) \bP(D)}{\bP(B)}\\ &= \frac{0.9 \times 0.2}{ \bP(B, D) + \bP(B, \neg D)}\\ &= \frac{0.18}{ \bP(D)\bP(B \mid D) + \bP(\neg D)\bP(B \mid \neg D)}\\ &= \frac{0.18}{ 0.18 + 0.24}\\ &= \frac{0.18}{ 0.42} = \frac{3}{7} \approx 0.43\ . \end{align}

So even if we’ve seen a black tongue depressor, the patient is more likely to be healthy than not: $$\bP(D \mid B) < \bP(\neg D \mid B) \approx 0.57$$.
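The arithmetic above can be checked mechanically. The following is a minimal sketch, not part of the original walkthrough: the function name `posterior` and its parameter names are made up for illustration, and exact fractions are used so that the result comes out as $$3/7$$ rather than a rounded decimal.

```python
from fractions import Fraction


def posterior(prior, p_b_given_d, p_b_given_not_d):
    """Return P(D | B) using Bayes' rule and the law of total probability."""
    joint_b_d = prior * p_b_given_d                # P(B, D)
    joint_b_not_d = (1 - prior) * p_b_given_not_d  # P(B, not D)
    return joint_b_d / (joint_b_d + joint_b_not_d)


# Numbers from the Diseasitis problem: P(D) = 0.2, P(B|D) = 0.9, P(B|not D) = 0.3.
p_d_given_b = posterior(Fraction(2, 10), Fraction(9, 10), Fraction(3, 10))
print(p_d_given_b)      # exact posterior as a fraction
print(1 - p_d_given_b)  # probability the patient is healthy despite a black strip
```

Passing `Fraction` values instead of floats is only a convenience for exactness; the same function works with ordinary floats.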

Now, this calculation might be enlightening if you are a real expert at Bayes’ rule. A better calculation would probably use the odds ratio form of Bayes’ rule.
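For reference, here is a sketch of what the odds version looks like with this problem’s numbers. It multiplies the prior odds of $$D$$ against $$\neg D$$ by the likelihood ratio of a black tongue depressor:

\begin{align} \frac{\bP(D \mid B)}{\bP(\neg D \mid B)} &= \frac{\bP(D)}{\bP(\neg D)} \times \frac{\bP(B \mid D)}{\bP(B \mid \neg D)}\\ &= \frac{0.2}{0.8} \times \frac{0.9}{0.3}\\ &= \frac{1}{4} \times \frac{3}{1} = \frac{3}{4}\ . \end{align}

Posterior odds of $$3 : 4$$ correspond to $$\bP(D \mid B) = \frac{3}{3 + 4} = \frac{3}{7}$$, matching the result above.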

But either way, maybe there’s still an intuition saying that, come on, if the tongue depressor is such a strong indicator of Diseasitis that $$\bP(B \mid D) = 0.9$$, it must be that $$\bP(D \mid B) = \text{big}$$.

Let’s use the square visualization of probabilities to make it really visibly obvious that $$\bP(D \mid B) < \bP(\neg D \mid B)$$, and to figure out why $$\bP(B \mid D) = \text{big}$$ doesn’t imply $$\bP(D \mid B) = \text{big}$$.
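Since the square is just a unit area carved into four blocks, it can also be sketched as text. The following is a rough illustration under stated assumptions, not the original diagrams: the 10×10 grid size, the function name `square_grid`, and the letter codes are all choices made here. Probabilities that are not multiples of one tenth would be rounded to the nearest cell.

```python
def square_grid(p_d, p_b_given_d, p_b_given_not_d, size=10):
    """Build a size x size text grid for the square visualization.

    The left columns are D and the right columns are not-D. Within each
    column, the top rows are the outcomes where B happens.
    'D' = D and B, 'd' = D and not-B, 'H' = not-D and B, 'h' = not-D and not-B.
    """
    d_cols = round(p_d * size)              # width of the D column
    d_dark = round(p_b_given_d * size)      # rows of the D column where B happens
    h_dark = round(p_b_given_not_d * size)  # rows of the not-D column where B happens
    rows = []
    for r in range(size):
        sick = ("D" if r < d_dark else "d") * d_cols
        healthy = ("H" if r < h_dark else "h") * (size - d_cols)
        rows.append(sick + healthy)
    return rows


grid = square_grid(0.2, 0.9, 0.3)
print("\n".join(grid))

# Conditioning on B: keep only the cells where B happens (uppercase letters),
# then ask what share of them lies in the D column.
cells = "".join(grid)
dark_sick, dark_healthy = cells.count("D"), cells.count("H")
print(dark_sick, dark_healthy, dark_sick / (dark_sick + dark_healthy))
```

In the printed grid, the uppercase region is visibly dominated by `H` cells even though almost the whole `D` column is uppercase, which is the point the walkthrough makes next.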

We start with the probability $$\bP(D)$$ (so we’re factoring our probabilities by $$D$$ first): the square is split into a red column of width $$\bP(D) = 0.2$$ and a blue column of width $$\bP(\neg D) = 0.8$$.

Now let’s break up the red column, where $$D$$ is true and the patient has Diseasitis, into a block for the probability $$\bP(B \mid D)$$ that $$B$$ is also true, and a block for the probability $$\bP(\neg B \mid D)$$ that $$B$$ is false.

> Among patients with Diseasitis, 90% turn the tongue depressor black.

That is, in 90% of the outcomes where $$D$$ happens, $$B$$ also happens. So $$0.9$$ of the red column will be dark ($$B$$), and $$0.1$$ will be light.

> However, 30% of the patients without Diseasitis will also turn the tongue depressor black.

So we break up the blue $$\neg D$$ column by $$\bP(B \mid \neg D) = 0.3$$ and $$\bP(\neg B \mid \neg D) = 0.7$$.

Now we would like to know the probability $$\bP(D \mid B)$$ of Diseasitis once we’ve observed that the tongue depressor is black. Let’s break up our diagram by whether or not $$B$$ happens. Conditioning on $$B$$ is like only looking at the part of our distribution where $$B$$ happens. So the probability $$\bP(D \mid B)$$ of $$D$$ conditioned on $$B$$ is the proportion of that area where $$D$$ also happens.

Here we can see why $$\bP(D \mid B)$$ isn’t all that big. It’s true that $$\bP(B,D) = 0.18$$ is big relative to $$\bP(\neg B,D) = 0.02$$, since we know that $$\bP(B \mid D)$$ is big (patients with Diseasitis almost always have black tongue depressors).

But this ratio doesn’t really matter if we want to know $$\bP(D \mid B)$$, the probability that a patient with a black tongue depressor has Diseasitis. What matters is that we also assign a reasonably high probability $$\bP(B, \neg D) = 0.24$$ to the patient having a black tongue depressor but nevertheless not suffering from Diseasitis.

So even when we see a black tongue depressor, there’s still a pretty high chance the patient is healthy anyway, and our posterior probability $$\bP(D\mid B)$$ is not that high.

Recall our square of probabilities. When asked about $$\bP(D\mid B)$$, we think of the really high probability $$\bP(B\mid D) = 0.9$$. Really, we should look at the part of our probability mass where $$B$$ happens, and see that a sizeable portion ($$0.24$$ out of $$0.42$$) goes to places where $$\neg D$$ happens, and the patient is healthy.

## Side note

The square visualization is very similar to frequency diagrams, except we can just think in terms of probability mass rather than specifically frequency. Also, see the frequency diagrams page for waterfall diagrams, another way to visualize updating probabilities.