# Square visualization of probabilities on two events

$$\newcommand{\true}{\text{True}} \newcommand{\false}{\text{False}} \newcommand{\bP}{\mathbb{P}}$$

Say we have two events, $$A$$ and $$B$$, and a probability distribution $$\bP$$ over whether or not they happen. We can represent $$\bP$$ as a square:

So for example, the probability $$\bP(A,B)$$ of both $$A$$ and $$B$$ occurring is the ratio of [the area of the dark red region] to [the area of the entire square]:
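As a concrete sketch (with made-up numbers, not taken from the figures), the four blocks of the square can be stored as a small table of areas; $$\bP(A,B)$$ is then the dark red block's share of the total area:

```python
# The four blocks of the square, with illustrative areas (the whole square is 1).
# Keys are (A, B) truth values; values are block areas, i.e. joint probabilities.
joint = {
    (True, True): 0.24,    # dark red:   P(A, B)
    (True, False): 0.36,   # light red:  P(A, not-B)
    (False, True): 0.28,   # dark blue:  P(not-A, B)
    (False, False): 0.12,  # light blue: P(not-A, not-B)
}

total_area = sum(joint.values())  # should be 1: something always happens

# P(A, B) = (area of the dark red block) / (area of the whole square)
p_a_and_b = joint[(True, True)] / total_area
```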

Visualizing probabilities in a square is neat because we can draw simple pictures that highlight interesting facts about our probability distribution.

Below are some pictures illustrating:

• independent events (What happens if the columns and the rows in our square both line up?)

• marginal probabilities (If we’re looking at a square of probabilities, where’s the probability $$\bP(A)$$ of $$A$$ or the probability $$\bP(\neg B)$$?)

• conditional probabilities (Can we find in the square the probability $$\bP(B \mid A)$$ of $$B$$ if we condition on seeing $$A$$? What about the conditional probability $$\bP(A \mid B)$$?)

• factoring a distribution (Can we always write $$\bP$$ as a square? Why do the columns line up but not the rows?)

• the process of computing joint probabilities from factored probabilities

# Independent events

Here’s a picture of the joint distribution of two independent events $$A$$ and $$B$$:

Now the rows for $$\bP(B)$$ and $$\bP(\neg B)$$ line up across the two columns. This is because $$\bP(B \mid A) = \bP(B) = \bP(B \mid \neg A)$$. When $$A$$ and $$B$$ are independent, updating on $$A$$ or $$\neg A$$ doesn’t change the probability of $$B$$.
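A quick sketch of this, with illustrative marginals: when the joint factors as $$\bP(A)\,\bP(B)$$, the dark share of each column is the same, so conditioning on $$A$$ or $$\neg A$$ leaves the probability of $$B$$ unchanged.

```python
# Build the joint distribution of two independent events from their marginals.
p_a, p_b = 0.6, 0.4  # illustrative values for P(A) and P(B)

joint = {(ta, tb): (p_a if ta else 1 - p_a) * (p_b if tb else 1 - p_b)
         for ta in (True, False) for tb in (True, False)}

def p_b_given(ta):
    """P(B = True | A = ta): the dark share of the column for A = ta."""
    column = joint[(ta, True)] + joint[(ta, False)]
    return joint[(ta, True)] / column

# Both columns have the same dark share, equal to P(B) (up to float rounding).
```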

For more on this visualization of independent events, see the aptly named Two independent events: Square visualization.

# Marginal probabilities

We can see the marginal probabilities of $$A$$ and $$B$$ by looking at some of the blocks in our square. For example, to find the probability $$\bP(\neg A)$$ that $$A$$ doesn’t occur, we just need to add up all the blocks where $$\neg A$$ happens: $$\bP(\neg A) = \bP(\neg A, B) + \bP(\neg A, \neg B)$$.
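In code (same kind of block table as before, with illustrative numbers), a marginal is just a sum of block areas:

```python
# Block areas of the square (illustrative joint probabilities).
joint = {(True, True): 0.24, (True, False): 0.36,
         (False, True): 0.28, (False, False): 0.12}

# P(not-A): add up all blocks where A is False.
p_not_a = joint[(False, True)] + joint[(False, False)]

# P(not-B): add up all blocks where B is False.
p_not_b = joint[(True, False)] + joint[(False, False)]
```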

Here’s the probability $$\bP(A)$$ of $$A$$, and the probability $$\bP(\neg A)$$ of $$\neg A$$:

Here’s the probability $$\bP(\neg B)$$ of $$\neg B$$:

In these pictures we’re dividing by the area of the whole square. Since the probability of anything at all happening is 1, we could just leave it out, but it’ll be helpful for comparison while we think about conditionals next.

# Conditional probabilities

We can start with some probability $$\bP(B)$$, and then assume that $$A$$ is true to get a conditional probability $$\bP(B \mid A)$$ of $$B$$. Conditioning on $$A$$ being true is like restricting our whole attention to just the possible worlds where $$A$$ happens:

Then the conditional probability of $$B$$ given $$A$$ is the proportion of these $$A$$ worlds where $$B$$ also happens:

If instead we condition on $$\neg A$$, we get:

So our square visualization gives a nice way to see, at a glance, the conditional probabilities of $$B$$ given $$A$$ or given $$\neg A$$:
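The column-restriction idea translates directly into arithmetic. A sketch with the same illustrative block areas as earlier:

```python
# Illustrative block areas for the square.
joint = {(True, True): 0.24, (True, False): 0.36,
         (False, True): 0.28, (False, False): 0.12}

# Conditioning on A restricts attention to the A column:
p_a = joint[(True, True)] + joint[(True, False)]      # width of the A column
p_b_given_a = joint[(True, True)] / p_a               # dark share of that column

# Conditioning on not-A restricts attention to the other column:
p_not_a = joint[(False, True)] + joint[(False, False)]
p_b_given_not_a = joint[(False, True)] / p_not_a
```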

We don’t get such nice pictures for $$\bP(A \mid B)$$:

More on this next.

# Factoring a distribution

Recall the square showing our joint distribution $$\bP$$:

Notice that in the above square, the reddish blocks for $$\bP(A,B)$$ and $$\bP(A,\neg B)$$ are the same width and form a column; and likewise the blueish blocks for $$\bP(\neg A,B)$$ and $$\bP(\neg A,\neg B)$$. This is because we chose to factor our probability distribution starting with $$A$$:

$$\bP(A,B) = \bP(A) \bP( B \mid A)\ .$$

Let’s use the equivalence between events and binary random variables, so if we say $$\bP(B = \true \mid A = \false)$$ we mean $$\bP(B \mid \neg A)$$. For any choice of truth values $$t_A \in \{\true, \false\}$$ and $$t_B \in \{\true, \false\}$$, we have

$$\bP(A = t_A,B= t_B) = \bP(A= t_A)\; \bP( B= t_B \mid A= t_A)\ .$$

The first factor $$\bP(A = t_A)$$ tells us how wide to make the red column $$(\bP(A = \true))$$ relative to the blue column $$(\bP(A = \false))$$. Then the second factor $$\bP(B = t_B \mid A = t_A)$$ tells us the proportions of dark $$(B = \true)$$ and light $$(B = \false)$$ within the column for $$A = t_A$$.
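We can sketch this factorization in code (illustrative numbers again): read $$\bP(A)$$ off as a column width and $$\bP(B \mid A)$$ as the dark share within each column, then check that their product recovers every block.

```python
# Illustrative joint; factoring by A means P(A) sets the column widths and
# P(B | A) sets the dark/light split within each column.
joint = {(True, True): 0.24, (True, False): 0.36,
         (False, True): 0.28, (False, False): 0.12}

p_a = joint[(True, True)] + joint[(True, False)]  # width of the A = True column

# P(B = True | A = t_A): dark share within each column.
p_b_given = {ta: joint[(ta, True)] / (joint[(ta, True)] + joint[(ta, False)])
             for ta in (True, False)}

# Check P(A = t_A, B = t_B) = P(A = t_A) * P(B = t_B | A = t_A) for all four blocks.
for ta in (True, False):
    marginal = p_a if ta else 1 - p_a
    for tb in (True, False):
        conditional = p_b_given[ta] if tb else 1 - p_b_given[ta]
        assert abs(joint[(ta, tb)] - marginal * conditional) < 1e-9
```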

We could just as well have factored by $$B$$ first:

$$\bP(A = t_A, B = t_B) = \bP(B = t_B)\; \bP(A = t_A \mid B = t_B)\ .$$

Then we’d draw a picture like this:

By the way, earlier when we factored by $$A$$ first, we got simple pictures of the probabilities $$\bP(B \mid A)$$ for $$B$$ conditioned on $$A$$. Now that we’re factoring by $$B$$ first, we have simple pictures for the conditional probability $$\bP(A \mid B)$$:

and for the conditional probability $$\bP(A \mid \neg B)$$:
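The same sketch works for the $$B$$-first factoring (same illustrative joint as before): $$\bP(B)$$ sets the row heights, and $$\bP(A \mid B)$$ is the red share within each row.

```python
# Same illustrative joint, now factored by B first.
joint = {(True, True): 0.24, (True, False): 0.36,
         (False, True): 0.28, (False, False): 0.12}

p_b = joint[(True, True)] + joint[(False, True)]  # height of the B = True row

# P(A | B) and P(A | not-B): red share within each row.
p_a_given_b = joint[(True, True)] / p_b
p_a_given_not_b = joint[(True, False)] / (joint[(True, False)] + joint[(False, False)])
```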

# Computing joint probabilities from factored probabilities

Let’s say we know the factored probabilities for $$A$$ and $$B$$, factoring by $$A$$. That is, we know $$\bP(A = \true)$$, and we also know $$\bP(B = \true \mid A = \true)$$ and $$\bP(B = \true \mid A = \false)$$. How can we recover the joint probability $$\bP(A = t_A, B = t_B)$$ that $$A = t_A$$ is the case and also $$B = t_B$$ is the case?

Since

$$\bP(B = \false \mid A = \true) = \frac{\bP(A = \true, B = \false)}{\bP(A = \true)}\ ,$$

we can multiply the prior $$\bP(A)$$ by the conditional $$\bP(\neg B \mid A)$$ to get the joint $$\bP(A, \neg B)$$:

$$\bP(A = \true)\; \bP(B = \false \mid A = \true) = \bP(A = \true, B = \false)\ .$$

If we do this at the same time for all the possible truth values $$t_A$$ and $$t_B$$, we get back the full joint distribution:
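The recipe above can be sketched directly (illustrative numbers): multiply the prior for each column by the conditionals within it, once for every pair of truth values.

```python
# Factored probabilities (illustrative): the prior P(A) and the two conditionals.
p_a = 0.6
p_b_given = {True: 0.4, False: 0.7}  # P(B = True | A = t_A)

# Recover the full joint: P(A = t_A, B = t_B) = P(A = t_A) * P(B = t_B | A = t_A).
joint = {}
for t_a in (True, False):
    marginal = p_a if t_a else 1 - p_a
    for t_b in (True, False):
        conditional = p_b_given[t_a] if t_b else 1 - p_b_given[t_a]
        joint[(t_a, t_b)] = marginal * conditional

# The four recovered blocks tile the whole square, so they sum to 1.
```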


