Square visualization of probabilities on two events

$$ \newcommand{\true}{\text{True}} \newcommand{\false}{\text{False}} \newcommand{\bP}{\mathbb{P}} $$

Say we have two events, \(A\) and \(B\), and a probability distribution \(\bP\) over whether or not they happen. We can represent \(\bP\) as a square:

So for example, the probability \(\bP(A,B)\) of both \(A\) and \(B\) occurring is the ratio of [the area of the dark red region] to [the area of the entire square]:
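To make this concrete, here's a minimal Python sketch of the square as a table of four block areas (the specific numbers are made up for illustration, not taken from the pictures):

```python
# A joint distribution over two binary events, stored as a table of
# probabilities indexed by (truth value of A, truth value of B).
# The numbers are illustrative assumptions.
P = {
    (True, True): 0.24,   # P(A, B): the dark red block
    (True, False): 0.36,  # P(A, not-B)
    (False, True): 0.30,  # P(not-A, B)
    (False, False): 0.10, # P(not-A, not-B)
}

# The whole square has area 1, so each probability is just its block's area.
assert abs(sum(P.values()) - 1.0) < 1e-9
print(P[(True, True)])  # the probability that both A and B occur
```

Since the whole square has area 1, dividing a block's area by the square's area just gives back the block's area.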

Visualizing probabilities in a square is neat because we can draw simple pictures that highlight interesting facts about our probability distribution.

Below are some pictures illustrating:

  • independent events (What happens if the columns and the rows in our square both line up?)

  • marginal probabilities (If we’re looking at a square of probabilities, where’s the probability \(\bP(A)\) of \(A\) or the probability \(\bP(\neg B)\)?)

  • conditional probabilities (Can we find in the square the probability \(\bP(B \mid A)\) of \(B\) if we condition on seeing \(A\)? What about the conditional probability \(\bP(A \mid B)\)?)

  • factoring a distribution (Can we always write \(\bP\) as a square? Why do the columns line up but not the rows?)

  • the process of computing joint probabilities from factored probabilities

Independent events

Here’s a picture of the joint distribution of two independent events \(A\) and \(B\):

Now the rows for \(\bP(B)\) and \(\bP(\neg B)\) line up across the two columns. This is because \(\bP(B \mid A) = \bP(B) = \bP(B \mid \neg A)\). When \(A\) and \(B\) are independent, updating on \(A\) or \(\neg A\) doesn’t change the probability of \(B\).

For more on this visualization of independent events, see the aptly named Two independent events: Square visualization.
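The lining-up condition can also be checked numerically: \(A\) and \(B\) are independent exactly when every block's area is the product of its row and column marginals. A small sketch, with illustrative numbers:

```python
# A and B are independent exactly when
# P(A=a, B=b) = P(A=a) * P(B=b) for all truth values a, b.
# Marginals chosen for illustration; the joint is built as their product,
# so the columns and rows of the square line up.
p_A, p_B = 0.6, 0.54
P = {(a, b): (p_A if a else 1 - p_A) * (p_B if b else 1 - p_B)
     for a in (True, False) for b in (True, False)}

def is_independent(P, tol=1e-9):
    marg_A = {a: sum(P[(a, b)] for b in (True, False)) for a in (True, False)}
    marg_B = {b: sum(P[(a, b)] for a in (True, False)) for b in (True, False)}
    return all(abs(P[(a, b)] - marg_A[a] * marg_B[b]) < tol
               for a in (True, False) for b in (True, False))

print(is_independent(P))  # True: P(B | A) = P(B) = P(B | not-A)
```

A distribution where learning \(A\) shifts the probability of \(B\) fails this check.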

Marginal probabilities

We can see the marginal probabilities of \(A\) and \(B\) by looking at some of the blocks in our square. For example, to find the probability \(\bP(\neg A)\) that \(A\) doesn’t occur, we just need to add up all the blocks where \(\neg A\) happens: \(\bP(\neg A) = \bP(\neg A, B) + \bP(\neg A, \neg B)\).

Here’s the probability \(\bP(A)\) of \(A\), and the probability \(\bP(\neg A)\) of \(\neg A\):

Here’s the probability \(\bP(\neg B)\) of \(\neg B\):

In these pictures we’re dividing by the area of the whole square. Since the probability of anything at all happening is 1, we could just leave it out, but it’ll be helpful for comparison while we think about conditionals next.
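The block-summing rule is easy to sketch in code (same illustrative numbers as before):

```python
# Marginals by summing blocks: P(not-A) = P(not-A, B) + P(not-A, not-B), etc.
# Illustrative numbers, not taken from the pictures.
P = {(True, True): 0.24, (True, False): 0.36,
     (False, True): 0.30, (False, False): 0.10}

def marginal_A(P, a):
    # Sum the two blocks in the column where A = a.
    return sum(P[(a, b)] for b in (True, False))

def marginal_B(P, b):
    # Sum the two blocks in the row where B = b.
    return sum(P[(a, b)] for a in (True, False))

print(marginal_A(P, False))  # P(not-A) = 0.30 + 0.10, about 0.40
print(marginal_B(P, False))  # P(not-B) = 0.36 + 0.10, about 0.46
```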

Conditional probabilities

We can start with some probability \(\bP(B)\), and then assume that \(A\) is true to get a conditional probability \(\bP(B \mid A)\) of \(B\). Conditioning on \(A\) being true is like restricting our whole attention to just the possible worlds where \(A\) happens:

Then the conditional probability of \(B\) given \(A\) is the proportion of these \(A\) worlds where \(B\) also happens:

If instead we condition on \(\neg A\), we get:

So our square visualization gives a nice way to see, at a glance, the conditional probabilities of \(B\) given \(A\) or given \(\neg A\):
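In code, conditioning on \(A\) is literally restricting attention to one column and renormalizing by its width. A sketch with the same illustrative numbers:

```python
# Conditioning on A restricts attention to the A column:
# P(B | A) = P(A, B) / P(A).
# Illustrative numbers, as before.
P = {(True, True): 0.24, (True, False): 0.36,
     (False, True): 0.30, (False, False): 0.10}

def cond_B_given_A(P, a):
    p_a = P[(a, True)] + P[(a, False)]  # width of the column where A = a
    return P[(a, True)] / p_a           # proportion of that column where B holds

print(cond_B_given_A(P, True))   # P(B | A)  = 0.24 / 0.60, about 0.40
print(cond_B_given_A(P, False))  # P(B | not-A) = 0.30 / 0.40, about 0.75
```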

We don’t get such nice pictures for \(\bP(A \mid B)\):

More on this next.

Factoring a distribution

Recall the square showing our joint distribution \(\bP\):

Notice that in the above square, the reddish blocks for \(\bP(A,B)\) and \(\bP(A,\neg B)\) are the same width and form a column; and likewise the blueish blocks for \(\bP(\neg A,B)\) and \(\bP(\neg A,\neg B)\). This is because we chose to factor our probability distribution starting with \(A\):

$$\bP(A,B) = \bP(A) \bP( B \mid A)\ .$$

Let’s use the equivalence between events and binary random variables, so if we say \(\bP( B= \true \mid A= \false)\) we mean \(\bP(B \mid \neg A)\). For any choice of truth values \(t_A \in \{\true, \false\}\) and \(t_B \in \{\true, \false\}\), we have

$$\bP(A = t_A,B= t_B) = \bP(A= t_A)\; \bP( B= t_B \mid A= t_A)\ .$$

The first factor \(\bP(A = t_A)\) tells us how wide to make the red column \((\bP(A = \true))\) relative to the blue column \((\bP(A = \false))\). Then the second factor \(\bP( B= t_B \mid A= t_A)\) tells us the proportions of dark \((B = \true)\) and light \((B = \false)\) within the column for \(A = t_A\).
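A factored distribution can be stored as exactly these two ingredients, column widths and within-column proportions. A sketch, with numbers chosen to match the earlier illustrative joint:

```python
# Factoring by A: store P(A = t_A) and P(B = t_B | A = t_A) separately.
# The first factor gives the column widths; the second gives the
# dark/light proportions within each column. Numbers are illustrative.
p_A = {True: 0.6, False: 0.4}           # column widths: P(A), P(not-A)
p_B_given_A = {True: 0.4, False: 0.75}  # P(B = True | A = t_A)

def joint(t_A, t_B):
    p_b = p_B_given_A[t_A]
    return p_A[t_A] * (p_b if t_B else 1 - p_b)

# The four blocks still tile the whole square.
total = sum(joint(a, b) for a in (True, False) for b in (True, False))
assert abs(total - 1.0) < 1e-9
```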

We could just as well have factored by \(B\) first:

$$\bP(A = t_A,B= t_B) = \bP(B= t_B)\; \bP( A= t_A \mid B= t_B)\ .$$

Then we’d draw a picture like this:

By the way, earlier when we factored by \(A\) first, we got simple pictures of the probabilities \(\bP(B \mid A)\) for \(B\) conditioned on \(A\). Now that we’re factoring by \(B\) first, we have simple pictures for the conditional probability \(\bP(A \mid B)\):

and for the conditional probability \(\bP(A \mid \neg B)\):

Computing joint probabilities from factored probabilities

Let’s say we know the factored probabilities for \(A\) and \(B\), factoring by \(A\). That is, we know \(\bP(A = \true)\), and we also know \(\bP(B = \true \mid A = \true)\) and \(\bP(B = \true \mid A = \false)\). How can we recover the joint probability \(\bP(A = t_A, B = t_B)\) that \(A = t_A\) is the case and also \(B = t_B\) is the case?


Since, by the definition of conditional probability,

$$\bP(B = \false \mid A = \true) = \frac{\bP(A = \true, B = \false)}{\bP(A = \true)}\ ,$$

we can multiply the prior \(\bP(A)\) by the conditional \(\bP(\neg B \mid A)\) to get the joint \(\bP(A, \neg B)\):

$$\bP(A = \true)\; \bP(B = \false \mid A = \true) = \bP(A = \true, B = \false)\ .$$

If we do this at the same time for all the possible truth values \(t_A\) and \(t_B\), we get back the full joint distribution:
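The whole recovery step can be sketched as one comprehension over the four truth-value pairs (illustrative numbers again):

```python
# Recovering the full joint distribution from the factored form:
# multiply the prior P(A = t_A) by the conditional P(B = t_B | A = t_A)
# for every pair of truth values at once. Numbers are illustrative.
p_A = {True: 0.6, False: 0.4}
p_B_given_A = {True: 0.4, False: 0.75}  # P(B = True | A = t_A)

joint = {
    (t_A, t_B): p_A[t_A] * (p_B_given_A[t_A] if t_B else 1 - p_B_given_A[t_A])
    for t_A in (True, False)
    for t_B in (True, False)
}

# e.g. P(A, not-B) = P(A) * P(not-B | A) = 0.6 * 0.6, about 0.36
assert abs(joint[(True, False)] - 0.36) < 1e-9
```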




  • Probability theory

    The logic of science; coherence relations on quantitative degrees of belief.