Two independent events: Square visualization

$$ \newcommand{\true}{\text{True}} \newcommand{\false}{\text{False}} \newcommand{\bP}{\mathbb{P}} $$

This is what in­de­pen­dence looks like, us­ing the square vi­su­al­iza­tion of prob­a­bil­ities:

We can see that the events \(A\) and \(B\) don’t in­ter­act; we say that \(A\) and \(B\) are in­de­pen­dent. Whether we look at the whole square, or just the red part of the square where \(A\) is true, the prob­a­bil­ity of \(B\) stays the same. In other words, \(\bP(B \mid A) = \bP(B)\). That’s what we mean by in­de­pen­dence: the prob­a­bil­ity of \(B\) doesn’t change if you con­di­tion on \(A\).

Our square of prob­a­bil­ities can be gen­er­ated by mul­ti­ply­ing to­gether the prob­a­bil­ity of \(A\) and the prob­a­bil­ity of \(B\):

This pic­ture demon­strates an­other way to define what it means for \(A\) and \(B\) to be in­de­pen­dent:

$$\bP(A, B) = \bP(A)\bP(B)\ .$$
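As a quick numerical check of this product rule, here is a minimal Python sketch. The two joint tables and the helper names `marginal` and `is_independent` are made up for illustration; exact fractions are used so the equality test is not disturbed by floating-point rounding.

```python
from fractions import Fraction as F

def marginal(joint, index, value):
    """Sum the joint probabilities where the event at `index` equals `value`."""
    return sum(p for outcome, p in joint.items() if outcome[index] == value)

def is_independent(joint):
    """Check P(A, B) == P(A) * P(B) for the events A and B themselves."""
    p_a = marginal(joint, 0, True)
    p_b = marginal(joint, 1, True)
    return joint[(True, True)] == p_a * p_b

# Keys are (A, B) truth values; all numbers are hypothetical.
independent_joint = {
    (True, True): F(3, 25), (True, False): F(9, 50),
    (False, True): F(7, 25), (False, False): F(21, 50),
}
dependent_joint = {
    (True, True): F(1, 5), (True, False): F(1, 10),
    (False, True): F(1, 5), (False, False): F(1, 2),
}

print(is_independent(independent_joint))  # True
print(is_independent(dependent_joint))    # False
```

Both tables have the same marginals, \(\bP(A) = 3/10\) and \(\bP(B) = 2/5\); only the first one puts exactly \(\bP(A)\bP(B) = 3/25\) in the cell where both events happen.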

In terms of fac­tor­ing a joint distribution

Let’s con­trast in­de­pen­dence with non-in­de­pen­dence. Here’s a pic­ture of two or­di­nary, non-in­de­pen­dent events \(A\) and \(B\):

(If the mean­ing of this pic­ture isn’t clear, take a look at Square vi­su­al­iza­tion of prob­a­bil­ities on two events.)

We have the red blocks for \(\bP(A)\) and the blue blocks for \(\bP(\neg A)\) lined up in columns. This means we’ve fac­tored our prob­a­bil­ity dis­tri­bu­tion us­ing \(A\) as the first fac­tor:

$$\bP(A,B) = \bP(A) \bP(B \mid A)\ .$$

We could just as well have fac­tored by \(B\) first: \(\bP(A,B) = \bP(B) \bP( A \mid B)\ .\) Then we’d draw a pic­ture like this:
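The two factorizations can be compared numerically. The sketch below uses a hypothetical joint table for two non-independent events (the numbers are invented for illustration) and checks that factoring by \(A\) first and factoring by \(B\) first both recover the same \(\bP(A,B)\).

```python
from fractions import Fraction as F

# Hypothetical joint distribution of two non-independent events; keys are (A, B).
joint = {
    (True, True): F(1, 5), (True, False): F(1, 10),
    (False, True): F(1, 5), (False, False): F(1, 2),
}

p_a = joint[(True, True)] + joint[(True, False)]   # P(A)
p_b = joint[(True, True)] + joint[(False, True)]   # P(B)

p_b_given_a = joint[(True, True)] / p_a            # P(B | A)
p_a_given_b = joint[(True, True)] / p_b            # P(A | B)

# Both factorizations recover the same joint probability P(A, B).
by_a_first = p_a * p_b_given_a
by_b_first = p_b * p_a_given_b
print(by_a_first, by_b_first)  # 1/5 1/5
```

In this example \(\bP(B \mid A) = 2/3\) while \(\bP(B) = 2/5\), so conditioning on \(A\) does change the probability of \(B\), which is what makes these events non-independent.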

Now, here again is the pic­ture of two in­de­pen­dent events \(A\) and \(B\):

In this picture, there are red and blue lined-up columns for \(\bP(A)\) and \(\bP(\neg A)\), and there are also dark and light lined-up rows for \(\bP(B)\) and \(\bP(\neg B)\). It looks like we somehow factored our probability distribution \(\bP\) using both \(A\) and \(B\) as the first factor.

In fact, this is ex­actly what hap­pened: since \(A\) and \(B\) are in­de­pen­dent, we have that \(\bP(B \mid A) = \bP(B)\). So the di­a­gram above is ac­tu­ally fac­tored ac­cord­ing to \(A\) first: \(\bP(A,B) = \bP(A) \bP(B \mid A)\). It’s just that \(\bP(B \mid A)= \bP(B) = \bP(B \mid \neg A)\), since \(B\) is in­de­pen­dent from \(A\). So we don’t need to have differ­ent ra­tios of dark to light (a.k.a. con­di­tional prob­a­bil­ities of \(B\)) in the left and right columns:

In this vi­su­al­iza­tion, we can see what hap­pens to the prob­a­bil­ity of \(B\) when you con­di­tion on \(A\) or on \(\neg A\): it doesn’t change at all. The ra­tio of [the area where \(B\) hap­pens] to [the whole area], is the same as the ra­tio \(\bP(B \mid A)\) where we only look at the area where \(A\) hap­pens, which is the same as the ra­tio \(\bP(B \mid \neg A)\) where we only look at the area where \(\neg A\) hap­pens. The fact that the prob­a­bil­ity of \(B\) doesn’t change when we con­di­tion on \(A\) is ex­actly what we mean when we say that \(A\) and \(B\) are in­de­pen­dent.
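The equality of these three ratios can be verified on a small example. This is a sketch under assumed numbers: the joint table is hypothetical, and `prob_b_given_a` is an illustrative helper name, not anything from the original page.

```python
from fractions import Fraction as F

# Hypothetical independent joint distribution; keys are (A, B) truth values.
joint = {
    (True, True): F(3, 25), (True, False): F(9, 50),
    (False, True): F(7, 25), (False, False): F(21, 50),
}

def prob_b_given_a(joint, a):
    """P(B | A = a): the fraction of the column where A has value `a` in which B happens."""
    column = joint[(a, True)] + joint[(a, False)]
    return joint[(a, True)] / column

# P(B): the fraction of the whole square in which B happens.
p_b = joint[(True, True)] + joint[(False, True)]

print(p_b, prob_b_given_a(joint, True), prob_b_given_a(joint, False))  # 2/5 2/5 2/5
```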

The square diagram above is also factored according to \(B\) first, using \(\bP(A,B) = \bP(B) \bP(A \mid B)\). The red/blue ratios are the same in both rows because \(\bP(A \mid B) = \bP(A) = \bP(A \mid \neg B)\), since \(A\) and \(B\) are independent:

We couldn’t do any of this stuff if the columns and rows didn’t both line up. (Which is good, be­cause then we’d have proved the false state­ment that any two events are in­de­pen­dent!)

In terms of mul­ti­ply­ing marginal probabilities

Another way to say that \(A\) and \(B\) are independent variables (we're using the equivalence between events and binary variables) is that for any truth values \(t_A,t_B \in \{\true, \false\},\)

$$\bP(A = t_A, B = t_B) = \bP(A = t_A)\bP(B = t_B)\ .$$

So the joint prob­a­bil­ities for \(A\) and \(B\) are com­puted by sep­a­rately get­ting the prob­a­bil­ity of \(A\) and the prob­a­bil­ity of \(B\), and then mul­ti­ply­ing the two prob­a­bil­ities to­gether. For ex­am­ple, say we want to com­pute the prob­a­bil­ity \(\bP(A, \neg B) = \bP(A = \true, B = \false)\). We start with the marginal prob­a­bil­ity of \(A\):

and the prob­a­bil­ity of \(\neg B\):

and then we mul­ti­ply them:

We can get all the joint prob­a­bil­ities this way. So we can vi­su­al­ize the whole joint dis­tri­bu­tion as the thing that you get when you mul­ti­ply two in­de­pen­dent prob­a­bil­ity dis­tri­bu­tions to­gether. We just over­lay the two dis­tri­bu­tions:

To be a lit­tle more math­e­mat­i­cally el­e­gant, we’d use the topolog­i­cal product of two spaces shown ear­lier to draw the joint dis­tri­bu­tion as a product of the dis­tri­bu­tions of \(A\) and \(B\):
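The multiplication of marginals described above can be sketched in a few lines of Python. The marginal values in `dist_a` and `dist_b` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical marginal distributions over truth values.
dist_a = {True: F(3, 10), False: F(7, 10)}
dist_b = {True: F(2, 5), False: F(3, 5)}

# Joint distribution of independent A and B: multiply the marginals cell by cell.
joint = {(a, b): dist_a[a] * dist_b[b] for a, b in product(dist_a, dist_b)}

print(joint[(True, False)])  # P(A, not B) = 3/10 * 3/5 = 9/50
```

Each of the four cells is just one number from the \(A\) distribution times one number from the \(B\) distribution, and the cells sum to 1 because each marginal does.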

