Two independent events: Square visualization

$$ \newcommand{\true}{\text{True}} \newcommand{\false}{\text{False}} \newcommand{\bP}{\mathbb{P}} $$

This is what independence looks like, using the square visualization of probabilities:

We can see that the events \(A\) and \(B\) don’t interact; we say that \(A\) and \(B\) are independent. Whether we look at the whole square, or just the red part of the square where \(A\) is true, the probability of \(B\) stays the same. In other words, \(\bP(B \mid A) = \bP(B)\). That’s what we mean by independence: the probability of \(B\) doesn’t change if you condition on \(A\).
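For a concrete check (the numbers here are illustrative assumptions, not values read off the diagram): suppose \(\bP(A) = 0.4\), \(\bP(B) = 0.7\), and \(\bP(A, B) = 0.28\). Then

$$\bP(B \mid A) = \frac{\bP(A, B)}{\bP(A)} = \frac{0.28}{0.4} = 0.7 = \bP(B)\ ,$$

so conditioning on \(A\) leaves the probability of \(B\) unchanged.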

Our square of probabilities can be generated by multiplying together the probability of \(A\) and the probability of \(B\):

This picture demonstrates another way to define what it means for \(A\) and \(B\) to be independent:

$$\bP(A, B) = \bP(A)\bP(B)\ .$$
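The two definitions agree: multiplying both sides of \(\bP(B \mid A) = \bP(B)\) by \(\bP(A)\), and using the definition of conditional probability, gives

$$\bP(A, B) = \bP(A)\,\bP(B \mid A) = \bP(A)\,\bP(B)\ .$$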

In terms of factoring a joint distribution

Let’s contrast independence with non-independence. Here’s a picture of two ordinary, non-independent events \(A\) and \(B\):

(If the meaning of this picture isn’t clear, take a look at Square visualization of probabilities on two events.)

We have the red blocks for \(\bP(A)\) and the blue blocks for \(\bP(\neg A)\) lined up in columns. This means we’ve factored our probability distribution using \(A\) as the first factor:

$$\bP(A,B) = \bP(A) \bP(B \mid A)\ .$$

We could just as well have factored by \(B\) first: \(\bP(A,B) = \bP(B) \bP( A \mid B)\ .\) Then we’d draw a picture like this:
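Whichever way we factor, the numbers have to agree. For instance, with made-up probabilities \(\bP(A, B) = 0.2\), \(\bP(A, \neg B) = 0.3\), \(\bP(\neg A, B) = 0.3\), and \(\bP(\neg A, \neg B) = 0.2\), we get \(\bP(A) = \bP(B) = 0.5\) and \(\bP(B \mid A) = \bP(A \mid B) = 0.4\), so

$$\bP(A)\,\bP(B \mid A) = 0.5 \times 0.4 = 0.2 = \bP(B)\,\bP(A \mid B)\ .$$

Note that these particular events are not independent: \(\bP(A)\,\bP(B) = 0.25 \neq 0.2 = \bP(A, B)\).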

Now, here again is the picture of two independent events \(A\) and \(B\):

In this picture, there are red and blue lined-up columns for \(\bP(A)\) and \(\bP(\neg A)\), and there are also dark and light lined-up rows for \(\bP(B)\) and \(\bP(\neg B)\). It looks like we somehow factored our probability distribution \(\bP\) using both \(A\) and \(B\) as the first factor.

In fact, this is exactly what happened: since \(A\) and \(B\) are independent, we have \(\bP(B \mid A) = \bP(B)\). So the diagram above really is factored according to \(A\) first: \(\bP(A,B) = \bP(A) \bP(B \mid A)\). It’s just that \(\bP(B \mid A) = \bP(B) = \bP(B \mid \neg A)\), since \(B\) is independent of \(A\). So we don’t need different ratios of dark to light (a.k.a. conditional probabilities of \(B\)) in the left and right columns:

In this visualization, we can see what happens to the probability of \(B\) when you condition on \(A\) or on \(\neg A\): it doesn’t change at all. The ratio of [the area where \(B\) happens] to [the whole area] is the same as the ratio \(\bP(B \mid A)\), where we look only at the area where \(A\) happens, which in turn is the same as the ratio \(\bP(B \mid \neg A)\), where we look only at the area where \(\neg A\) happens. The fact that the probability of \(B\) doesn’t change when we condition on \(A\) is exactly what we mean when we say that \(A\) and \(B\) are independent.
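Why do equal dark-to-light ratios in the two columns force this? If \(\bP(B \mid A) = \bP(B \mid \neg A) = p\), then by the law of total probability

$$\bP(B) = \bP(A)\,\bP(B \mid A) + \bP(\neg A)\,\bP(B \mid \neg A) = \big(\bP(A) + \bP(\neg A)\big)\,p = p\ ,$$

so the unconditional probability of \(B\) equals the shared conditional probability.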

The square diagram above is also factored according to \(B\) first, using \(\bP(A,B) = \bP(B) \bP(A \mid B)\). The red/blue ratios are the same in both rows because \(\bP(A \mid B) = \bP(A) = \bP(A \mid \neg B)\), since \(A\) and \(B\) are independent:

We couldn’t do any of this if the columns and rows didn’t both line up. (Which is good, because otherwise we’d have proved the false statement that any two events are independent!)

In terms of multiplying marginal probabilities

Another way to say that \(A\) and \(B\) are independent variables (here we’re using the equivalence between events and binary variables) is that for any truth values \(t_A, t_B \in \{\true, \false\},\)

$$\bP(A = t_A, B= t_B) = \bP(A = t_A)\bP(B = t_B)\ .$$

So the joint probabilities for \(A\) and \(B\) are computed by separately getting the probability of \(A\) and the probability of \(B\), and then multiplying the two probabilities together. For example, say we want to compute the probability \(\bP(A, \neg B) = \bP(A = \true, B = \false)\). We start with the marginal probability of \(A\):

and the probability of \(\neg B\):

and then we multiply them:

We can get all the joint probabilities this way. So we can visualize the whole joint distribution as the thing that you get when you multiply two independent probability distributions together. We just overlay the two distributions:
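If you’d like to check this computationally, here’s a minimal sketch in Python (the 0.4 and 0.7 marginals, and all the names, are our own illustrative choices, not anything from the diagrams) that builds the joint table by multiplying marginals and confirms that conditioning on \(A\) doesn’t change the probability of \(B\):

```python
# Build the joint distribution of two independent binary events A and B
# by multiplying their marginal probabilities (illustrative numbers).

p_a = 0.4  # P(A = True); assumed for illustration
p_b = 0.7  # P(B = True); assumed for illustration

marginal_a = {True: p_a, False: 1 - p_a}
marginal_b = {True: p_b, False: 1 - p_b}

# For independent events, P(A = t_a, B = t_b) = P(A = t_a) * P(B = t_b).
joint = {
    (t_a, t_b): marginal_a[t_a] * marginal_b[t_b]
    for t_a in (True, False)
    for t_b in (True, False)
}

print(joint[(True, False)])  # P(A, not-B) = 0.4 * 0.3 = 0.12

# Conditioning on A leaves the probability of B unchanged:
p_b_given_a = joint[(True, True)] / marginal_a[True]
assert abs(p_b_given_a - p_b) < 1e-12  # P(B | A) == P(B)
```

The four entries of `joint` are exactly the four cells of the square: overlaying the two marginal distributions is the same as taking this product.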

To be a little more mathematically elegant, we’d use the topological product of the two spaces shown earlier to draw the joint distribution as a product of the distributions of \(A\) and \(B\):
