Bayes' rule: Definition

Bayes’ rule is the mathematics of probability theory governing how to update your beliefs in the light of new evidence.

Notation

In much of what follows, we’ll use the following notation:

Let the hypotheses being considered be $H_1$ and $H_2$.
Let the evidence observed be $e_0.$
Let $\mathbb P(H_i)$ denote the prior probability of $H_i$ before observing the evidence.
Let the conditional probability $\mathbb P(e_0\mid H_i)$ denote the likelihood of observing evidence $e_0$ assuming $H_i$ to be true.
Let the conditional probability $\mathbb P(H_i\mid e_0)$ denote the posterior probability of $H_i$ after observing $e_0.$

Odds/proportional form

Bayes’ rule in the odds form or proportional form states:

$$\dfrac{\mathbb P(H_1)}{\mathbb P(H_2)} \times \dfrac{\mathbb P(e_0\mid H_1)}{\mathbb P(e_0\mid H_2)} = \dfrac{\mathbb P(H_1\mid e_0)}{\mathbb P(H_2\mid e_0)}$$

In other words, the prior odds times the likelihood ratio yield the posterior odds. Normalizing these odds will then yield the posterior probabilities.

Put another way: if you initially think $H_i$ is $\alpha$ times as probable as $H_j$, and then see evidence that you’re $\beta$ times as likely to see if $H_i$ is true as if $H_j$ is true, you should update to thinking that $H_i$ is $\alpha \cdot \beta$ times as probable as $H_j.$

Suppose that Professor Plum and Miss Scarlet are two suspects in a murder, and that we start out thinking that Professor Plum is twice as likely to have committed the murder as Miss Scarlet (prior odds of 2 : 1). We then discover that the victim was poisoned. We think that Professor Plum is around one-fourth as likely to use poison as Miss Scarlet (likelihood ratio of 1 : 4). Then after observing the victim was poisoned, we should think Plum is around half as likely to have committed the murder as Scarlet: $2 \times \dfrac{1}{4} = \dfrac{1}{2}.$ This reflects posterior odds of 1 : 2, or a posterior probability of ¹⁄₃, that Professor Plum did the deed.
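The arithmetic in the Plum/Scarlet example can be checked with a minimal Python sketch. It uses exact fractions so the odds come out exactly. The helper names `posterior_odds` and `odds_to_probability` are illustrative only. The sketch assumes odds are written as a single number $o$ meaning $o : 1$ in favor of Plum.

```python
from fractions import Fraction

def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' rule: prior odds times likelihood ratio."""
    return prior_odds * likelihood_ratio

def odds_to_probability(odds):
    """Convert odds o : 1 in favor of a hypothesis into the probability o / (1 + o)."""
    return odds / (1 + odds)

# Plum : Scarlet, as in the example above.
prior = Fraction(2, 1)       # prior odds 2 : 1
likelihood = Fraction(1, 4)  # poison is 1 : 4 as likely under Plum vs. Scarlet
post = posterior_odds(prior, likelihood)

print(post)                       # 1/2, i.e. odds of 1 : 2
print(odds_to_probability(post))  # 1/3
```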

Proof

Bayes’ rule follows from the definition of conditional probability, $\mathbb P(X\wedge Y) = \mathbb P(X\mid Y) \cdot \mathbb P(Y):$

$$ \dfrac{\mathbb P(H_i)}{\mathbb P(H_j)} \times \dfrac{\mathbb P(e\mid H_i)}{\mathbb P(e\mid H_j)} = \dfrac{\mathbb P(e \wedge H_i)}{\mathbb P(e \wedge H_j)} = \dfrac{\mathbb P(e \wedge H_i) / \mathbb P(e)}{\mathbb P(e \wedge H_j) / \mathbb P(e)} = \dfrac{\mathbb P(H_i\mid e)}{\mathbb P(H_j\mid e)} $$
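One way to sanity-check this chain of equalities is to evaluate each side on a small joint distribution. The sketch below uses a made-up joint table $\mathbb P(H, e)$ over two hypotheses and a binary piece of evidence. The numbers are hypothetical, chosen only for illustration. It confirms that all three expressions agree exactly.

```python
from fractions import Fraction

# Hypothetical joint distribution P(H, E); the numbers are made up for illustration.
joint = {
    ("H1", True): Fraction(1, 10), ("H1", False): Fraction(3, 10),
    ("H2", True): Fraction(2, 10), ("H2", False): Fraction(4, 10),
}

def p_h(h):
    """Marginal P(H = h)."""
    return sum(p for (hyp, _), p in joint.items() if hyp == h)

def p_e():
    """Marginal P(e), i.e. the evidence is observed (E = True)."""
    return sum(p for (_, e), p in joint.items() if e)

def p_e_given_h(h):
    """Likelihood P(e | h) = P(e and h) / P(h)."""
    return joint[(h, True)] / p_h(h)

def p_h_given_e(h):
    """Posterior P(h | e) = P(e and h) / P(e)."""
    return joint[(h, True)] / p_e()

lhs = (p_h("H1") / p_h("H2")) * (p_e_given_h("H1") / p_e_given_h("H2"))
middle = joint[("H1", True)] / joint[("H2", True)]
rhs = p_h_given_e("H1") / p_h_given_e("H2")

print(lhs, middle, rhs)  # 1/2 1/2 1/2
```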

Log odds form

The log odds form of Bayes’ rule states:

$$\log \left ( \dfrac {\mathbb P(H_i)} {\mathbb P(H_j)} \right ) + \log \left ( \dfrac {\mathbb P(e\mid H_i)} {\mathbb P(e\mid H_j)} \right ) = \log \left ( \dfrac {\mathbb P(H_i\mid e)} {\mathbb P(H_j\mid e)} \right ) $$

E.g.: “A study of Chinese blood donors found that roughly 1 in 100,000 of them had HIV (as determined by a very reliable gold-standard test). The non-gold-standard test used for initial screening had a sensitivity of 99.7% and a specificity of 99.8%, meaning that it was 500 times as likely to return positive for infected as non-infected patients.” Then our prior log odds are about −5, i.e. five orders of magnitude against HIV. If we then observe a positive test result, this is evidence of strength about +2.7 orders of magnitude for HIV, since $\log_{10} 500 \approx 2.7.$ Our posterior log odds are about −2.3. That corresponds to odds of roughly 1 : 200 for HIV, or less than a 1 in 100 chance.
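Here is a short sketch of the same calculation in base-10 log odds, using the figures quoted from the study. It computes the likelihood ratio directly from sensitivity and specificity as $0.997 / 0.002 = 498.5$, which the quote rounds to 500.

```python
import math

prior_prob = 1 / 100_000  # about 1 in 100,000 donors had HIV
sensitivity = 0.997       # P(positive | HIV)
specificity = 0.998       # P(negative | no HIV)

# Prior log odds (base 10) for HIV.
prior_log_odds = math.log10(prior_prob / (1 - prior_prob))

# Likelihood ratio of a positive result: P(+ | HIV) / P(+ | no HIV).
likelihood_ratio = sensitivity / (1 - specificity)
evidence = math.log10(likelihood_ratio)

# Log odds form: prior log odds + log likelihood ratio = posterior log odds.
posterior_log_odds = prior_log_odds + evidence

print(round(prior_log_odds, 2))      # -5.0
print(round(evidence, 2))            # 2.7
print(round(posterior_log_odds, 2))  # -2.3
```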

In log odds form, the same strength of evidence (log likelihood ratio) always moves us the same additive distance along a line representing strength of belief (also in log odds). If we measured distance in probabilities instead, the same 2 : 1 likelihood ratio would move us a different distance along the probability line, depending on whether we started from a prior probability of 10% or 50%.
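The contrast can be made concrete with a small sketch that applies a 2 : 1 likelihood ratio to both priors. The `update` and `log_odds` helpers are illustrative names, not from any library.

```python
import math

def update(prior_prob, likelihood_ratio):
    """Apply a likelihood ratio to a prior probability via the odds form."""
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

def log_odds(p):
    """Base-10 log odds of probability p."""
    return math.log10(p / (1 - p))

for prior in (0.10, 0.50):
    post = update(prior, 2.0)
    print(f"{prior:.0%} -> {post:.1%}: "
          f"probability moves {post - prior:+.3f}, "
          f"log odds move {log_odds(post) - log_odds(prior):+.3f}")
```

From a 10% prior, the probability rises to about 18.2%, a gain of 0.082. From a 50% prior, it rises to about 66.7%, a gain of 0.167. In both cases the log odds rise by the same $\log_{10} 2 \approx 0.301.$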

Visualizations

Graphical ways of visualizing Bayes’ rule include frequency diagrams, the waterfall visualization, the spotlight visualization, the magnet visualization, and the Venn diagram for the proof.

Examples

Examples of Bayes’ rule may be found here.

Multiple hypotheses and updates

The odds form of Bayes’ rule also works for odds ratios among more than two hypotheses, and for applying multiple pieces of evidence in sequence. Suppose there’s a bathtub full of coins. ¹⁄₂ of the coins are “fair” and have a 50% probability of producing heads on each coinflip; ¹⁄₃ of the coins produce 25% heads; and ¹⁄₆ produce 75% heads. You pull out a coin at random, flip it 3 times, and get the result HTH. You may legitimately calculate:

$$\begin{array}{rll} (\tfrac{1}{2} : \tfrac{1}{3} : \tfrac{1}{6}) \cong & (3 : 2 : 1) & \\ \times & (2 : 1 : 3) & \\ \times & (2 : 3 : 1) & \\ \times & (2 : 1 : 3) & \\ = & (24 : 6 : 9) & \cong (8 : 2 : 3) \end{array}$$
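A minimal Python sketch can reproduce this table of odds vectors. It multiplies the actual per-flip likelihoods, such as $(\tfrac12, \tfrac14, \tfrac34)$ for heads, rather than the rescaled ratios $(2 : 1 : 3)$. Odds are only defined up to a common factor, so the final ratio comes out the same. The helper names are illustrative only.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm

# Three coin types: fair, 25% heads, 75% heads.
prior = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
p_heads = [Fraction(1, 2), Fraction(1, 4), Fraction(3, 4)]

def likelihoods(flip):
    """Likelihood of one flip ('H' or 'T') under each coin type."""
    return [p if flip == "H" else 1 - p for p in p_heads]

def update(odds, flip):
    """Multiply each hypothesis' odds term by its likelihood for this flip."""
    return [o * l for o, l in zip(odds, likelihoods(flip))]

def lowest_terms(odds):
    """Rescale odds to the smallest whole numbers in the same ratio."""
    scale = reduce(lcm, (o.denominator for o in odds))
    ints = [int(o * scale) for o in odds]
    g = reduce(gcd, ints)
    return tuple(i // g for i in ints)

odds = prior
for flip in "HTH":
    odds = update(odds, flip)

print(lowest_terms(odds))  # (8, 2, 3)
```

`math.lcm` requires Python 3.9 or later.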

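Multiplying likelihood ratios this way assumes that the pieces of evidence are conditionally independent of one another given each hypothesis (the Naive Bayes assumption). Flips of a single coin with a fixed heads probability satisfy this, but evidence in general may not. It is therefore important to be aware of whether you are making this assumption.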

Probability form

As a formula for a single probability $\mathbb P(H_i\mid e),$ Bayes’ rule states:

$$\mathbb P(H_i\mid e) = \dfrac{\mathbb P(e\mid H_i) \cdot \mathbb P(H_i)}{\sum_k \mathbb P(e\mid H_k) \cdot \mathbb P(H_k)}$$
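The formula translates directly into code. The sketch below applies it to the blood-donor figures from the log odds example, with two hypotheses: HIV and no HIV. The function name `posterior` is an illustrative choice. The sketch assumes the hypotheses are mutually exclusive and exhaustive, so the denominator equals $\mathbb P(e).$

```python
def posterior(priors, likelihoods, i):
    """P(H_i | e) = P(e | H_i) P(H_i) / sum_k P(e | H_k) P(H_k)."""
    numerator = likelihoods[i] * priors[i]
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return numerator / evidence

# Hypotheses: index 0 = HIV, index 1 = no HIV (figures from the blood-donor example).
priors = [1e-5, 1 - 1e-5]
likelihoods = [0.997, 1 - 0.998]  # P(positive test | H_k)

p = posterior(priors, likelihoods, 0)
print(f"P(HIV | positive) = {p:.4f}")  # P(HIV | positive) = 0.0050
```

The result, about 0.5%, corresponds to odds of roughly 1 : 200, in line with the log odds calculation.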

Functional form

In functional form, Bayes’ rule states:

$$\mathbb P(\mathbf{H}\mid e) \propto \mathbb P(e\mid \mathbf{H}) \cdot \mathbb P(\mathbf{H}).$$

The posterior probability function over hypotheses given the evidence is proportional to the likelihood function from the evidence to those hypotheses, times the prior probability function over those hypotheses.

Since posterior probabilities over mutually exclusive and exhaustive possibilities must sum to $1,$ normalizing the product of the likelihood function and prior probability function will yield the exact posterior probability function.
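Here is a sketch of the functional form on the bathtub-coin example. Each hypothesis is the coin’s heads probability $p$, and the likelihood function for HTH is $p^2(1-p)$, which assumes flips are independent given the coin. The code multiplies the prior function by the likelihood function pointwise and then normalizes. The variable names are illustrative.

```python
from fractions import Fraction

# Hypotheses: the coin's heads probability. Prior from the bathtub example.
prior = {Fraction(1, 2): Fraction(1, 2),
         Fraction(1, 4): Fraction(1, 3),
         Fraction(3, 4): Fraction(1, 6)}

def likelihood(p, flips="HTH"):
    """P(flips | heads probability p), assuming flips are independent given p."""
    result = Fraction(1)
    for f in flips:
        result *= p if f == "H" else 1 - p
    return result

# Unnormalized posterior: likelihood function times prior function.
unnormalized = {h: likelihood(h) * prior[h] for h in prior}

# Normalize so the posterior over these exhaustive hypotheses sums to 1.
total = sum(unnormalized.values())
posterior = {h: u / total for h, u in unnormalized.items()}

for h, q in posterior.items():
    print(f"P(heads rate {h} | HTH) = {q}")
```

This prints posteriors of 8/13, 2/13, and 3/13, matching the odds $(8 : 2 : 3)$ from the bathtub example. Normalization also means any constant factor in the likelihood function cancels out, which is why the proportional form is enough.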
