Bayes' rule: Odds form

One of the more convenient forms of Bayes’ rule uses relative odds. Bayes’ rule says that, when you observe a piece of evidence \(e,\) your posterior odds \(\mathbb O(\boldsymbol H \mid e)\) for your hypothesis vector \(\boldsymbol H\) given \(e\) are just your prior odds \(\mathbb O(\boldsymbol H)\) on \(\boldsymbol H\) times the likelihood function \(\mathcal L_e(\boldsymbol H).\)

For example, suppose we’re trying to solve a mysterious murder, and we start out thinking the odds of Professor Plum vs. Miss Scarlet committing the murder are 1 : 2, that is, Scarlet is twice as likely as Plum to have committed the murder a priori. We then observe that the victim was bludgeoned with a lead pipe. If we think that Plum, if he commits a murder, is around 60% likely to use a lead pipe, and that Scarlet, if she commits a murder, is around 6% likely to use a lead pipe, this implies relative likelihoods of 10 : 1 for Plum vs. Scarlet using the pipe. The posterior odds for Plum vs. Scarlet, after observing the victim to have been murdered by a pipe, are \((1 : 2) \times (10 : 1) = (10 : 2) = (5 : 1)\). We now think Plum is around five times as likely as Scarlet to have committed the murder.
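As a quick check on the arithmetic, here is a minimal Python sketch of this odds update, using only the numbers from the example above (exact fractions keep the odds readable):

```python
from fractions import Fraction

# Prior odds for (Plum, Scarlet): 1 : 2
prior_odds = [Fraction(1), Fraction(2)]

# Likelihood of "lead pipe" given each suspect: 60% vs. 6%
likelihoods = [Fraction(60, 100), Fraction(6, 100)]

# Posterior odds = prior odds * likelihoods, elementwise
posterior = [p * l for p, l in zip(prior_odds, likelihoods)]

# Rescale so the smallest entry is 1; relative odds ignore the overall scale
scale = min(posterior)
print([o / scale for o in posterior])  # [Fraction(5, 1), Fraction(1, 1)], i.e. 5 : 1
```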

Odds functions

Let \(\boldsymbol H\) denote a vector of hypotheses. An odds function \(\mathbb O\) is a function that maps \(\boldsymbol H\) to a set of odds. For example, if \(\boldsymbol H = (H_1, H_2, H_3),\) then \(\mathbb O(\boldsymbol H)\) might be \((6 : 2 : 1),\) which says that \(H_1\) is 3x as likely as \(H_2\) and 6x as likely as \(H_3.\) An odds function captures our relative probabilities between the hypotheses in \(\boldsymbol H;\) for example, (6 : 2 : 1) odds are the same as (18 : 6 : 3) odds. We don’t need to know the absolute probabilities of the \(H_i\) in order to know the relative odds. All we require is that the relative odds are proportional to the absolute probabilities:

$$\mathbb O(\boldsymbol H) \propto \mathbb P(\boldsymbol H).$$

In the example with the death of Mr. Boddy, suppose \(H_1\) denotes the proposition “Reverend Green murdered Mr. Boddy”, \(H_2\) denotes “Mrs. White did it”, and \(H_3\) denotes “Colonel Mustard did it”. Let \(\boldsymbol H\) be the vector \((H_1, H_2, H_3).\) If these propositions respectively have prior probabilities of 80%, 8%, and 4% (the remaining 8% being reserved for other hypotheses), then \(\mathbb O(\boldsymbol H) = (80 : 8 : 4) = (20 : 2 : 1)\) represents our relative credences about the murder suspects — that Reverend Green is 10 times as likely to be the murderer as Mrs. White, who is twice as likely to be the murderer as Colonel Mustard.
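A small sketch of how these prior probabilities reduce to the relative odds above (the numbers are exactly the ones just given):

```python
from fractions import Fraction

# Prior probabilities for (Green, White, Mustard); they need not sum to 1,
# since the remaining 8% sits on other hypotheses.
priors = [Fraction(80, 100), Fraction(8, 100), Fraction(4, 100)]

# Relative odds are just the probabilities up to a common scale factor
scale = min(priors)
print([p / scale for p in priors])  # [20, 2, 1], i.e. 20 : 2 : 1
```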

Likelihood functions

Suppose we discover that the victim was murdered by wrench. Suppose we think that Reverend Green, Mrs. White, and Colonel Mustard, if they murdered someone, would respectively be 60%, 90%, and 30% likely to use a wrench. Letting \(e_w\) denote the observation “The victim was murdered by wrench,” we would have \(\mathbb P(e_w\mid \boldsymbol H) = (0.6, 0.9, 0.3).\) This gives us a likelihood function defined as \(\mathcal L_{e_w}(\boldsymbol H) = \mathbb P(e_w \mid \boldsymbol H).\)
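One simple way to represent this likelihood function in code is as a mapping from each hypothesis to \(\mathbb P(e_w \mid H_i)\); the dictionary below is just an illustrative sketch, not a representation prescribed by anything above:

```python
# P("victim was murdered by wrench" | each suspect did it)
likelihood_wrench = {"Green": 0.6, "White": 0.9, "Mustard": 0.3}
```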

Bayes’ rule, odds form

Let \(\mathbb O(\boldsymbol H\mid e)\) denote the posterior odds of the hypotheses \(\boldsymbol H\) after observing evidence \(e.\) Bayes’ rule then states:

$$\mathbb O(\boldsymbol H) \times \mathcal L_{e}(\boldsymbol H) = \mathbb O(\boldsymbol H\mid e)$$

This says that we can multiply the relative prior credence \(\mathbb O(\boldsymbol H)\) by the likelihood \(\mathcal L_{e}(\boldsymbol H)\) to arrive at the relative posterior credence \(\mathbb O(\boldsymbol H\mid e).\) Because odds are invariant under multiplication by a positive constant, it makes no difference if the likelihood function is scaled up or down by a constant: that just multiplies the final odds by the same constant, which leaves the relative odds unchanged. Thus, only the relative likelihoods are needed to perform the calculation; the absolute likelihoods are unnecessary. When performing the calculation, we can therefore simplify \(\mathcal L_{e_w}(\boldsymbol H) = (0.6, 0.9, 0.3)\) to the relative likelihoods \((2 : 3 : 1).\)

In our example, this makes the calculation quite easy. The prior odds for Green vs. White vs. Mustard were \((20 : 2 : 1).\) The relative likelihoods were \((0.6 : 0.9 : 0.3) = (2 : 3 : 1).\) Thus, the relative posterior odds after observing \(e_w\) = “Mr. Boddy was killed by wrench” are \((20 : 2 : 1) \times (2 : 3 : 1) = (40 : 6 : 1).\) Given the evidence, Reverend Green is 40 times as likely as Colonel Mustard to be the killer, and \(40 / 6 \approx 6.7\) times as likely as Mrs. White.
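Putting the pieces together, here is a short Python sketch of this whole update, again using only the numbers from the example:

```python
from fractions import Fraction

def simplify(odds):
    """Rescale relative odds so the smallest entry is 1."""
    scale = min(odds)
    return [o / scale for o in odds]

# Green : White : Mustard
prior_odds  = [Fraction(20), Fraction(2), Fraction(1)]
likelihoods = [Fraction(6, 10), Fraction(9, 10), Fraction(3, 10)]  # P(wrench | suspect)

# Odds form of Bayes' rule: posterior odds = prior odds * likelihoods
posterior = simplify([p * l for p, l in zip(prior_odds, likelihoods)])
print(posterior)                     # [40, 6, 1], i.e. 40 : 6 : 1
print(posterior[0] / posterior[1])   # Fraction(20, 3): Green vs. White
```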

Bayes’ rule states that this relative proportioning of odds among these three suspects will be correct, regardless of how our remaining 8% probability mass is assigned to all other suspects and possibilities, or indeed, how much probability mass we assigned to other suspects to begin with. For a proof, see Proof of Bayes’ rule.
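To see this concretely, here is an illustrative sketch that computes the full posterior probabilities under two different made-up ways of assigning the leftover 8% to other suspects (with arbitrary wrench-likelihoods for them); the relative posterior odds among Green, White, and Mustard come out the same either way:

```python
def posterior_probs(priors, likelihoods):
    """Full Bayesian update: P(H_i | e) = P(H_i) P(e | H_i) / sum_j P(H_j) P(e | H_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# First three entries are Green, White, Mustard; the rest are hypothetical "other" suspects.
# Scenario A: the leftover 8% on one other suspect, with a made-up wrench-likelihood of 0.5
pa = posterior_probs([0.80, 0.08, 0.04, 0.08], [0.6, 0.9, 0.3, 0.5])
# Scenario B: the leftover 8% split between two other suspects with different likelihoods
pb = posterior_probs([0.80, 0.08, 0.04, 0.05, 0.03], [0.6, 0.9, 0.3, 0.1, 0.9])

print([round(x / pa[2], 6) for x in pa[:3]])  # [40.0, 6.0, 1.0]
print([round(x / pb[2], 6) for x in pb[:3]])  # [40.0, 6.0, 1.0]
```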

Visualization

Frequency diagrams, waterfall diagrams, and spotlight diagrams may be helpful for explaining or visualizing the odds form of Bayes’ rule.

Parents:

  • Bayes' rule

    Bayes’ rule is the core theorem of probability theory saying how to revise our beliefs when we make a new observation.