# Bayes' rule: Odds form

One of the more convenient forms of Bayes’ rule uses relative odds. Bayes’ rule says that, when you observe a piece of evidence \(e,\) your posterior odds \(\mathbb O(\boldsymbol H \mid e)\) for your hypothesis vector \(\boldsymbol H\) given \(e\) are just your prior odds \(\mathbb O(\boldsymbol H)\) on \(\boldsymbol H\) times the likelihood function \(\mathcal L_e(\boldsymbol H).\)

For example, suppose we’re trying to solve a mysterious murder, and we start out thinking the odds of Professor Plum vs. Miss Scarlet committing the murder are 1 : 2, that is, Scarlet is twice as likely as Plum to have committed the murder a priori. We then observe that the victim was bludgeoned with a lead pipe. If we think that Plum, *if* he commits a murder, is around 60% likely to use a lead pipe, and that Scarlet, *if* she commits a murder, would be around 6% likely to use a lead pipe, this implies relative likelihoods of 10 : 1 for Plum vs. Scarlet using the pipe. The posterior odds for Plum vs. Scarlet, after observing the victim to have been murdered by a pipe, are \((1 : 2) \times (10 : 1) = (10 : 2) = (5 : 1)\). We now think Plum is around five times as likely as Scarlet to have committed the murder.
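The arithmetic in this example can be checked with a minimal Python sketch. The variable names are illustrative assumptions, not part of the original page, and exact fractions are used so that floating-point rounding does not obscure the result.

```python
from fractions import Fraction

# Prior odds for Plum vs. Scarlet, taken from the example above.
prior = (Fraction(1), Fraction(2))

# Probability that each suspect would use a lead pipe, if they were the murderer.
likelihood = (Fraction(60, 100), Fraction(6, 100))

# Odds form of Bayes' rule: multiply prior odds by likelihoods, entry by entry.
posterior = tuple(p * l for p, l in zip(prior, likelihood))

# Odds only matter up to a positive constant, so rescale to make the last entry 1.
posterior_scaled = tuple(x / posterior[-1] for x in posterior)

print(posterior_scaled)  # (Fraction(5, 1), Fraction(1, 1))
```

The unscaled product is (3/5 : 3/25), which represents the same odds as (5 : 1).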

# Odds functions

Let \(\boldsymbol H\) denote a vector of hypotheses. An odds function \(\mathbb O\) is a function that maps \(\boldsymbol H\) to a set of odds. For example, if \(\boldsymbol H = (H_1, H_2, H_3),\) then \(\mathbb O(\boldsymbol H)\) might be \((6 : 2 : 1),\) which says that \(H_1\) is 3x as likely as \(H_2\) and 6x as likely as \(H_3.\) An odds function captures our *relative* probabilities between the hypotheses in \(\boldsymbol H;\) for example, (6 : 2 : 1) odds are the same as (18 : 6 : 3) odds. We don’t need to know the absolute probabilities of the \(H_i\) in order to know the relative odds. All we require is that the relative odds are proportional to the absolute probabilities:

\[\mathbb O(\boldsymbol H) \propto \mathbb P(\boldsymbol H).\]
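To make "the same odds" concrete, the sketch below checks whether two odds tuples differ only by a constant factor. The helper name `same_odds` is a hypothetical choice for illustration, and the check assumes every entry is strictly positive.

```python
def same_odds(a, b):
    """Return True if odds a and b differ only by a constant factor.

    Assumes all entries are strictly positive (an assumption of this sketch).
    """
    if len(a) != len(b):
        return False
    # a and b are proportional exactly when a[0] * b[i] == b[0] * a[i] for all i.
    # Cross-multiplying avoids division, so integer inputs stay exact.
    return all(a[0] * y == b[0] * x for x, y in zip(a, b))

print(same_odds((6, 2, 1), (18, 6, 3)))  # True
print(same_odds((6, 2, 1), (6, 2, 2)))   # False
```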

In the example with the death of Mr. Boddy, suppose \(H_1\) denotes the proposition “Reverend Green murdered Mr. Boddy”, \(H_2\) denotes “Mrs. White did it”, and \(H_3\) denotes “Colonel Mustard did it”. Let \(\boldsymbol H\) be the vector \((H_1, H_2, H_3).\) If these propositions respectively have prior probabilities of 80%, 8%, and 4% (the remaining 8% being reserved for other hypotheses), then \(\mathbb O(\boldsymbol H) = (80 : 8 : 4) = (20 : 2 : 1)\) represents our *relative* credences about the murder suspects: Reverend Green is 10 times as likely to be the murderer as Mrs. White, who is twice as likely to be the murderer as Colonel Mustard.
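One way to turn the prior probabilities into the smallest whole-number odds is sketched below. The helper `to_integer_odds` is a hypothetical name introduced for illustration; it assumes the inputs are exact `Fraction` values, since binary floats such as `0.08` do not convert to clean fractions.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def to_integer_odds(values):
    """Rescale exact fractions to the smallest equivalent whole-number odds."""
    fracs = [Fraction(v) for v in values]
    # Least common multiple of the denominators, used to clear all fractions.
    common_den = reduce(lambda a, b: a * b // gcd(a, b),
                        (f.denominator for f in fracs), 1)
    ints = [int(f * common_den) for f in fracs]
    # Divide out the greatest common divisor so the odds are in lowest terms.
    common = reduce(gcd, ints)
    return tuple(i // common for i in ints)

# Prior probabilities for Green, White, and Mustard from the example.
priors = (Fraction(80, 100), Fraction(8, 100), Fraction(4, 100))
print(to_integer_odds(priors))  # (20, 2, 1)
```

Note that the three priors sum to 92%, not 100%. The conversion never needs the missing 8%, which is the sense in which odds carry only relative information.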

# Likelihood functions

Suppose we discover that the victim was murdered by wrench. Suppose we think that Reverend Green, Mrs. White, and Colonel Mustard, *if* they murdered someone, would respectively be 60%, 90%, and 30% likely to use a wrench. Letting \(e_w\) denote the observation “The victim was murdered by wrench,” we would have \(\mathbb P(e_w\mid \boldsymbol H) = (0.6, 0.9, 0.3).\) This gives us a likelihood function defined as \(\mathcal L_{e_w}(\boldsymbol H) = \mathbb P(e_w \mid \boldsymbol H).\)

# Bayes’ rule, odds form

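Let \(\mathbb O(\boldsymbol H\mid e)\) denote the posterior odds of the hypotheses \(\boldsymbol H\) after observing evidence \(e.\) Bayes’ rule then states:

\[\mathbb O(\boldsymbol H) \times \mathcal L_{e}(\boldsymbol H) = \mathbb O(\boldsymbol H\mid e).\]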

This says that we can multiply the relative prior credence \(\mathbb O(\boldsymbol H)\) by the likelihood \(\mathcal L_{e}(\boldsymbol H)\) to arrive at the relative posterior credence \(\mathbb O(\boldsymbol H\mid e).\) Because odds are invariant under multiplication by a positive constant, scaling the *likelihood* function up or down by a positive constant makes no difference: it only multiplies the final odds by that constant, which leaves them unchanged. Thus only the relative likelihoods are needed for the calculation, not the absolute likelihoods. When performing the calculation, we can therefore simplify \(\mathcal L_{e_w}(\boldsymbol H) = (0.6, 0.9, 0.3)\) to the relative likelihoods \((2 : 3 : 1).\)
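The scale invariance can be illustrated with a short sketch. It uses the prior odds (20 : 2 : 1) from the Mr. Boddy example; the helper `normalize` is a hypothetical name, and it rescales odds so the entries sum to 1 purely to obtain one canonical representative for comparison.

```python
from fractions import Fraction

def normalize(odds):
    """Rescale odds so the entries sum to 1 (a canonical form for comparison)."""
    total = sum(odds)
    return tuple(Fraction(x) / total for x in odds)

prior = (20, 2, 1)
# Absolute likelihoods of the wrench for Green, White, and Mustard.
absolute = (Fraction(6, 10), Fraction(9, 10), Fraction(3, 10))
# The same likelihoods scaled by a constant (here 10/3).
relative = (2, 3, 1)

post_from_absolute = normalize(tuple(p * l for p, l in zip(prior, absolute)))
post_from_relative = normalize(tuple(p * l for p, l in zip(prior, relative)))

# Both routes give the same odds once the arbitrary scale is removed.
print(post_from_absolute == post_from_relative)  # True
```

The normalized values are shares among these three suspects only; they are not absolute posterior probabilities unless the three hypotheses are exhaustive.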

In our example, this makes the calculation quite easy. The prior odds for Green vs. White vs. Mustard were \((20 : 2 : 1).\) The relative likelihoods were \((0.6 : 0.9 : 0.3) = (2 : 3 : 1).\) Thus, the relative posterior odds after observing \(e_w,\) that Mr. Boddy was killed by wrench, are \((20 : 2 : 1) \times (2 : 3 : 1) = (40 : 6 : 1).\) Given the evidence, Reverend Green is 40 times as likely as Colonel Mustard to be the killer, and \(\tfrac{20}{3}\) times as likely as Mrs. White.
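The same calculation, written as a small Python sketch. The function name `update_odds` is an illustrative assumption, and this version expects whole-number odds and likelihoods.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def update_odds(prior_odds, relative_likelihoods):
    """Multiply integer odds by integer relative likelihoods, entry by entry,
    then reduce the result to lowest terms."""
    product = [p * l for p, l in zip(prior_odds, relative_likelihoods)]
    common = reduce(gcd, product)
    return tuple(x // common for x in product)

# Green vs. White vs. Mustard: prior odds times relative likelihoods of the wrench.
posterior = update_odds((20, 2, 1), (2, 3, 1))
print(posterior)  # (40, 6, 1)

# Green relative to White, as an exact ratio.
print(Fraction(posterior[0], posterior[1]))  # 20/3
```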

Bayes’ rule states that this *relative* proportioning of odds among these three suspects will be correct, regardless of how our remaining 8% probability mass is assigned to all other suspects and possibilities, or indeed, how much probability mass we assigned to other suspects to begin with. For a proof, see Proof of Bayes’ rule.
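This claim can be checked numerically with the probability form of Bayes' rule. In the sketch below, the remaining 8% is lumped into a single hypothetical "someone else" hypothesis, and its wrench likelihood is varied arbitrarily; both the lumping and the trial values are assumptions made for illustration.

```python
from fractions import Fraction

def posterior_probabilities(priors, likelihoods):
    """Probability form of Bayes' rule over mutually exclusive, exhaustive hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # total probability of the observation
    return [j / evidence for j in joint]

# Green, White, Mustard, and a hypothetical catch-all holding the remaining 8%.
priors = [Fraction(80, 100), Fraction(8, 100), Fraction(4, 100), Fraction(8, 100)]

ratios = []
for other_likelihood in (Fraction(1, 100), Fraction(1, 2), Fraction(99, 100)):
    likelihoods = [Fraction(6, 10), Fraction(9, 10), Fraction(3, 10), other_likelihood]
    green, white, mustard, _ = posterior_probabilities(priors, likelihoods)
    ratios.append((green / mustard, green / white))
    print(green / mustard, green / white)  # 40 20/3 on every iteration
```

The absolute posterior probabilities change with each choice of `other_likelihood`, but the ratios among Green, White, and Mustard stay at (40 : 6 : 1).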

# Visualization

Frequency diagrams, waterfall diagrams, and spotlight diagrams may be helpful for explaining or visualizing the odds form of Bayes’ rule.

Children:

- Introduction to Bayes' rule: Odds form
Bayes’ rule is simple, if you think in terms of relative odds.

Parents:

- Bayes' rule
Bayes’ rule is the core theorem of probability theory saying how to revise our beliefs when we make a new observation.

This page asks me if I learnt the concept of “Odds ratio”, but nowhere in the page does it actually explicitly talk about odds ratios, only about odds.