Bayes' rule: Odds form
One of the more convenient forms of Bayes’ rule uses relative odds. Bayes’ rule says that, when you observe a piece of evidence \(e,\) your posterior odds \(\mathbb O(\boldsymbol H \mid e)\) for your hypothesis vector \(\boldsymbol H\) given \(e\) is just your prior odds \(\mathbb O(\boldsymbol H)\) on \(\boldsymbol H\) times the likelihood function \(\mathcal L_e(\boldsymbol H).\)
For example, suppose we’re trying to solve a mysterious murder, and we start out thinking the odds of Professor Plum vs. Miss Scarlet committing the murder are 1 : 2, that is, Scarlet is a priori twice as likely as Plum to have committed the murder. We then observe that the victim was bludgeoned with a lead pipe. If we think that Plum, if he commits a murder, is around 60% likely to use a lead pipe, and that Scarlet, if she commits a murder, would be around 6% likely to use a lead pipe, this implies relative likelihoods of 10 : 1 for Plum vs. Scarlet using the pipe. The posterior odds for Plum vs. Scarlet, after observing the victim to have been murdered by a pipe, are \((1 : 2) \times (10 : 1) = (10 : 2) = (5 : 1).\) We now think Plum is around five times as likely as Scarlet to have committed the murder.
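A minimal Python sketch of this arithmetic (the variable names and the use of exact fractions are our own choices, not part of the original example):

```python
from fractions import Fraction

# Prior odds for Plum vs. Scarlet: 1 : 2.
prior_odds = [Fraction(1), Fraction(2)]

# Probability each suspect would use a lead pipe, given that they are the murderer.
likelihoods = [Fraction(60, 100), Fraction(6, 100)]  # 60% vs. 6%, i.e. 10 : 1

# Posterior odds: the elementwise product of prior odds and likelihoods.
posterior = [o * l for o, l in zip(prior_odds, likelihoods)]

# Rescale so the smallest entry is 1, to read off the odds directly.
smallest = min(posterior)
print([p / smallest for p in posterior])  # [Fraction(5, 1), Fraction(1, 1)] -> 5 : 1 for Plum vs. Scarlet
```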
Odds functions
Let \(\boldsymbol H\) denote a vector of hypotheses. An odds function \(\mathbb O\) is a function that maps \(\boldsymbol H\) to the relative odds of those hypotheses. For example, if \(\boldsymbol H = (H_1, H_2, H_3),\) then \(\mathbb O(\boldsymbol H)\) might be \((6 : 2 : 1),\) which says that \(H_1\) is 3x as likely as \(H_2\) and 6x as likely as \(H_3.\) An odds function captures our relative probabilities between the hypotheses in \(\boldsymbol H;\) for example, \((6 : 2 : 1)\) odds are the same as \((18 : 6 : 3)\) odds. We don’t need to know the absolute probabilities of the \(H_i\) in order to know the relative odds. All we require is that the relative odds are proportional to the absolute probabilities:

\(\mathbb O(\boldsymbol H) \propto \mathbb P(\boldsymbol H), \text{ i.e., } (o_1 : o_2 : o_3) \propto (\mathbb P(H_1) : \mathbb P(H_2) : \mathbb P(H_3)).\)
For example, suppose we’re investigating the death of Mr. Boddy, and \(H_1\) denotes the proposition “Reverend Green murdered Mr. Boddy”, \(H_2\) denotes “Mrs. White did it”, and \(H_3\) denotes “Colonel Mustard did it”. Let \(\boldsymbol H\) be the vector \((H_1, H_2, H_3).\) If these propositions respectively have prior probabilities of 80%, 8%, and 4% (the remaining 8% being reserved for other hypotheses), then \(\mathbb O(\boldsymbol H) = (80 : 8 : 4) = (20 : 2 : 1)\) represents our relative credences about the murder suspects: Reverend Green is 10 times as likely to be the murderer as Mrs. White, who is twice as likely to be the murderer as Colonel Mustard.
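To make the relationship between probabilities and odds concrete, here is a small sketch (ours, not from the page) that reduces those prior probabilities to the relative odds quoted above:

```python
from functools import reduce
from math import gcd

# Prior probabilities for Green, White, Mustard, in percentage points.
prior_percent = [80, 8, 4]

# Dividing through by the greatest common divisor gives the reduced odds.
g = reduce(gcd, prior_percent)
print(tuple(p // g for p in prior_percent))  # (20, 2, 1), i.e. 20 : 2 : 1
```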
Likelihood functions
Suppose we discover that the victim was murdered by wrench. Suppose we think that Reverend Green, Mrs. White, and Colonel Mustard, if they murdered someone, would respectively be 60%, 90%, and 30% likely to use a wrench. Letting \(e_w\) denote the observation “The victim was murdered by wrench,” we would have \(\mathbb P(e_w\mid \boldsymbol H) = (0.6, 0.9, 0.3).\) This gives us a likelihood function defined as \(\mathcal L_{e_w}(\boldsymbol H) = \mathbb P(e_w \mid \boldsymbol H).\)
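Here is one way such a likelihood function might be represented in Python; the lookup table and names are illustrative assumptions on our part, not anything specified by the page:

```python
def likelihood(evidence, hypothesis):
    """L_e(H) = P(e | H): the probability of seeing the evidence if the hypothesis is true."""
    table = {
        "wrench": {"Green": 0.6, "White": 0.9, "Mustard": 0.3},
    }
    return table[evidence][hypothesis]

print([likelihood("wrench", h) for h in ("Green", "White", "Mustard")])  # [0.6, 0.9, 0.3]
```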
Bayes’ rule, odds form
Let \(\mathbb O(\boldsymbol H\mid e)\) denote the posterior odds of the hypotheses \(\boldsymbol H\) after observing evidence \(e.\) Bayes’ rule then states:

\(\mathbb O(\boldsymbol H\mid e) = \mathbb O(\boldsymbol H) \times \mathcal L_e(\boldsymbol H),\)

where the product is taken elementwise.
This says that we can multiply the relative prior credence \(\mathbb O(\boldsymbol H)\) by the likelihood \(\mathcal L_{e}(\boldsymbol H)\) to arrive at the relative posterior credence \(\mathbb O(\boldsymbol H\mid e).\) Because odds are invariant under multiplication by a positive constant, scaling the likelihood function up or down by a constant would only scale the final odds by that same constant, which leaves them unchanged as odds. Thus, only the relative likelihoods are needed to perform the calculation; the absolute likelihoods are unnecessary. When performing the calculation, we can therefore simplify \(\mathcal L_{e_w}(\boldsymbol H) = (0.6, 0.9, 0.3)\) to the relative likelihoods \((2 : 3 : 1).\)
In our example, this makes the calculation quite easy. The prior odds for Green vs. White vs. Mustard were \((20 : 2 : 1).\) The relative likelihoods were \((0.6 : 0.9 : 0.3) = (2 : 3 : 1).\) Thus, after observing \(e_w\) (Mr. Boddy was killed by a wrench), the relative posterior odds are \((20 : 2 : 1) \times (2 : 3 : 1) = (40 : 6 : 1).\) Given the evidence, Reverend Green is 40 times as likely as Colonel Mustard to be the killer, and 20⁄3 times as likely as Mrs. White.
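A short sketch of this update (the function and variable names are ours), which also shows that rescaling the likelihoods by a constant leaves the posterior odds unchanged:

```python
from fractions import Fraction

def posterior_odds(prior_odds, likelihoods):
    """O(H | e) is proportional to the elementwise product O(H) * L_e(H)."""
    unnormalized = [Fraction(o) * Fraction(l) for o, l in zip(prior_odds, likelihoods)]
    smallest = min(unnormalized)
    return [x / smallest for x in unnormalized]

prior = (20, 2, 1)  # Green : White : Mustard
absolute = posterior_odds(prior, (Fraction(6, 10), Fraction(9, 10), Fraction(3, 10)))
relative = posterior_odds(prior, (2, 3, 1))
print(" : ".join(map(str, absolute)))  # 40 : 6 : 1
print(" : ".join(map(str, relative)))  # 40 : 6 : 1, so rescaling the likelihoods changes nothing
```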
Bayes’ rule states that this relative proportioning of odds among these three suspects will be correct, regardless of how our remaining 8% probability mass is assigned to all other suspects and possibilities, or indeed, how much probability mass we assigned to other suspects to begin with. For a proof, see Proof of Bayes’ rule.
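One way to check this numerically (not a proof) is to run the full probability-form update with different amounts of prior mass reserved for a catch-all “someone else” hypothesis, and confirm that the relative posterior odds among the three named suspects come out the same. The catch-all likelihood values below are arbitrary assumptions:

```python
def relative_posterior(prior_other, likelihood_other):
    """Probability-form update over (Green, White, Mustard, other); returns the
    posterior odds among the three named suspects, scaled so Mustard = 1."""
    # Keep the named suspects' priors in the ratio 20 : 2 : 1 and give the
    # catch-all hypothesis whatever prior mass is left over.
    scale = (1 - prior_other) / (0.80 + 0.08 + 0.04)
    priors = [0.80 * scale, 0.08 * scale, 0.04 * scale, prior_other]
    likelihoods = [0.6, 0.9, 0.3, likelihood_other]  # likelihood_other is an assumption
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    return tuple(round(posterior[i] / posterior[2], 6) for i in range(3))

# Different reserved mass and catch-all likelihoods, same relative odds among the suspects:
print(relative_posterior(0.08, 0.5))  # (40.0, 6.0, 1.0)
print(relative_posterior(0.50, 0.9))  # (40.0, 6.0, 1.0)
```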
Visualization
Frequency diagrams, waterfall diagrams, and spotlight diagrams may be helpful for explaining or visualizing the odds form of Bayes’ rule.
Children:
- Introduction to Bayes' rule: Odds form
Bayes’ rule is simple, if you think in terms of relative odds.
Parents:
- Bayes' rule
Bayes’ rule is the core theorem of probability theory saying how to revise our beliefs when we make a new observation.