# Bayes' rule: Odds form

One of the more convenient forms of Bayes' rule uses relative odds. Bayes' rule says that, when you observe a piece of evidence $$e,$$ your posterior odds $$\mathbb O(\boldsymbol H \mid e)$$ for your hypothesis vector $$\boldsymbol H$$ given $$e$$ are just your prior odds $$\mathbb O(\boldsymbol H)$$ on $$\boldsymbol H$$ times the likelihood function $$\mathcal L_e(\boldsymbol H).$$

For example, suppose we're trying to solve a mysterious murder, and we start out thinking the odds of Professor Plum vs. Miss Scarlet committing the murder are 1 : 2, that is, Scarlet is twice as likely as Plum to have committed the murder a priori. We then observe that the victim was bludgeoned with a lead pipe. If we think that Plum, if he commits a murder, is around 60% likely to use a lead pipe, and that Scarlet, if she commits a murder, would be around 6% likely to use a lead pipe, this implies relative likelihoods of 10 : 1 for Plum vs. Scarlet using the pipe. The posterior odds for Plum vs. Scarlet, after observing the victim to have been murdered by a pipe, are $$(1 : 2) \times (10 : 1) = (10 : 2) = (5 : 1)$$. We now think Plum is around five times as likely as Scarlet to have committed the murder.
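As a minimal sketch (hypothetical code, not part of the original page), the Plum-vs-Scarlet update is just an elementwise product of prior odds and likelihoods, followed by rescaling:

```python
from fractions import Fraction

# Posterior odds = prior odds * relative likelihoods, entry by entry.
# Order of entries: (Plum, Scarlet).
prior_odds = (Fraction(1), Fraction(2))            # 1 : 2 a priori
likelihoods = (Fraction(6, 10), Fraction(6, 100))  # P(pipe | Plum), P(pipe | Scarlet)

posterior = [p * l for p, l in zip(prior_odds, likelihoods)]

# Rescale so the smallest entry is 1; odds are unchanged by a common factor.
scale = min(posterior)
print([x / scale for x in posterior])  # [Fraction(5, 1), Fraction(1, 1)], i.e. 5 : 1
```

Using exact fractions rather than floats keeps the final odds in clean integer ratios.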

# Odds functions

Let $$\boldsymbol H$$ denote a vector of hypotheses. An odds function $$\mathbb O$$ is a function that maps $$\boldsymbol H$$ to a set of odds. For example, if $$\boldsymbol H = (H_1, H_2, H_3),$$ then $$\mathbb O(\boldsymbol H)$$ might be $$(6 : 2 : 1),$$ which says that $$H_1$$ is 3x as likely as $$H_2$$ and 6x as likely as $$H_3.$$ An odds function captures our relative probabilities between the hypotheses in $$\boldsymbol H;$$ for example, (6 : 2 : 1) odds are the same as (18 : 6 : 3) odds. We don't need to know the absolute probabilities of the $$H_i$$ in order to know the relative odds. All we require is that the relative odds are proportional to the absolute probabilities:

$$\mathbb O(\boldsymbol H) \propto \mathbb P(\boldsymbol H).$$

In the example with the death of Mr. Boddy, suppose $$H_1$$ denotes the proposition “Reverend Green murdered Mr. Boddy”, $$H_2$$ denotes “Mrs. White did it”, and $$H_3$$ denotes “Colonel Mustard did it”. Let $$\boldsymbol H$$ be the vector $$(H_1, H_2, H_3).$$ If these propositions respectively have prior probabilities of 80%, 8%, and 4% (the remaining 8% being reserved for other hypotheses), then $$\mathbb O(\boldsymbol H) = (80 : 8 : 4) = (20 : 2 : 1)$$ represents our relative credences about the murder suspects — that Reverend Green is 10 times as likely to be the murderer as Mrs. White, who is twice as likely to be the murderer as Colonel Mustard.
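Since odds are only defined up to a common factor, reducing them to lowest terms is a matter of dividing out the greatest common divisor. A small sketch (hypothetical helper, assuming integer odds):

```python
from functools import reduce
from math import gcd

def simplify(odds):
    """Reduce integer odds to lowest terms, e.g. (80, 8, 4) -> (20, 2, 1)."""
    g = reduce(gcd, odds)
    return tuple(x // g for x in odds)

print(simplify((80, 8, 4)))                         # (20, 2, 1)
print(simplify((6, 2, 1)) == simplify((18, 6, 3)))  # True: same relative odds
```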

# Likelihood functions

Suppose we discover that the victim was murdered by wrench. Suppose we think that Reverend Green, Mrs. White, and Colonel Mustard, if they murdered someone, would respectively be 60%, 90%, and 30% likely to use a wrench. Letting $$e_w$$ denote the observation “The victim was murdered by wrench,” we would have $$\mathbb P(e_w\mid \boldsymbol H) = (0.6, 0.9, 0.3).$$ This gives us a likelihood function defined as $$\mathcal L_{e_w}(\boldsymbol H) = \mathbb P(e_w \mid \boldsymbol H).$$

# Bayes’ rule, odds form

Let $$\mathbb O(\boldsymbol H\mid e)$$ denote the posterior odds of the hypotheses $$\boldsymbol H$$ after observing evidence $$e.$$ Bayes' rule then states:

$$\mathbb O(\boldsymbol H) \times \mathcal L_{e}(\boldsymbol H) = \mathbb O(\boldsymbol H\mid e)$$

This says that we can multiply the relative prior credence $$\mathbb O(\boldsymbol H)$$ by the likelihood $$\mathcal L_{e}(\boldsymbol H)$$ to arrive at the relative posterior credence $$\mathbb O(\boldsymbol H\mid e).$$ Because odds are invariant under multiplication by a positive constant, scaling the likelihood function up or down by a constant makes no difference: it only multiplies the final odds by that constant, which leaves the relative odds unchanged. Thus, only the relative likelihoods are necessary to perform the calculation; the absolute likelihoods are unnecessary. Therefore, when performing the calculation, we can simplify $$\mathcal L_{e_w}(\boldsymbol H) = (0.6, 0.9, 0.3)$$ to the relative likelihoods $$(2 : 3 : 1).$$

In our example, this makes the calculation quite easy. The prior odds for Green vs. White vs. Mustard were $$(20 : 2 : 1).$$ The relative likelihoods were $$(0.6 : 0.9 : 0.3) = (2 : 3 : 1).$$ Thus, the relative posterior odds after observing $$e_w$$ = “Mr. Boddy was killed by wrench” are $$(20 : 2 : 1) \times (2 : 3 : 1) = (40 : 6 : 1).$$ Given the evidence, Reverend Green is 40 times as likely as Colonel Mustard to be the killer, and $$6\frac{2}{3}$$ times as likely as Mrs. White.
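The whole Mr. Boddy update can be sketched end to end (hypothetical code, hypothesis order Green, White, Mustard):

```python
from fractions import Fraction

# Prior odds and the wrench likelihoods from the text.
prior_odds = (20, 2, 1)
likelihoods = (Fraction(6, 10), Fraction(9, 10), Fraction(3, 10))

# Bayes' rule, odds form: multiply entry by entry.
posterior = [p * l for p, l in zip(prior_odds, likelihoods)]

# Rescale so the smallest entry is 1.
scale = min(posterior)
relative = [x / scale for x in posterior]
print(relative)                   # 40 : 6 : 1 for Green : White : Mustard
print(relative[0] / relative[1])  # Green vs. White: 20/3, about 6.67x
```

Note that multiplying the likelihoods by any positive constant (e.g. using (2, 3, 1) instead of (0.6, 0.9, 0.3)) produces the same final relative odds.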

Bayes' rule states that this relative proportioning of odds among these three suspects will be correct, regardless of how our remaining 8% probability mass is assigned to all other suspects and possibilities, or indeed, how much probability mass we assigned to other suspects to begin with. For a proof, see Proof of Bayes' rule.

# Visualization

Frequency diagrams, waterfall diagrams, and spotlight diagrams may be helpful for explaining or visualizing the odds form of Bayes' rule.

