Bayes' rule: Odds form

One of the more convenient forms of Bayes' rule uses relative odds. Bayes' rule says that, when you observe a piece of evidence \(e,\) your posterior odds \(\mathbb O(\boldsymbol H \mid e)\) for your hypothesis vector \(\boldsymbol H\) given \(e\) are just your prior odds \(\mathbb O(\boldsymbol H)\) on \(\boldsymbol H\) times the likelihood function \(\mathcal L_e(\boldsymbol H).\)

For example, suppose we're trying to solve a mysterious murder, and we start out thinking the odds of Professor Plum vs. Miss Scarlet committing the murder are 1 : 2, that is, Scarlet is twice as likely as Plum to have committed the murder a priori. We then observe that the victim was bludgeoned with a lead pipe. If we think that Plum, if he commits a murder, is around 60% likely to use a lead pipe, and that Scarlet, if she commits a murder, would be around 6% likely to use a lead pipe, this implies relative likelihoods of 10 : 1 for Plum vs. Scarlet using the pipe. The posterior odds for Plum vs. Scarlet, after observing the victim to have been murdered by a pipe, are \((1 : 2) \times (10 : 1) = (10 : 2) = (5 : 1)\). We now think Plum is around five times as likely as Scarlet to have committed the murder.
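To make the arithmetic concrete, here is a minimal Python sketch of this update; the helper name `update_odds` and the tuple representation of odds are illustrative choices, not part of the notation above.

```python
from fractions import Fraction

def update_odds(prior_odds, likelihoods):
    """Multiply prior odds by likelihoods, entry by entry, to get posterior odds."""
    return tuple(Fraction(p) * Fraction(l) for p, l in zip(prior_odds, likelihoods))

# Prior odds for Plum vs. Scarlet, and each suspect's probability of using a lead pipe.
prior = (1, 2)
pipe_likelihoods = (Fraction(60, 100), Fraction(6, 100))

posterior = update_odds(prior, pipe_likelihoods)
print(posterior)  # (Fraction(3, 5), Fraction(3, 25)), i.e. 15 : 3 = 5 : 1 for Plum vs. Scarlet
```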

Odds functions

Let \(\boldsymbol H\) denote a vector of hypotheses. An odds function \(\mathbb O\) is a function that maps \(\boldsymbol H\) to a set of odds. For example, if \(\boldsymbol H = (H_1, H_2, H_3),\) then \(\mathbb O(\boldsymbol H)\) might be \((6 : 2 : 1),\) which says that \(H_1\) is 3x as likely as \(H_2\) and 6x as likely as \(H_3.\) An odds function captures our relative probabilities between the hypotheses in \(\boldsymbol H;\) for example, (6 : 2 : 1) odds are the same as (18 : 6 : 3) odds. We don't need to know the absolute probabilities of the \(H_i\) in order to know the relative odds. All we require is that the relative odds are proportional to the absolute probabilities:

$$\mathbb O(\boldsymbol H) \propto \mathbb P(\boldsymbol H).$$

In the example with the death of Mr. Boddy, suppose \(H_1\) denotes the proposition “Reverend Green murdered Mr. Boddy”, \(H_2\) denotes “Mrs. White did it”, and \(H_3\) denotes “Colonel Mustard did it”. Let \(\boldsymbol H\) be the vector \((H_1, H_2, H_3).\) If these propositions respectively have prior probabilities of 80%, 8%, and 4% (the remaining 8% being reserved for other hypotheses), then \(\mathbb O(\boldsymbol H) = (80 : 8 : 4) = (20 : 2 : 1)\) represents our relative credences about the murder suspects — that Reverend Green is 10 times as likely to be the murderer as Mrs. White, who is twice as likely to be the murderer as Colonel Mustard.
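As a sketch of this scale-invariance (the helper name `reduce_odds` is just illustrative), relative odds can be divided through by any common factor without changing what they say:

```python
from math import gcd
from functools import reduce

def reduce_odds(odds):
    """Divide integer odds through by their greatest common divisor."""
    g = reduce(gcd, odds)
    return tuple(x // g for x in odds)

print(reduce_odds((80, 8, 4)))   # (20, 2, 1)
print(reduce_odds((18, 6, 3)))   # (6, 2, 1) -- the same relative odds as (6 : 2 : 1)
```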

Likelihood functions

Suppose we discover that the victim was murdered by wrench. Suppose we think that Reverend Green, Mrs. White, and Colonel Mustard, if they murdered someone, would respectively be 60%, 90%, and 30% likely to use a wrench. Letting \(e_w\) denote the observation “The victim was murdered by wrench,” we would have \(\mathbb P(e_w\mid \boldsymbol H) = (0.6, 0.9, 0.3).\) This gives us a likelihood function defined as \(\mathcal L_{e_w}(\boldsymbol H) = \mathbb P(e_w \mid \boldsymbol H).\)
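A minimal sketch of this likelihood function in code, with suspect names standing in for the hypotheses (the names `p_wrench_given` and `likelihood_wrench` are illustrative, not standard):

```python
from fractions import Fraction

# P(e_w | H_i): how likely each suspect would be to use a wrench, if they were the murderer.
p_wrench_given = {"Green": Fraction(6, 10), "White": Fraction(9, 10), "Mustard": Fraction(3, 10)}

def likelihood_wrench(hypothesis):
    """Likelihood function L_{e_w}(H) = P(e_w | H), evaluated at one hypothesis."""
    return p_wrench_given[hypothesis]

print([likelihood_wrench(h) for h in ("Green", "White", "Mustard")])
# [Fraction(3, 5), Fraction(9, 10), Fraction(3, 10)]
```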

Bayes’ rule, odds form

Let \(\mathbb O(\boldsymbol H\mid e)\) denote the posterior odds of the hypotheses \(\boldsymbol H\) after observing evidence \(e.\) Bayes' rule then states:

$$\mathbb O(\boldsymbol H) \times \mathcal L_{e}(\boldsymbol H) = \mathbb O(\boldsymbol H\mid e)$$

This says that we can multiply the relative prior credence \(\mathbb O(\boldsymbol H)\) by the likelihood \(\mathcal L_{e}(\boldsymbol H)\) to arrive at the relative posterior credence \(\mathbb O(\boldsymbol H\mid e).\) Because odds are invariant under multiplication by a positive constant, scaling the likelihood function up or down by a constant makes no difference: it only multiplies the final odds by a constant, which leaves the relative odds unchanged. Thus, only the relative likelihoods are needed to perform the calculation; the absolute likelihoods are unnecessary. When performing the calculation, we can therefore simplify \(\mathcal L_{e_w}(\boldsymbol H) = (0.6, 0.9, 0.3)\) to the relative likelihoods \((2 : 3 : 1).\)

In our example, this makes the calculation quite easy. The prior odds for Green vs. White vs. Mustard were \((20 : 2 : 1).\) The relative likelihoods were \((0.6 : 0.9 : 0.3) = (2 : 3 : 1).\) Thus, the relative posterior odds after observing \(e_w\) = “Mr. Boddy was killed by wrench” are \((20 : 2 : 1) \times (2 : 3 : 1) = (40 : 6 : 1).\) Given the evidence, Reverend Green is 40 times as likely as Colonel Mustard to be the killer, and \(40 / 6 = 20/3 \approx 6.7\) times as likely as Mrs. White.
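Putting the pieces together, here is a self-contained sketch of the whole update from this example (the variable names are illustrative):

```python
from fractions import Fraction

# Prior odds for Green : White : Mustard, and the wrench likelihoods P(e_w | H_i).
prior_odds = (20, 2, 1)
wrench_likelihoods = (Fraction(6, 10), Fraction(9, 10), Fraction(3, 10))

# Odds form of Bayes' rule: posterior odds = prior odds times likelihoods, entry by entry.
posterior = tuple(p * l for p, l in zip(prior_odds, wrench_likelihoods))

# Divide through by the smallest entry, purely for readability (exact here, since we used Fractions).
smallest = min(posterior)
print(tuple(int(x / smallest) for x in posterior))  # (40, 6, 1)
```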

Bayes' rule states that this relative proportioning of odds among these three suspects will be correct, regardless of how our remaining 8% probability mass is assigned to all other suspects and possibilities, or indeed, how much probability mass we assigned to other suspects to begin with. For a proof, see Proof of Bayes' rule.

Visualization

Frequency diagrams, waterfall diagrams, and spotlight diagrams may be helpful for explaining or visualizing the odds form of Bayes' rule.

Parents:

  • Bayes' rule

    Bayes' rule is the core theorem of probability theory saying how to revise our beliefs when we make a new observation.