Bayes' rule: Definition

Bayes’ rule is the mathematics of probability theory governing how to update your beliefs in the light of new evidence.

Notation

In much of what follows, we’ll use the following notation:

  • \(H_i, H_j\): the hypotheses under consideration.

  • \(e\): the evidence observed.

  • \(\mathbb P(H)\): the prior probability of a hypothesis \(H,\) before seeing the evidence.

  • \(\mathbb P(e\mid H)\): the likelihood of the evidence, supposing \(H\) to be true.

  • \(\mathbb P(H\mid e)\): the posterior probability of \(H,\) after seeing the evidence.

Odds/proportional form

Bayes’ rule in the odds form or proportional form states:

$$\dfrac{\mathbb P(H_i)}{\mathbb P(H_j)} \times \dfrac{\mathbb P(e\mid H_i)}{\mathbb P(e\mid H_j)} = \dfrac{\mathbb P(H_i\mid e)}{\mathbb P(H_j\mid e)}$$

In other words, the prior odds times the likelihood ratio yield the posterior odds. Normalizing these odds will then yield the posterior probabilities.

Put another way: if you initially think \(H_i\) is \(\alpha\) times as probable as \(H_j,\) and you then see evidence that you’re \(\beta\) times as likely to see if \(H_i\) is true as if \(H_j\) is true, you should update to thinking that \(H_i\) is \(\alpha \cdot \beta\) times as probable as \(H_j.\)

Suppose that Professor Plum and Miss Scarlet are two suspects in a murder, and that we start out thinking that Professor Plum is twice as likely to have committed the murder as Miss Scarlet (prior odds of 2 : 1). We then discover that the victim was poisoned. We think that Professor Plum is around one-fourth as likely to use poison as Miss Scarlet (likelihood ratio of 1 : 4). Then after observing the victim was poisoned, we should think Plum is around half as likely to have committed the murder as Scarlet: \(2 \times \dfrac{1}{4} = \dfrac{1}{2}.\) This reflects posterior odds of 1 : 2, or a posterior probability of \(\dfrac{1}{3},\) that Professor Plum did the deed.
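As a sanity check, here is a minimal Python sketch of this update for the Plum/Scarlet numbers (the helper names are just illustrative, not any library’s API):

```python
from fractions import Fraction

def update_odds(prior_odds, likelihood_ratios):
    """Posterior odds = prior odds times likelihood ratios, term by term."""
    return [p * l for p, l in zip(prior_odds, likelihood_ratios)]

def normalize(odds):
    """Divide odds by their total to get probabilities summing to 1."""
    total = sum(odds)
    return [o / total for o in odds]

# Prior odds 2 : 1 (Plum : Scarlet); poison gives likelihood ratio 1 : 4.
posterior_odds = update_odds([Fraction(2), Fraction(1)],
                             [Fraction(1), Fraction(4)])
print(posterior_odds)             # odds of 2 : 4, i.e. 1 : 2
print(normalize(posterior_odds))  # 1/3 and 2/3, so P(Plum | poison) = 1/3
```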

Proof

The proof of Bayes’ rule follows from the definition of conditional probability, \(\mathbb P(X\wedge Y) = \mathbb P(X\mid Y) \cdot \mathbb P(Y):\)

$$ \dfrac{\mathbb P(H_i)}{\mathbb P(H_j)} \times \dfrac{\mathbb P(e\mid H_i)}{\mathbb P(e\mid H_j)} = \dfrac{\mathbb P(e \wedge H_i)}{\mathbb P(e \wedge H_j)} = \dfrac{\mathbb P(e \wedge H_i) / \mathbb P(e)}{\mathbb P(e \wedge H_j) / \mathbb P(e)} = \dfrac{\mathbb P(H_i\mid e)}{\mathbb P(H_j\mid e)} $$

Log odds form

The log odds form of Bayes’ rule states:

$$\log \left ( \dfrac {\mathbb P(H_i)} {\mathbb P(H_j)} \right ) + \log \left ( \dfrac {\mathbb P(e\mid H_i)} {\mathbb P(e\mid H_j)} \right ) = \log \left ( \dfrac {\mathbb P(H_i\mid e)} {\mathbb P(H_j\mid e)} \right ) $$

E.g.: “A study of Chinese blood donors found that roughly 1 in 100,000 of them had HIV (as determined by a very reliable gold-standard test). The non-gold-standard test used for initial screening had a sensitivity of 99.7% and a specificity of 99.8%, meaning that it was 500 times as likely to return positive for infected as non-infected patients.” Then our prior belief in HIV is −5 orders of magnitude (odds of 1 : 100,000), and if we then observe a positive test result, this is evidence of strength +2.7 orders of magnitude for HIV. Our posterior belief is then −2.3 orders of magnitude, or odds of about 1 : 200 that the donor has HIV, i.e., a probability of less than 1 in 100.
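Here is a minimal Python check of that arithmetic, assuming the study numbers quoted above (and approximating the prior odds by the prior probability, since 1 in 100,000 is tiny):

```python
import math

prior_log_odds = math.log10(1e-5)                 # 1 : 100,000 prior -> -5.0 orders of magnitude
likelihood_ratio = 0.997 / (1 - 0.998)            # sensitivity / false-positive rate, about 500
evidence_strength = math.log10(likelihood_ratio)  # about +2.7 orders of magnitude

posterior_log_odds = prior_log_odds + evidence_strength  # about -2.3
print(10 ** posterior_log_odds)                   # about 0.005, i.e. posterior odds near 1 : 200
```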

In log odds form, the same strength of evidence (log likelihood ratio) always moves us the same additive distance along a line representing strength of belief (also in log odds). If we measured distance in probabilities, then the same 2 : 1 likelihood ratio might move us a different distance along the probability line depending on whether we started with a prior probability of 10% or 50%.
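The short Python sketch below makes the contrast concrete: the same 2 : 1 likelihood ratio moves a 10% prior and a 50% prior different distances along the probability line, but the identical distance \(\log_{10} 2 \approx 0.301\) along the log-odds line.

```python
import math

def posterior(prob, likelihood_ratio):
    """Update a probability by converting to odds, multiplying, and converting back."""
    odds = prob / (1 - prob) * likelihood_ratio
    return odds / (1 + odds)

for prior in (0.10, 0.50):
    post = posterior(prior, 2.0)
    prob_move = post - prior
    log_odds_move = math.log10(post / (1 - post)) - math.log10(prior / (1 - prior))
    print(f"{prior:.0%} -> {post:.1%}: probability moved {prob_move:+.3f}, "
          f"log odds moved {log_odds_move:+.3f}")
# 10% -> 18.2%: probability moved +0.082, log odds moved +0.301
# 50% -> 66.7%: probability moved +0.167, log odds moved +0.301
```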

Visualizations

Graphical ways of visualizing Bayes’ rule include frequency diagrams, the waterfall visualization, the spotlight visualization, the magnet visualization, and the Venn diagram for the proof.

Examples

Examples of Bayes’ rule may be found here.

Multiple hypotheses and updates

The odds form of Bayes’ rule works for odds ratios between more than two hypotheses, and for applying multiple pieces of evidence in succession. Suppose there’s a bathtub full of coins. 1/2 of the coins are “fair” and have a 50% probability of producing heads on each coinflip; 1/3 of the coins produce 25% heads; and 1/6 produce 75% heads. You pull out a coin at random, flip it 3 times, and get the result HTH. You may legitimately calculate:

$$\begin{array}{rll} (1/2 : 1/3 : 1/6) \cong & (3 : 2 : 1) & \\ \times & (2 : 1 : 3) & \\ \times & (2 : 3 : 1) & \\ \times & (2 : 1 : 3) & \\ = & (24 : 6 : 9) & \cong (8 : 2 : 3) \end{array}$$
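The same computation as a short Python sketch (hypothesis order: fair, 25%-heads, 75%-heads):

```python
from fractions import Fraction

heads_prob = [Fraction(1, 2), Fraction(1, 4), Fraction(3, 4)]  # P(heads) under each hypothesis
odds = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]        # prior odds, i.e. 3 : 2 : 1

for flip in "HTH":
    # Multiply each hypothesis's odds by its likelihood for this flip.
    odds = [o * (p if flip == "H" else 1 - p) for o, p in zip(odds, heads_prob)]

total = sum(odds)
print([o / total for o in odds])  # 8/13, 2/13, 3/13, matching posterior odds of 8 : 2 : 3
```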

Since multiple pieces of evidence may not be conditionally independent of one another, it is important to be aware of the Naive Bayes assumption and whether you are making it.

Probability form

As a formula for a single probability \(\mathbb P(H_i\mid e),\) Bayes’ rule states (with the sum running over a set of mutually exclusive and exhaustive hypotheses \(H_k\)):

$$\mathbb P(H_i\mid e) = \dfrac{\mathbb P(e\mid H_i) \cdot \mathbb P(H_i)}{\sum_k \mathbb P(e\mid H_k) \cdot \mathbb P(H_k)}$$
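Applied to the HIV screening example above, a minimal Python sketch of this formula (numbers as quoted earlier):

```python
priors      = [1e-5, 1 - 1e-5]     # P(infected), P(not infected)
likelihoods = [0.997, 1 - 0.998]   # P(positive test | each hypothesis)

numerators = [l * p for l, p in zip(likelihoods, priors)]
posterior_infected = numerators[0] / sum(numerators)
print(posterior_infected)  # about 0.005, matching the ~1 : 200 posterior odds above
```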

Functional form

In functional form, Bayes’ rule states:

$$\mathbb P(\mathbf{H}\mid e) \propto \mathbb P(e\mid \mathbf{H}) \cdot \mathbb P(\mathbf{H}).$$

The posterior probability function over hypotheses given the evidence is proportional to the likelihood function from the evidence to those hypotheses, times the prior probability function over those hypotheses.

Since posterior probabilities over mutually exclusive and exhaustive possibilities must sum to \(1,\) normalizing the product of the likelihood function and the prior probability function will yield the exact posterior probability function.
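As a minimal sketch of that recipe in Python (the grid of coin-bias hypotheses and the uniform prior are assumptions chosen purely for illustration):

```python
# Hypotheses: possible heads-probabilities for a coin of unknown bias.
hypotheses = [i / 10 for i in range(1, 10)]         # 0.1, 0.2, ..., 0.9
prior = [1 / len(hypotheses)] * len(hypotheses)     # uniform prior over the grid

# Likelihood of the observation HTH under each hypothesis.
likelihood = [h * (1 - h) * h for h in hypotheses]

# Posterior is proportional to likelihood times prior; normalizing
# the product yields the exact posterior probability function.
unnormalized = [l * p for l, p in zip(likelihood, prior)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

print(sum(posterior))                    # about 1.0 after normalization
print(max(zip(posterior, hypotheses)))   # the most probable bias given HTH (h = 0.7 here)
```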

Parents:

  • Bayes' rule

Bayes’ rule is the core theorem of probability theory saying how to revise our beliefs when we make a new observation.

    • Bayesian reasoning

A probability-theory-based view of the world; a coherent way of changing probabilistic beliefs based on evidence.