Introduction to Bayes' rule: Odds form

This introduction is meant to be read after the introductions to frequency visualizations and waterfall visualizations.

In general, Bayes’ rule states:

$$ \textbf{Prior odds} \times \textbf{Relative likelihoods} = \textbf{Posterior odds}$$

If we consider the waterfall visualization of the Diseasitis example, we can see why relative odds are the appropriate way to think about the two rivers at the top of the waterfall.

[Figure: waterfall visualization]

The proportion of red vs. blue water at the bottom will be the same whether there are 200 vs. 800 gallons per second of red vs. blue water at the top of the waterfall, or 20,000 vs. 80,000 gallons per second, or 1 vs. 4 gallons per second. So long as the rest of the waterfall behaves proportionally, we’ll get the same proportion of red vs. blue at the bottom. Thus, we’re justified in ignoring the absolute amount of water and considering only the relative proportion between the amounts.
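As a minimal sketch of this arithmetic, here is the bottom-pool calculation for three top-of-waterfall flows that share the same 1 : 4 ratio. The function name `bottom_proportion` is hypothetical, and the pass-through rates are assumed to be the Diseasitis figures (90% of the red stream, 30% of the blue stream). Exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction

def bottom_proportion(red_top, blue_top,
                      red_rate=Fraction(9, 10), blue_rate=Fraction(3, 10)):
    """Return the fraction of the bottom pool that is red water.

    red_top, blue_top: flow at the top of the waterfall (any common unit).
    red_rate, blue_rate: fraction of each stream that reaches the bottom
    (assumed here to be the Diseasitis numbers, 90% and 30%).
    """
    red_bottom = red_top * red_rate
    blue_bottom = blue_top * blue_rate
    return red_bottom / (red_bottom + blue_bottom)

# Three different absolute amounts, all in the ratio 1 : 4 at the top.
for red, blue in [(200, 800), (20_000, 80_000), (1, 4)]:
    print(red, blue, bottom_proportion(red, blue))  # red share is 3/7 each time
```

Every flow gives the same red share at the bottom, which is why only the ratio at the top matters.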

Similarly, what matters is the relative proportion between the fraction of red water that makes it into the shared pool and the fraction of blue water that makes it. If 45% of the red water and 15% of the blue water reached the bottom, the pool would have the same relative proportion of red and blue water as it does with 90% and 30%.

[Figure: changing the proportion makes no difference]

This justifies throwing away the specific data that 90% of the red stream and 30% of the blue stream make it down, and summarizing it as relative likelihoods of (3 : 1).
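A small sketch of that summarizing step, assuming the likelihoods are given as decimal strings; `relative_likelihood` is a hypothetical helper, not a standard function. It reduces a pair of likelihoods to the simplest integer ratio, so 90% vs. 30% and 45% vs. 15% both come out as (3 : 1).

```python
from fractions import Fraction

def relative_likelihood(p_e_given_h1, p_e_given_h2):
    """Reduce two likelihoods to the simplest integer ratio (a, b), read as a : b."""
    # Fraction always stores its value in lowest terms, so this is already reduced.
    ratio = Fraction(p_e_given_h1) / Fraction(p_e_given_h2)
    return ratio.numerator, ratio.denominator

print(relative_likelihood("0.90", "0.30"))  # (3, 1)
print(relative_likelihood("0.45", "0.15"))  # (3, 1)
```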

More generally, suppose we have a medical test that detects a sickness with a 90% true positive rate (10% false negatives) and a 30% false positive rate (70% true negatives). A positive result on this test represents the same strength of evidence as a positive result on a test with 60% true positives and 20% false positives, since both pairs are in the ratio (3 : 1). A negative result on this test represents the same strength of evidence as a negative result on a test with 9% false negatives and 63% true negatives, since both pairs are in the ratio (1 : 7).
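To check these equivalences, a short sketch computes the likelihood ratio for each test outcome: true positives divided by false positives for a positive result, and false negatives divided by true negatives for a negative result. The helper name `evidence_strength` is hypothetical.

```python
from fractions import Fraction

def evidence_strength(true_pos_rate, false_pos_rate):
    """Likelihood ratios P(result | sick) / P(result | healthy) for both test outcomes.

    Rates are parsed by Fraction, so decimal strings like "0.90" stay exact.
    Returns (positive_ratio, negative_ratio).
    """
    tp = Fraction(true_pos_rate)
    fp = Fraction(false_pos_rate)
    positive_ratio = tp / fp              # true positives : false positives
    negative_ratio = (1 - tp) / (1 - fp)  # false negatives : true negatives
    return positive_ratio, negative_ratio

pos_a, neg_a = evidence_strength("0.90", "0.30")    # the test in the text
pos_b, _ = evidence_strength("0.60", "0.20")        # 60% true / 20% false positives
print(pos_a, pos_b)                                 # 3 3
print(neg_a, Fraction("0.09") / Fraction("0.63"))   # 1/7 1/7
```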

In general, the strength of evidence is summarized by how relatively likely different states of the world make our observations.

For more on this idea, see Strength of Bayesian evidence.

The equation

To state Bayes’ rule in full generality, and prove it as a theorem, we’ll need to introduce some new notation.

Conditional probability

First, when \(X\) is a proposition, \(\mathbb P(X)\) will stand for the probability of \(X.\)

In other words, \(X\) is something that’s either true or false in reality, but we’re uncertain about it, and \(\mathbb P(X)\) is a way of expressing our degree of belief that \(X\) is true. A patient is, in fact, either sick or healthy; but if you don’t know which of these is the case, the evidence might lead you to assign a 43% subjective probability that the patient is sick.

\(\neg X\) will mean “\(X\) is false”, so \(\mathbb P(\neg X)\) is “the probability that \(X\) is false”.

The Diseasitis example involved some more complicated statements than this, though; in particular it involved:

  • The 90% chance that a patient blackens the tongue depressor, given that they have Diseasitis.

  • The 30% chance that a patient blackens the tongue depressor, given that they’re healthy.

  • The 3/7 chance that a patient has Diseasitis, given that they blackened the tongue depressor.

In these cases we want to go from some fact that is assumed or known to be true (on the right), to some other proposition (on the left) whose new probability we want to ask about, taking that assumption into account.

Probability statements like those are known as “conditional probabilities”. The standard notation for conditional probability expresses the above quantities as:

  • \(\mathbb P(blackened \mid sick) = 0.9\)

  • \(\mathbb P(blackened \mid \neg sick) = 0.3\)

  • \(\mathbb P(sick \mid blackened) = 3/7\)

This standard notation for \(\mathbb P(X \mid Y)\), meaning “the probability of \(X\), assuming \(Y\) to be true”, is a helpfully symmetrical vertical line, to avoid giving you any visual clue to remember that the assumption is on the right and the inferred proposition is on the left. <sarcasm>

Conditional probability is defined as follows. Using the notation \(X \wedge Y\) to denote “\(X\) and \(Y\)”, or “both \(X\) and \(Y\) are true”:

$$\mathbb P(X \mid Y) := \frac{\mathbb P(X \wedge Y)}{\mathbb P(Y)}$$

E.g. in the Diseasitis example, \(\mathbb P(sick \mid blackened)\) is calculated by dividing the 18% of students who are sick and have blackened tongue depressors (\(\mathbb P(sick \wedge blackened)\)) by the total 42% of students who have blackened tongue depressors (\(\mathbb P(blackened)\)): \(0.18 / 0.42 = 3/7\).

Or \(\mathbb P(blackened \mid \neg sick),\) the probability of blackening the tongue depressor given that you’re healthy, equals the 24 students (out of 100) who are healthy and have blackened tongue depressors, divided by the 80 students who are healthy. \(24 / 80 = 3/10\), so this corresponds to the 30% false positives we were told about at the start.

We can see the law of conditional probability as saying, “Let us restrict our attention to worlds where \(Y\) is the case, or thingies of which \(Y\) is true. Looking only at cases where \(Y\) is true, what fraction of the cases inside that restriction are cases where \(X\) is also true, i.e. cases with both \(X\) and \(Y\)?”
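The “restrict, then count” reading can be sketched directly. This assumes a hypothetical population of 100 students matching the Diseasitis numbers (20 sick, 18 of them with blackened depressors; 80 healthy, 24 of them with blackened depressors); `conditional` and the predicate names are illustrative only.

```python
from fractions import Fraction

# Hypothetical population of 100 students, each a (status, blackened) pair,
# matching the Diseasitis numbers used in the text.
students = (
    [("sick", True)] * 18 + [("sick", False)] * 2
    + [("healthy", True)] * 24 + [("healthy", False)] * 56
)

def is_sick(s):
    return s[0] == "sick"

def is_healthy(s):
    return s[0] == "healthy"

def is_blackened(s):
    return s[1]

def conditional(population, x, y):
    """P(x | y): restrict to members where y holds, then take the fraction where x also holds."""
    restricted = [s for s in population if y(s)]
    return Fraction(sum(1 for s in restricted if x(s)), len(restricted))

print(conditional(students, is_sick, is_blackened))     # 3/7  (18 of 42)
print(conditional(students, is_blackened, is_healthy))  # 3/10 (24 of 80)
print(conditional(students, is_blackened, is_sick))     # 9/10 (18 of 20)
```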

For more on this, see Conditional probability.

Bayes’ rule

Bayes’ rule says:

$$\textbf{Prior odds} \times \textbf{Relative likelihoods} = \textbf{Posterior odds}$$

In the Diseasitis example, this would state:

$$\dfrac{\mathbb P({sick})}{\mathbb P(healthy)} \times \dfrac{\mathbb P({blackened}\mid {sick})}{\mathbb P({blackened}\mid healthy)} = \dfrac{\mathbb P({sick}\mid {blackened})}{\mathbb P(healthy\mid {blackened})}.$$


The prior odds refer to the relative proportion of sick vs. healthy patients, which is \(1 : 4\). Converting these odds into probabilities gives us \(\mathbb P(sick)=\frac{1}{4+1}=\frac{1}{5}=20\%\).

The relative likelihood refers to how much more likely each sick patient is to get a positive test result (a blackened tongue depressor) than each healthy patient, which (using conditional probability notation) is \(\frac{\mathbb P(positive \mid sick)}{\mathbb P(positive \mid healthy)}=\frac{0.90}{0.30},\) aka relative likelihoods of \(3 : 1.\)

The posterior odds are the relative proportions of sick vs. healthy patients among those with positive test results, or \(\frac{\mathbb P(sick \mid positive)}{\mathbb P(healthy \mid positive)} = \frac{1}{4} \times \frac{3}{1} = \frac{3}{4}\), aka \(3 : 4\) odds.

To extract the probability from the relative odds, we keep in mind that probabilities of mutually exclusive and exhaustive propositions need to sum to \(1,\) that is, there is a 100% probability that one of them is true. Since everyone is either sick or not sick, we can normalize the odds ratio \(3 : 4\) by dividing through by the sum of its terms:

$$(\frac{3}{3+4} : \frac{4}{3+4}) = (\frac{3}{7} : \frac{4}{7}) \approx (0.43 : 0.57)$$

…ending up with the probabilities (0.43 : 0.57), proportional to the original ratio of (3 : 4), but summing to 1. It would be very odd if something had probability \(3\) (300% probability) of happening.
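Normalizing odds takes only a line of code. A minimal sketch, assuming the odds are given as integers for mutually exclusive and exhaustive propositions; `odds_to_probabilities` is a hypothetical name. The same function also turns the prior odds \(1 : 4\) into \(1/5\) and \(4/5\).

```python
from fractions import Fraction

def odds_to_probabilities(*odds):
    """Divide each term of an odds ratio by the sum of the terms, so the results sum to 1.

    Only meaningful when the terms refer to mutually exclusive, exhaustive propositions.
    """
    total = sum(odds)
    return tuple(Fraction(o, total) for o in odds)

posterior = odds_to_probabilities(3, 4)
print(posterior)                                # (Fraction(3, 7), Fraction(4, 7))
print([round(float(p), 2) for p in posterior])  # [0.43, 0.57]
print(odds_to_probabilities(1, 4))              # (Fraction(1, 5), Fraction(4, 5))
```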

Using the waterfall visualization:

[Figure: labeled waterfall]

We can generalize this to any two hypotheses \(H_j\) and \(H_k\) with evidence \(e\), in which case Bayes’ rule can be written as:

$$\dfrac{\mathbb P(H_j)}{\mathbb P(H_k)} \times \dfrac{\mathbb P(e \mid H_j)}{\mathbb P(e \mid H_k)} = \dfrac{\mathbb P(H_j \mid e)}{\mathbb P(H_k \mid e)}$$

which says “the posterior odds ratio for hypotheses \(H_j\) vs. \(H_k\) (after seeing the evidence \(e\)) is equal to the prior odds ratio times the ratio of how well \(H_j\) predicted the evidence compared to \(H_k.\)”

If \(H_j\) and \(H_k\) are mutually exclusive and exhaustive, we can convert the posterior odds into a posterior probability for \(H_j\) by normalizing the odds: dividing through the odds ratio by the sum of its terms, so that the elements of the new ratio sum to \(1.\)
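A minimal sketch of the general two-hypothesis rule, with hypothetical function names. Priors and likelihoods are accepted as anything `fractions.Fraction` can parse (integers or decimal strings). The second function applies the normalization step, which is only valid when \(H_j\) and \(H_k\) are mutually exclusive and exhaustive.

```python
from fractions import Fraction

def posterior_odds(prior_j, prior_k, p_e_given_j, p_e_given_k):
    """Odds form of Bayes' rule for hypotheses H_j and H_k given evidence e.

    Prior odds P(H_j)/P(H_k) times relative likelihood P(e|H_j)/P(e|H_k),
    returned as a reduced integer pair (a, b) meaning a : b.
    """
    prior_odds = Fraction(prior_j) / Fraction(prior_k)
    likelihood_ratio = Fraction(p_e_given_j) / Fraction(p_e_given_k)
    post = prior_odds * likelihood_ratio
    return post.numerator, post.denominator

def posterior_probability(prior_j, prior_k, p_e_given_j, p_e_given_k):
    """P(H_j | e) by normalizing the odds; valid only if H_j, H_k are exclusive and exhaustive."""
    a, b = posterior_odds(prior_j, prior_k, p_e_given_j, p_e_given_k)
    return Fraction(a, a + b)

# Diseasitis: prior odds 1 : 4 (sick : healthy), likelihoods 0.90 and 0.30.
print(posterior_odds(1, 4, "0.90", "0.30"))         # (3, 4)
print(posterior_probability(1, 4, "0.90", "0.30"))  # 3/7
```

Because only ratios enter the calculation, passing the priors as probabilities ("0.2" and "0.8") instead of odds gives the same result.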

Proof of Bayes’ rule

Rearranging the definition of conditional probability gives \(\mathbb P(X \wedge Y) = \mathbb P(Y) \cdot \mathbb P(X \mid Y).\) E.g. to find “the fraction of all patients that are sick and get a positive result”, we multiply “the fraction of patients that are sick” by “the probability that a sick patient blackens the tongue depressor”.

Then this is a proof of Bayes’ rule:

$$ \frac{\mathbb P(H_j)}{\mathbb P(H_k)} \cdot \frac{\mathbb P(e \mid H_j)}{\mathbb P(e \mid H_k)} = \frac{\mathbb P(e \wedge H_j)}{\mathbb P(e \wedge H_k)} = \frac{\mathbb P(H_j \wedge e)/\mathbb P(e)}{\mathbb P(H_k \wedge e)/\mathbb P(e)} = \frac{\mathbb P(H_j \mid e)}{\mathbb P(H_k \mid e)} $$

QED.

In the Diseasitis example, these proof steps correspond to the operations:

$$ \frac{0.20}{0.80} \cdot \frac{0.90}{0.30} = \frac{0.18}{0.24} = \frac{0.18/0.42}{0.24/0.42} \approx \frac{0.43}{0.57} $$
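The same chain of equalities can be checked with exact fractions, mirroring each step of the proof with the Diseasitis numbers. This is only a numerical check of the example, not a proof. The variable names are illustrative, and computing \(\mathbb P(e)\) as the sum of the two joint probabilities relies on sick and healthy being mutually exclusive and exhaustive.

```python
from fractions import Fraction

p_sick, p_healthy = Fraction(20, 100), Fraction(80, 100)
p_pos_sick, p_pos_healthy = Fraction(90, 100), Fraction(30, 100)

# Step 1: prior odds times relative likelihood.
lhs = (p_sick / p_healthy) * (p_pos_sick / p_pos_healthy)

# Step 2: joint probabilities P(e and H) = P(H) * P(e | H).
joint_sick = p_sick * p_pos_sick           # 0.18
joint_healthy = p_healthy * p_pos_healthy  # 0.24

# Step 3: divide both joints by P(e); here P(e) is their sum because
# sick and healthy are mutually exclusive and exhaustive.
p_pos = joint_sick + joint_healthy         # 0.42
post_sick, post_healthy = joint_sick / p_pos, joint_healthy / p_pos

print(lhs, joint_sick / joint_healthy, post_sick / post_healthy)  # 3/4 3/4 3/4
print(post_sick, post_healthy)                                    # 3/7 4/7
```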

Using red for sick, blue for healthy, grey for a mix of sick and healthy patients, and + signs for positive test results, the calculation steps can be visualized as follows:

[Figure: Bayes Venn diagram]


This process of observing evidence and using its likelihood ratio to transform a prior belief into a posterior belief is called a “Bayesian update” or “belief revision.”

Congratulations! You now know (we hope) what Bayes’ rule is, and how to apply it to simple setups. After this, the path continues with further implications and additional forms of Bayes’ rule. This might be a good time to take a break, if you want one, but we hope you continue on this Arbital path after that!

  • For the generalization of the odds form of Bayes’ rule to multiple hypotheses and multiple items of evidence, see Bayes’ rule: Vector form.

  • For a transformation of the odds form that makes the strength of evidence even more directly visible, see Bayes’ rule: Log-odds form.
