# Introduction to Bayes' rule: Odds form

This introduction is meant to be read after the introductions to frequency visualizations and waterfall visualizations.

In general, Bayes’ rule states:

$$\textbf{Prior odds} \times \textbf{Relative likelihoods} = \textbf{Posterior odds}$$

If we consider the waterfall visualization of the Diseasitis example, then we can visualize how relative odds are appropriate for thinking about the two rivers at the top of the waterfall.

The proportion of red vs. blue water at the bottom will be the same whether there’s 200 vs. 800 gallons per second of red vs. blue water at the top of the waterfall, or 20,000 vs. 80,000 gallons/sec, or 1 vs. 4 gallons/second. So long as the rest of the waterfall behaves in a proportional way, we’ll get the same proportion of red vs. blue at the bottom. Thus, we’re justified in ignoring the amount of water and considering only the relative proportion between amounts.

Similarly, what matters is the relative proportion between how much of each gallon of red water makes it into the shared pool and how much of each gallon of blue water makes it. If 45% of the red water and 15% of the blue water made it to the bottom, the relative proportion of red and blue water in the bottom pool would be the same as with 90% and 30%.

This justifies throwing away the specific data that 90% of the red stream and 30% of the blue stream make it down, and summarizing this into relative likelihoods of (3 : 1).

More generally, suppose we have a medical test that detects a sickness with a 90% true positive rate (10% false negatives) and a 30% false positive rate (70% true negatives). A positive result on this test represents the same strength of evidence as a positive result on a test with 60% true positives and 20% false positives. A negative result on this test represents the same strength of evidence as a negative result on a test with 9% false negatives and 63% true negatives.
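To see numerically why only the ratio matters, here is a minimal Python sketch (the `posterior_odds` helper is our own illustration, not part of the original text), comparing the 90%/30% test with the 60%/20% test:

```python
def posterior_odds(prior_odds, true_positive_rate, false_positive_rate):
    """Odds of sickness after a positive result: prior odds times
    the likelihood ratio of the test."""
    return prior_odds * (true_positive_rate / false_positive_rate)

prior = 1 / 4  # 1 : 4 odds of sick vs. healthy

# Both tests have a 3 : 1 likelihood ratio, so both give 3 : 4 posterior odds:
print(posterior_odds(prior, 0.90, 0.30))  # ~0.75, i.e. 3 : 4
print(posterior_odds(prior, 0.60, 0.20))  # ~0.75 as well
```

Any pair of rates in the same 3 : 1 proportion would produce the same update.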

In general, the strength of evidence is summarized by how relatively likely different states of the world make our observations.

For more on this idea, see Strength of Bayesian evidence.

# The equation

To state Bayes’ rule in full generality, and prove it as a theorem, we’ll need to introduce some new notation.

## Conditional probability

First, when $$X$$ is a proposition, $$\mathbb P(X)$$ will stand for the probability of $$X.$$

In other words, $$X$$ is something that’s either true or false in reality, but we’re uncertain about it, and $$\mathbb P(X)$$ is a way of expressing our degree of belief that $$X$$ is true. A patient is, in fact, either sick or healthy; but if you don’t know which of these is the case, the evidence might lead you to assign a 43% subjective probability that the patient is sick.

$$\neg X$$ will mean “$$X$$ is false”, so $$\mathbb P(\neg X)$$ is “the probability that $$X$$ is false”.

The Diseasitis example involved some more complicated statements than this, though; in particular it involved:

• The 90% chance that a patient blackens the tongue depressor, given that they have Diseasitis.

• The 30% chance that a patient blackens the tongue depressor, given that they’re healthy.

• The 3/7 chance that a patient has Diseasitis, given that they blackened the tongue depressor.

In these cases we want to go from some fact that is assumed or known to be true (on the right), to some other proposition (on the left) whose new probability we want to ask about, taking into account that assumption.

Probability statements like those are known as “conditional probabilities”. The standard notation for conditional probability expresses the above quantities as:

• $$\mathbb P(blackened \mid sick) = 0.9$$

• $$\mathbb P(blackened \mid \neg sick) = 0.3$$

• $$\mathbb P(sick \mid blackened) = 3/7$$

This standard notation for $$\mathbb P(X \mid Y)$$, meaning “the probability of $$X$$, assuming $$Y$$ to be true”, is a helpfully symmetrical vertical line, to avoid giving you any visual clue to remember that the assumption is on the right and the inferred proposition is on the left. <sarcasm>

Conditional probability is defined as follows, using the notation $$X \wedge Y$$ to denote “both $$X$$ and $$Y$$ are true”:

$$\mathbb P(X \mid Y) := \frac{\mathbb P(X \wedge Y)}{\mathbb P(Y)}$$

E.g. in the Diseasitis example, $$\mathbb P(sick \mid blackened)$$ is calculated by dividing the 18% of students who are sick and have blackened tongue depressors ($$\mathbb P(sick \wedge blackened)$$) by the total 42% of students who have blackened tongue depressors ($$\mathbb P(blackened)$$).

Or $$\mathbb P(blackened \mid \neg sick),$$ the probability of blackening the tongue depressor given that you’re healthy, is equivalent to the 24 students (out of 100) who are healthy and have blackened tongue depressors, divided by the 80 students who are healthy. 24 / 80 = 3/10, so this corresponds to the 30% false positives we were told about at the start.
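The two worked examples above can be checked directly from the definition. Here is a short Python sketch (variable names are ours, chosen for readability):

```python
# Starting figures from the Diseasitis example:
p_sick = 0.20                      # 20% of students are sick
p_blackened_given_sick = 0.90      # 90% of sick students blacken the depressor
p_blackened_given_healthy = 0.30   # 30% of healthy students do too

# Joint probabilities: P(X and Y) = P(Y) * P(X | Y)
p_sick_and_blackened = p_sick * p_blackened_given_sick              # 0.18
p_healthy_and_blackened = (1 - p_sick) * p_blackened_given_healthy  # 0.24
p_blackened = p_sick_and_blackened + p_healthy_and_blackened        # 0.42

# Conditional probability: P(X | Y) = P(X and Y) / P(Y)
p_sick_given_blackened = p_sick_and_blackened / p_blackened
print(p_sick_given_blackened)  # ~0.4286, i.e. 3/7
```

Dividing the joint 18% by the marginal 42% recovers the 3/7 figure from the bulleted list above.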

We can see the law of conditional probability as saying, “Let us restrict our attention to worlds where $$Y$$ is the case, or thingies of which $$Y$$ is true. Looking only at cases where $$Y$$ is true, how many cases are there inside that restriction where $$X$$ is also true—cases with $$X$$ and $$Y$$?”

For more on this, see Conditional probability.

## Bayes’ rule

Bayes’ rule says:

$$\textbf{Prior odds} \times \textbf{Relative likelihoods} = \textbf{Posterior odds}$$

In the Diseasitis example, this would state:

$$\dfrac{\mathbb P({sick})}{\mathbb P(healthy)} \times \dfrac{\mathbb P({blackened}\mid {sick})}{\mathbb P({blackened}\mid healthy)} = \dfrac{\mathbb P({sick}\mid {blackened})}{\mathbb P(healthy\mid {blackened})}.$$


The prior odds refer to the relative proportion of sick vs. healthy patients, which is $$1 : 4$$. Converting these odds into probabilities gives us $$\mathbb P(sick)=\frac{1}{4+1}=\frac{1}{5}=20\%.$$

The relative likelihood refers to how much more likely each sick patient is to get a positive test result than each healthy patient, which (using conditional probability notation) is $$\frac{\mathbb P(positive \mid sick)}{\mathbb P(positive \mid healthy)}=\frac{0.90}{0.30},$$ aka relative likelihoods of $$3 : 1.$$

The posterior odds are the relative proportions of sick vs. healthy patients among those with positive test results, or $$\frac{\mathbb P(sick \mid positive)}{\mathbb P(healthy \mid positive)} = \frac{3}{4},$$ aka $$3 : 4$$ odds.

To extract the probability from the relative odds, we keep in mind that probabilities of mutually exclusive and exhaustive propositions need to sum to $$1,$$ that is, there is a 100% probability of something happening. Since everyone is either sick or not sick, we can normalize the odds ratio $$3 : 4$$ by dividing through by the sum of terms:

$$(\frac{3}{3+4} : \frac{4}{3+4}) = (\frac{3}{7} : \frac{4}{7}) \approx (0.43 : 0.57)$$

…ending up with the probabilities (0.43 : 0.57), proportional to the original ratio of (3 : 4), but summing to 1. It would be very odd if something had probability $$3$$ (300% probability) of happening.
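This normalization step is mechanical enough to express in a few lines of Python (the `normalize` helper is our own illustration):

```python
def normalize(odds):
    """Rescale an odds ratio so its terms sum to 1, turning it
    into probabilities over mutually exclusive, exhaustive outcomes."""
    total = sum(odds)
    return [term / total for term in odds]

print(normalize([3, 4]))  # [3/7, 4/7], roughly [0.43, 0.57]
```

The same helper works for odds over any number of mutually exclusive hypotheses, not just two.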

Using the waterfall visualization:

We can generalize this to any two hypotheses $$H_j$$ and $$H_k$$ with evidence $$e$$, in which case Bayes’ rule can be written as:

$$\dfrac{\mathbb P(H_j)}{\mathbb P(H_k)} \times \dfrac{\mathbb P(e \mid H_j)}{\mathbb P(e \mid H_k)} = \dfrac{\mathbb P(H_j \mid e)}{\mathbb P(H_k \mid e)}$$

which says, “The posterior odds ratio for hypotheses $$H_j$$ vs. $$H_k$$ (after seeing the evidence $$e$$) equals the prior odds ratio times the ratio of how well $$H_j$$ predicted the evidence compared to $$H_k.$$”

If $$H_j$$ and $$H_k$$ are mutually exclusive and exhaustive, we can convert the posterior odds into a posterior probability for $$H_j$$ by normalizing the odds—dividing through the odds ratio by the sum of its terms, so that the elements of the new ratio sum to $$1.$$

## Proof of Bayes’ rule

Rearranging the definition of conditional probability, $$\mathbb P(X \wedge Y) = \mathbb P(Y) \cdot \mathbb P(X \mid Y).$$ E.g., to find “the fraction of all patients that are sick and get a positive result”, we multiply “the fraction of patients that are sick” by “the probability that a sick patient blackens the tongue depressor”.

Then this is a proof of Bayes’ rule:

$$\frac{\mathbb P(H_j)}{\mathbb P(H_k)} \cdot \frac{\mathbb P(e_0 | H_j)}{\mathbb P(e_0 | H_k)} = \frac{\mathbb P(e_0 \wedge H_j)}{\mathbb P(e_0 \wedge H_k)} = \frac{\mathbb P(H_j \wedge e_0)/\mathbb P(e_0)}{\mathbb P(H_k \wedge e_0)/\mathbb P(e_0)} = \frac{\mathbb P(H_j | e_0)}{\mathbb P(H_k | e_0)}$$

QED.

In the Diseasitis example, these proof steps correspond to the operations:

$$\frac{0.20}{0.80} \cdot \frac{0.90}{0.30} = \frac{0.18}{0.24} = \frac{0.18/0.42}{0.24/0.42} = \frac{0.43}{0.57}$$
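The three equalities in the proof can be verified numerically. Here is a small Python sketch of the Diseasitis computation (variable names are ours):

```python
p_sick, p_healthy = 0.20, 0.80
p_pos_given_sick, p_pos_given_healthy = 0.90, 0.30

# Step 1: prior odds times the likelihood ratio...
lhs = (p_sick / p_healthy) * (p_pos_given_sick / p_pos_given_healthy)

# Step 2: ...equals the ratio of joint probabilities,
# using P(X and Y) = P(Y) * P(X | Y)...
joint_sick = p_sick * p_pos_given_sick            # 0.18
joint_healthy = p_healthy * p_pos_given_healthy   # 0.24
middle = joint_sick / joint_healthy

# Step 3: ...equals the posterior odds, since dividing
# numerator and denominator by P(e) changes nothing.
p_e = joint_sick + joint_healthy                  # 0.42
rhs = (joint_sick / p_e) / (joint_healthy / p_e)

print(lhs, middle, rhs)  # all three are 0.75, i.e. 3 : 4 odds
```

Each intermediate value matches the corresponding fraction in the displayed calculation.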

Using red for sick, blue for healthy, grey for a mix of sick and healthy patients, and + signs for positive test results, the calculation steps can be visualized as follows:


This process of observing evidence and using its likelihood ratio to transform a prior belief into a posterior belief is called a “Bayesian update” or “belief revision.”
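The full update, from prior odds to a posterior probability, fits in one small function. This is a hedged sketch under our own naming (`bayes_update` is not from the text), assuming exactly two mutually exclusive, exhaustive hypotheses:

```python
def bayes_update(prior_odds, likelihood_ratio):
    """Posterior probability of the first hypothesis, given its prior
    odds against the second hypothesis and the likelihood ratio of
    the observed evidence."""
    posterior_odds = prior_odds * likelihood_ratio
    # Normalize: odds of (a : 1) correspond to probability a / (a + 1).
    return posterior_odds / (1 + posterior_odds)

# Diseasitis: 1 : 4 prior odds, 3 : 1 likelihood ratio -> 3 : 4 posterior odds
print(bayes_update(1/4, 3))  # ~0.4286, i.e. 3/7
```

A second piece of evidence would just be another call with the new likelihood ratio and the updated odds as the prior.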

Congratulations! You now know (we hope) what Bayes’ rule is, and how to apply it to simple setups. After this, the path continues with further implications and additional forms of Bayes’ rule. This might be a good time to take a break, if you want one—but we hope you continue on this Arbital path after that!

• For the generalization of the odds form of Bayes’ rule to multiple hypotheses and multiple items of evidence, see Bayes’ rule: Vector form.

• For a transformation of the odds form that makes the strength of evidence even more directly visible, see Bayes’ rule: Log-odds form.
