High-speed intro to Bayes's rule

(This is a high-speed in­tro­duc­tion to Bayes’ rule for peo­ple who want to get straight to it and are good at math. If you’d like a gen­tler or more thor­ough in­tro­duc­tion, try start­ing at the Bayes’ Rule Guide page in­stead.)

Per­centages, fre­quen­cies, and waterfalls

Suppose you’re screening a set of patients for a disease, which we’ll call Diseasitis (lit. “inflammation of the disease”). Your initial test is a tongue depressor containing a chemical strip, which usually turns black if the patient has Diseasitis.

  • Based on prior epi­demiol­ogy, you ex­pect that around 20% of pa­tients in the screen­ing pop­u­la­tion have Dise­a­sitis.

  • Among pa­tients with Dise­a­sitis, 90% turn the tongue de­pres­sor black.

  • 30% of the pa­tients with­out Dise­a­sitis will also turn the tongue de­pres­sor black.

What frac­tion of pa­tients with black tongue de­pres­sors have Dise­a­sitis?

3/7 or 43%, quickly obtainable as follows: In the screened population, there’s 1 sick patient for every 4 healthy patients. Sick patients are 3 times as likely to turn the tongue depressor black as healthy patients. \((1 : 4) \cdot (3 : 1) = (3 : 4)\) or 3 sick patients to 4 healthy patients among those that turn the tongue depressor black, corresponding to a probability of \(3/7 \approx 43\%\) that the patient is sick.


Bayes’ rule is a the­o­rem which de­scribes the gen­eral form of the op­er­a­tion we car­ried out to find the an­swer above. In the form we used above, we:

  • Started from the prior odds of (1 : 4) for sick ver­sus healthy pa­tients;

  • Mul­ti­plied by the like­li­hood ra­tio of (3 : 1) for sick ver­sus healthy pa­tients black­en­ing the tongue de­pres­sor;

  • Ar­rived at pos­te­rior odds of (3 : 4) for a pa­tient with a pos­i­tive test re­sult be­ing sick ver­sus healthy.

Bayes’ rule in this form thus states that the prior odds times the like­li­hood ra­tio equals the pos­te­rior odds.

We could also po­ten­tially see the pos­i­tive test re­sult as re­vis­ing a prior be­lief or prior prob­a­bil­ity of 20% that the pa­tient was sick, to a pos­te­rior be­lief or pos­te­rior prob­a­bil­ity of 43%.
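The odds-form calculation above can be sketched in a few lines of Python. This is only an illustration: the helper names `posterior_odds` and `odds_to_probability` are made up for this sketch, and exact fractions are used so that 3/7 appears as such rather than as a rounded decimal.

```python
from fractions import Fraction

def posterior_odds(prior_odds, likelihood_ratio):
    """Multiply prior odds by a likelihood ratio, term by term."""
    return tuple(p * l for p, l in zip(prior_odds, likelihood_ratio))

def odds_to_probability(odds):
    """Probability of the first hypothesis, given odds (first : second)."""
    first, second = odds
    return Fraction(first, first + second)

prior = (1, 4)        # sick : healthy, from the 20% base rate
likelihood = (3, 1)   # 90% : 30% chance of blackening the tongue depressor
posterior = posterior_odds(prior, likelihood)
p_sick = odds_to_probability(posterior)

print(posterior, p_sick)   # odds of (3 : 4), probability 3/7
```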

To make it clearer that we did the cor­rect calcu­la­tion above, and fur­ther pump in­tu­itions for Bayes’ rule, we’ll walk through some ad­di­tional vi­su­al­iza­tions.

Fre­quency representation

The fre­quency rep­re­sen­ta­tion of Bayes’ rule would de­scribe the prob­lem as fol­lows: “Among 100 pa­tients, there will be 20 sick pa­tients and 80 healthy pa­tients.”

[Figure: prior frequency]

“18 out of 20 sick pa­tients will turn the tongue de­pres­sor black. 24 out of 80 healthy pa­tients will blacken the tongue de­pres­sor.”

[Figure: posterior frequency]

“There­fore, there are (18+24)=42 pa­tients who turn the tongue de­pres­sor black, among whom 18 are ac­tu­ally sick. (18/​42)=(3/​7)=43%.”
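The same head count can be written out as a short Python sketch. The variable names are illustrative, and the population of 100 is the one used in the description above.

```python
from fractions import Fraction

population = 100
sick = population * 20 // 100          # 20 sick patients
healthy = population - sick            # 80 healthy patients

sick_black = sick * 90 // 100          # 18 sick patients blacken the depressor
healthy_black = healthy * 30 // 100    # 24 healthy patients blacken it too

total_black = sick_black + healthy_black
fraction_sick = Fraction(sick_black, total_black)

print(f"{sick_black} of {total_black} positive patients are sick: {fraction_sick}")
```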

(Some experiments show that this way of explaining the problem is the easiest for e.g. medical students to understand; see for instance “Probabilistic reasoning in clinical medicine” by David M. Eddy (1982). You may want to remember this format for future use, assuming you can’t just send them to Arbital!)

Water­fall representation

The waterfall representation may make clearer why we’re also allowed to transform the problem into prior odds and a likelihood ratio, and multiply (1 : 4) by (3 : 1) to get posterior odds of (3 : 4) and a probability of 3/7.

The fol­low­ing prob­lem is iso­mor­phic to the Dise­a­sitis one:

“A wa­ter­fall has two streams of wa­ter at the top, a red stream and a blue stream. Th­ese streams flow down the wa­ter­fall, with some of each stream be­ing di­verted off to the side, and the re­main­der pools at the bot­tom of the wa­ter­fall.”

[Figure: unlabeled waterfall]

“At the top of the wa­ter­fall, there’s around 20 gal­lons/​sec­ond flow­ing from the red stream, and 80 gal­lons/​sec­ond flow­ing from the blue stream. 90% of the red wa­ter makes it to the bot­tom of the wa­ter­fall, and 30% of the blue wa­ter makes it to the bot­tom of the wa­ter­fall. Of the pur­plish wa­ter that mixes at the bot­tom, what frac­tion is from the red stream ver­sus the blue stream?”

[Figure: labeled waterfall]

We can see from star­ing at the di­a­gram that the prior odds and like­li­hood ra­tio are the only num­bers we need to ar­rive at the an­swer:

  • The prob­lem would have the same an­swer if there were 40 gal­lons/​sec of red wa­ter and 160 gal­lons/​sec of blue wa­ter (in­stead of 20 gal­lons/​sec and 80 gal­lons/​sec). This would just mul­ti­ply the to­tal amount of wa­ter by a fac­tor of 2, with­out chang­ing the ra­tio of red to blue at the bot­tom.

  • The prob­lem would also have the same an­swer if 45% of the red stream and 15% of the blue stream made it to the bot­tom (in­stead of 90% and 30%). This would just cut down the to­tal amount of wa­ter by a fac­tor of 2, with­out chang­ing the rel­a­tive pro­por­tions of red and blue wa­ter.

[Figure: wide vs narrow waterfall]

So only the ra­tio of red to blue wa­ter at the top (prior odds of the propo­si­tion), and only the ra­tio be­tween the per­centages of red and blue wa­ter that make it to the bot­tom (like­li­hood ra­tio of the ev­i­dence), to­gether de­ter­mine the pos­te­rior ra­tio at the bot­tom: 3 parts red to 4 parts blue.
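A quick way to convince yourself of this invariance is to compute the red fraction at the bottom for all three versions of the waterfall. The function below is a minimal sketch; its name and argument names are made up for illustration.

```python
from fractions import Fraction

def red_fraction_at_bottom(red_flow, blue_flow, red_pass, blue_pass):
    """Fraction of the water at the bottom that came from the red stream."""
    red_bottom = red_flow * red_pass
    blue_bottom = blue_flow * blue_pass
    return red_bottom / (red_bottom + blue_bottom)

# Original waterfall: 20 and 80 gallons/sec, 90% and 30% reach the bottom.
base = red_fraction_at_bottom(20, 80, Fraction(90, 100), Fraction(30, 100))
# Twice as much water at the top, same ratio of red to blue.
doubled_flow = red_fraction_at_bottom(40, 160, Fraction(90, 100), Fraction(30, 100))
# Half as much water reaching the bottom, same ratio between the percentages.
halved_pass = red_fraction_at_bottom(20, 80, Fraction(45, 100), Fraction(15, 100))

print(base, doubled_flow, halved_pass)   # all three are 3/7
```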

Test problem

Here’s an­other Bayesian prob­lem to at­tempt. If you suc­cess­fully solved the ear­lier prob­lem on your first try, you might try do­ing this one in your head.

10% of wid­gets are bad and 90% are good. 4% of good wid­gets emit sparks, and 12% of bad wid­gets emit sparks. What per­centage of spark­ing wid­gets are bad?

  • There’s \(1 : 9\) bad vs. good widgets. (9 times as many good widgets as bad widgets; widgets are 1/9 as likely to be bad as good.)

  • Bad vs. good wid­gets have a \(12 : 4\) rel­a­tive like­li­hood to spark, which sim­plifies to \(3 : 1.\) (Bad wid­gets are 3 times as likely to emit sparks as good wid­gets.)

  • \((1 : 9) \cdot (3 : 1) = (3 : 9) \cong (1 : 3).\) (1 bad spark­ing wid­get for ev­ery 3 good spark­ing wid­gets.)

  • Odds of \(1 : 3\) convert to a probability of \(\frac{1}{1+3} = \frac{1}{4} = 25\%.\) (25% of sparking widgets are bad.)
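If the odds arithmetic feels too quick to trust, one independent check is to simulate a large batch of widgets and simply count. The sketch below is a rough Monte Carlo estimate, not part of the original argument; the function name, batch size, and fixed seed are arbitrary choices made for illustration.

```python
import random

def simulate_sparking_widgets(n_widgets, seed=0):
    """Count bad and good widgets that emit sparks in a simulated batch."""
    rng = random.Random(seed)
    bad_sparking = 0
    good_sparking = 0
    for _ in range(n_widgets):
        is_bad = rng.random() < 0.10               # 10% of widgets are bad
        spark_chance = 0.12 if is_bad else 0.04    # 12% vs. 4% chance of sparks
        if rng.random() < spark_chance:
            if is_bad:
                bad_sparking += 1
            else:
                good_sparking += 1
    return bad_sparking, good_sparking

bad, good = simulate_sparking_widgets(200_000)
estimate = bad / (bad + good)
print(f"bad : good among sparking widgets = {bad} : {good}")
print(f"estimated fraction bad = {estimate:.3f} (exact answer: 0.25)")
```

The estimate should land close to 25%, with some sampling noise that shrinks as the batch grows.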

(If you’re hav­ing trou­ble us­ing odds ra­tios to rep­re­sent un­cer­tainty, see this in­tro or this page.)

Gen­eral equa­tion and proof

To say ex­actly what we’re do­ing and prove its val­idity, we need to in­tro­duce some no­ta­tion from prob­a­bil­ity the­ory.

If \(X\) is a propo­si­tion, \(\mathbb P(X)\) will de­note \(X\)’s prob­a­bil­ity, our quan­ti­ta­tive de­gree of be­lief in \(X.\)

\(\neg X\) will de­note the nega­tion of \(X\) or the propo­si­tion ”\(X\) is false”.

If \(X\) and \(Y\) are propo­si­tions, then \(X \wedge Y\) de­notes the propo­si­tion that both X and Y are true. Thus \(\mathbb P(X \wedge Y)\) de­notes “The prob­a­bil­ity that \(X\) and \(Y\) are both true.”

We now define con­di­tional prob­a­bil­ity:

$$\mathbb P(X|Y) := \dfrac{\mathbb P(X \wedge Y)}{\mathbb P(Y)} \tag*{(definition of conditional probability)}$$

We pro­nounce \(\mathbb P(X|Y)\) as “the con­di­tional prob­a­bil­ity of X, given Y”. In­tu­itively, this is sup­posed to mean “The prob­a­bil­ity that \(X\) is true, as­sum­ing that propo­si­tion \(Y\) is true”.

Defin­ing con­di­tional prob­a­bil­ity in this way means that to get “the prob­a­bil­ity that a pa­tient is sick, given that they turned the tongue de­pres­sor black” we should put all the sick plus healthy pa­tients with pos­i­tive test re­sults into a bag, and ask about the prob­a­bil­ity of draw­ing a pa­tient who is sick and got a pos­i­tive test re­sult from that bag. In other words, we perform the calcu­la­tion \(\frac{18}{18+24} = \frac{3}{7}.\)

[Figure: diseasitis frequency]

Rear­rang­ing the defi­ni­tion of con­di­tional prob­a­bil­ity, \(\mathbb P(X \wedge Y) = \mathbb P(Y) \cdot \mathbb P(X|Y).\) So to find “the frac­tion of all pa­tients that are sick and get a pos­i­tive re­sult”, we mul­ti­ply “the frac­tion of pa­tients that are sick” times “the prob­a­bil­ity that a sick pa­tient black­ens the tongue de­pres­sor”.

We’re now ready to prove Bayes’s rule in the form, “the prior odds times the like­li­hood ra­tio equals the pos­te­rior odds”.

The “prior odds” is the ra­tio of sick to healthy pa­tients:

$$\frac{\mathbb P(sick)}{\mathbb P(healthy)} \tag*{(prior odds)}$$

The “like­li­hood ra­tio” is how much more rel­a­tively likely a sick pa­tient is to get a pos­i­tive test re­sult (turn the tongue de­pres­sor black), com­pared to a healthy pa­tient:

$$\frac{\mathbb P(positive | sick)}{\mathbb P(positive | healthy)} \tag*{(likelihood ratio)}$$

The “pos­te­rior odds” is the odds that a pa­tient is sick ver­sus healthy, given that they got a pos­i­tive test re­sult:

$$\frac{\mathbb P(sick | positive)}{\mathbb P(healthy | positive)} \tag*{(posterior odds)}$$

Bayes’s rule asserts that prior odds times likelihood ratio equals posterior odds:

$$\frac{\mathbb P(sick)}{\mathbb P(healthy)} \cdot \frac{\mathbb P(positive | sick)}{\mathbb P(positive | healthy)} = \frac{\mathbb P(sick | positive)}{\mathbb P(healthy | positive)}$$

We will show this by prov­ing the gen­eral form of Bayes’s Rule. For any two hy­pothe­ses \(H_j\) and \(H_k\) and any piece of new ev­i­dence \(e_0\):

$$ \frac{\mathbb P(H_j)}{\mathbb P(H_k)} \cdot \frac{\mathbb P(e_0 | H_j)}{\mathbb P(e_0 | H_k)} = \frac{\mathbb P(e_0 \wedge H_j)}{\mathbb P(e_0 \wedge H_k)} = \frac{\mathbb P(e_0 \wedge H_j)/\mathbb P(e_0)}{\mathbb P(e_0 \wedge H_k)/\mathbb P(e_0)} = \frac{\mathbb P(H_j | e_0)}{\mathbb P(H_k | e_0)} $$

In the Dise­a­sitis ex­am­ple, this cor­re­sponds to perform­ing the op­er­a­tions:

$$ \frac{0.20}{0.80} \cdot \frac{0.90}{0.30} = \frac{0.18}{0.24} = \frac{0.18/0.42}{0.24/0.42} = \frac{0.43}{0.57} $$
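Each step of the proof uses only the definition of conditional probability, so it can be checked mechanically against the joint probabilities of the Diseasitis example. The sketch below does that with exact fractions; the `prob` helper and the table layout are illustrative choices, and the negative-test entries (2% and 56%) are simply what is left over from the 20% and 80% after removing the positive results.

```python
from fractions import Fraction as F

# Joint probabilities P(hypothesis AND test result) from the Diseasitis example.
joint = {
    ("sick", "positive"): F(18, 100),
    ("sick", "negative"): F(2, 100),
    ("healthy", "positive"): F(24, 100),
    ("healthy", "negative"): F(56, 100),
}

def prob(h=None, e=None):
    """Joint or marginal probability; None means 'sum over this variable'."""
    return sum(v for (hh, ee), v in joint.items()
               if h in (None, hh) and e in (None, ee))

prior_odds = prob(h="sick") / prob(h="healthy")
likelihood_ratio = ((prob("sick", "positive") / prob(h="sick"))
                    / (prob("healthy", "positive") / prob(h="healthy")))
posterior_odds = ((prob("sick", "positive") / prob(e="positive"))
                  / (prob("healthy", "positive") / prob(e="positive")))

print(prior_odds, likelihood_ratio, posterior_odds)   # 1/4, 3, 3/4
```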

Us­ing red for sick, blue for healthy, grey for a mix of sick and healthy pa­tients, and + signs for pos­i­tive test re­sults, the proof above can be vi­su­al­ized as fol­lows:

[Figure: bayes venn]


Bayes’ theorem

An alternative form, sometimes called “Bayes’ theorem” to distinguish it from “Bayes’ rule” (although not everyone follows this convention), uses absolute probabilities instead of ratios. The law of marginal probability states that for any set of mutually exclusive and exhaustive possibilities \(\{X_1, X_2, \ldots, X_n\}\) and any proposition \(Y\):

$$\mathbb P(Y) = \sum_i \mathbb P(Y \wedge X_i) \tag*{(law of marginal probability)}$$

Then we can de­rive an ex­pres­sion for the ab­solute (non-rel­a­tive) prob­a­bil­ity of a propo­si­tion \(H_k\) af­ter ob­serv­ing ev­i­dence \(e_0\) as fol­lows:

$$ \mathbb P(H_k | e_0) = \frac{\mathbb P(H_k \wedge e_0)}{\mathbb P(e_0)} = \frac{\mathbb P(e_0 \wedge H_k)}{\sum_i \mathbb P(e_0 \wedge H_i)} = \frac{\mathbb P(e_0 | H_k) \cdot \mathbb P(H_k)}{\sum_i \mathbb P(e_0 | H_i) \cdot \mathbb P(H_i)} $$

The equa­tion of the first and last terms above is what you will usu­ally see de­scribed as Bayes’ the­o­rem.

To see why this de­com­po­si­tion might be use­ful, note that \(\mathbb P(sick | positive)\) is an in­fer­en­tial step, a con­clu­sion that we make af­ter ob­serv­ing a new piece of ev­i­dence. \(\mathbb P(positive | sick)\) is a piece of causal in­for­ma­tion we are likely to have on hand, for ex­am­ple by test­ing groups of sick pa­tients to see how many of them turn the tongue de­pres­sor black. \(\mathbb P(sick)\) de­scribes our state of be­lief be­fore mak­ing any new ob­ser­va­tions. So Bayes’ the­o­rem can be seen as tak­ing what we already be­lieve about the world (in­clud­ing our prior be­lief about how differ­ent imag­in­able states of af­fairs would gen­er­ate differ­ent ob­ser­va­tions), plus an ac­tual ob­ser­va­tion, and out­putting a new state of be­lief about the world.
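As a sketch of this "beliefs plus observation in, new beliefs out" view, the probability form can be written as a small function that takes priors and likelihoods and returns posteriors. The function name and the dictionary layout are assumptions made for this example. The negative-test case uses the complementary likelihoods (10% and 70%), which follow from the 90% and 30% given earlier.

```python
from fractions import Fraction as F

def bayes_theorem(priors, likelihoods):
    """Return P(H_k | e_0) for every hypothesis H_k.

    priors:      {hypothesis: P(H_k)}, mutually exclusive and exhaustive
    likelihoods: {hypothesis: P(e_0 | H_k)}
    """
    joint = {h: likelihoods[h] * priors[h] for h in priors}   # P(e_0 and H_k)
    evidence = sum(joint.values())                            # P(e_0), law of marginal probability
    return {h: joint[h] / evidence for h in joint}

priors = {"sick": F(20, 100), "healthy": F(80, 100)}
black = {"sick": F(90, 100), "healthy": F(30, 100)}
not_black = {h: 1 - p for h, p in black.items()}

after_positive = bayes_theorem(priors, black)
after_negative = bayes_theorem(priors, not_black)

print(after_positive["sick"])   # 3/7
print(after_negative["sick"])   # 1/29
```

Note that the denominator has to be recomputed from every hypothesis each time, which is the main bookkeeping cost of this form compared with the odds form.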

Vec­tor and func­tional generalizations

Since the proof of Bayes’ rule holds for any pair of hy­pothe­ses, it also holds for rel­a­tive be­lief in any num­ber of hy­pothe­ses. Fur­ther­more, we can re­peat­edly mul­ti­ply by like­li­hood ra­tios to chain to­gether any num­ber of pieces of ev­i­dence.

Sup­pose there’s a bath­tub full of coins:

  • Half the coins are “fair” and have a 50% prob­a­bil­ity of com­ing up Heads each time they are thrown.

  • A third of the coins are bi­ased to pro­duce Heads 25% of the time (Tails 75%).

  • The re­main­ing sixth of the coins are bi­ased to pro­duce Heads 75% of the time.

You ran­domly draw a coin, flip it three times, and get the re­sult HTH. What’s the chance this is a fair coin?

We can val­idly calcu­late the an­swer as fol­lows:

$$ \begin{array}{rll} & (3 : 2 : 1) & \cong (\frac{1}{2} : \frac{1}{3} : \frac{1}{6}) \\ \times & (2 : 1 : 3) & \cong ( \frac{1}{2} : \frac{1}{4} : \frac{3}{4} ) \\ \times & (2 : 3 : 1) & \cong ( \frac{1}{2} : \frac{3}{4} : \frac{1}{4} ) \\ \times & (2 : 1 : 3) & \\ = & (24 : 6 : 9) & \cong (8 : 2 : 3) \cong (\frac{8}{13} : \frac{2}{13} : \frac{3}{13}) \end{array} $$

So the posterior probability the coin is fair is 8/13 or ~62%.
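The same chain of multiplications can be sketched in Python by updating a vector of relative odds once per flip and normalizing only at the end. The `update` helper is an illustrative name; the order of the hypotheses is (fair, 25% Heads, 75% Heads).

```python
from fractions import Fraction as F

prior = [F(1, 2), F(1, 3), F(1, 6)]     # fair, 25%-Heads, 75%-Heads coins
p_heads = [F(1, 2), F(1, 4), F(3, 4)]   # chance of Heads for each kind of coin

def update(odds, flip):
    """Multiply relative odds by the likelihood of one observed flip."""
    likelihood = [p if flip == "H" else 1 - p for p in p_heads]
    return [o * l for o, l in zip(odds, likelihood)]

odds = prior
for flip in "HTH":
    odds = update(odds, flip)

total = sum(odds)
posterior = [o / total for o in odds]   # normalize once, at the end
print(posterior)                        # 8/13, 2/13, 3/13
```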

This is one reason it’s good to know the odds form of Bayes’ rule, not just the probability form in which Bayes’ theorem is often given. Imagine trying to do the above calculation by repeatedly applying the form of the theorem that says:

$$\mathbb P(H_k | e_0) = \frac{\mathbb P(e_0 | H_k) \cdot \mathbb P(H_k)}{\sum_i \mathbb P(e_0 | H_i) \cdot \mathbb P(H_i)}$$

We can gen­er­al­ize fur­ther by writ­ing Bayes’ rule in a func­tional form. If \(\mathbb O(H_i)\) is a rel­a­tive be­lief vec­tor or rel­a­tive be­lief func­tion on the vari­able \(H,\) and \(\mathcal L(e_0 | H_i)\) is the like­li­hood func­tion giv­ing the rel­a­tive chance of ob­serv­ing ev­i­dence \(e_0\) given each pos­si­ble state of af­fairs \(H_i,\) then rel­a­tive pos­te­rior be­lief \(\mathbb O(H_i | e_0)\) is given by:

$$\mathbb O(H_i | e_0) = \mathcal L(e_0 | H_i) \cdot \mathbb O(H_i)$$

If we nor­mal­ize the rel­a­tive odds \(\mathbb O\) into ab­solute prob­a­bil­ities \(\mathbb P\) - that is, di­vide through \(\mathbb O\) by its sum or in­te­gral so that the new func­tion sums or in­te­grates to \(1\) - then we ob­tain Bayes’ rule for prob­a­bil­ity func­tions:

$$\mathbb P(H_i | e_0) \propto \mathcal L(e_0 | H_i) \cdot \mathbb P(H_i) \tag*{(functional form of Bayes' rule)}$$
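To illustrate the functional form, here is a minimal sketch in which \(H\) ranges over many possible coin biases rather than three. The setup is hypothetical and not from the bathtub example: it assumes a flat relative prior over a grid of 101 candidate values for the chance of Heads, and reuses the observation HTH.

```python
# Grid of candidate biases theta = P(Heads); the grid itself is an assumption.
thetas = [i / 100 for i in range(101)]

def likelihood(theta, flips):
    """Relative chance of the observed flips if P(Heads) = theta."""
    result = 1.0
    for flip in flips:
        result *= theta if flip == "H" else 1.0 - theta
    return result

relative_prior = [1.0 for _ in thetas]    # flat relative prior (assumed)
relative_posterior = [likelihood(t, "HTH") * o
                      for t, o in zip(thetas, relative_prior)]

total = sum(relative_posterior)           # normalizing constant
posterior = [o / total for o in relative_posterior]

most_probable = thetas[posterior.index(max(posterior))]
print(f"most probable bias on the grid: {most_probable}")
```

With two Heads in three flips and a flat prior, the posterior peaks near a bias of 2/3, as one would expect.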

Ap­pli­ca­tions of Bayesian reasoning

This general Bayesian framework (prior belief, evidence, posterior belief) is a lens through which we can view a lot of formal and informal reasoning, plus a large amount of entirely nonverbal cognitive-ish phenomena. (This broad statement is widely agreed upon; exactly which phenomena are good to view through a Bayesian lens is sometimes disputed.)

Ex­am­ples of peo­ple who might want to study Bayesian rea­son­ing in­clude:

  • Pro­fes­sion­als who use statis­tics, such as sci­en­tists or med­i­cal doc­tors.

  • Com­puter pro­gram­mers work­ing in the field of ma­chine learn­ing.

  • Hu­man be­ings try­ing to think.

The third ap­pli­ca­tion is prob­a­bly of the widest gen­eral in­ter­est.

Ex­am­ple hu­man ap­pli­ca­tions of Bayesian reasoning

Philip Tet­lock found when study­ing “su­perfore­cast­ers”, peo­ple who were es­pe­cially good at pre­dict­ing fu­ture events:

“The su­perfore­cast­ers are a nu­mer­ate bunch: many know about Bayes’ the­o­rem and could de­ploy it if they felt it was worth the trou­ble. But they rarely crunch the num­bers so ex­plic­itly. What mat­ters far more to the su­perfore­cast­ers than Bayes’ the­o­rem is Bayes’ core in­sight of grad­u­ally get­ting closer to the truth by con­stantly up­dat­ing in pro­por­tion to the weight of the ev­i­dence.” — Philip Tet­lock and Dan Gard­ner, Superforecasting

This is some ev­i­dence that know­ing about Bayes’ rule and un­der­stand­ing its qual­i­ta­tive im­pli­ca­tions is a fac­tor in de­liv­er­ing bet­ter-than-av­er­age in­tu­itive hu­man rea­son­ing. This pat­tern is illus­trated in the next cou­ple of ex­am­ples.

The OKCupid date.

One re­al­is­tic ex­am­ple of Bayesian rea­son­ing was de­ployed by one of the early test vol­un­teers for a much ear­lier ver­sion of a guide to Bayes’ rule. She had sched­uled a date with a 96% OKCupid match, who had then can­cel­led that date with­out other ex­pla­na­tion. After spend­ing some men­tal time bounc­ing back and forth be­tween “that doesn’t seem like a good sign” ver­sus “maybe there was a good rea­son he can­celed”, she de­cided to try look­ing at the prob­lem us­ing that Bayes thing she’d just learned about. She es­ti­mated:

  • A 96% OKCupid match like this one had prior odds of 2 : 5 for being a desirable versus undesirable date. (Based on her prior experience with 96% OKCupid matches, and the details of his profile.)

  • Men she doesn’t want to go out with are 3 times as likely as men she might want to go out with to can­cel a first date with­out other ex­pla­na­tion.

This implied posterior odds of 2 : 15 for a desirable versus undesirable date, which was unfavorable enough not to pursue him further. (She sent him what might very well have been the first explicitly Bayesian rejection notice in dating history, reasoning that if he wrote back with a Bayesian counterargument, this would promote him to being interesting again. He didn’t write back.)

The point of looking at the problem this way is not that she knew exact probabilities and could calculate that the man had an exactly 88% chance of being undesirable. Rather, by breaking up the problem in that way, she was able to summarize what she thought she knew in compact form, see what those beliefs already implied, and stop bouncing back and forth between imagined reasons why a good date might cancel versus reasons to protect herself from potential bad dates. An answer roughly in the range of 15/17 made the decision clear.
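One way to see why rough estimates were enough is to vary the likelihood ratio and watch how little the conclusion moves. In the sketch below, the prior odds of 2 : 5 and the ratio of 3 are her estimates from above; the ratios 2 and 4 are hypothetical alternatives added purely for comparison, and the function name is made up.

```python
from fractions import Fraction as F

prior_odds = (2, 5)   # desirable : undesirable, her estimate

def p_undesirable(cancel_ratio):
    """P(undesirable | canceled), if undesirable dates are
    cancel_ratio times as likely to cancel as desirable ones."""
    desirable = prior_odds[0] * 1
    undesirable = prior_odds[1] * cancel_ratio
    return F(undesirable, desirable + undesirable)

for ratio in (2, 3, 4):   # 3 is her estimate; 2 and 4 are hypothetical
    p = p_undesirable(ratio)
    print(f"ratio {ratio}: P(undesirable) = {p}, about {float(p):.0%}")
```

Across this range the probability stays between roughly 83% and 91%, so the decision does not hinge on the exact figure.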

In­tern­ment of Ja­panese-Amer­i­cans dur­ing World War II

From Robyn Dawes’s Ra­tional Choice in an Uncer­tain World:

Post-hoc fitting of evidence to hypothesis was involved in a most grievous chapter in United States history: the internment of Japanese-Americans at the beginning of the Second World War. When California governor Earl Warren testified before a congressional hearing in San Francisco on February 21, 1942, a questioner pointed out that there had been no sabotage or any other type of espionage by the Japanese-Americans up to that time. Warren responded, “I take the view that this lack [of] subversive activity is the most ominous sign in our whole situation. It convinces me more than perhaps any other factor that the sabotage we are to get, the Fifth Column activities [we] are to get, are timed just like Pearl Harbor was timed… I believe we are just being lulled into a false sense of security.”

You might want to take your own shot at guess­ing what Dawes had to say about a Bayesian view of this situ­a­tion, be­fore read­ing fur­ther.

Suppose we put ourselves into the shoes of this congressional hearing, and imagine ourselves trying to set up this problem. We would need:

  • The prior odds that there would be a con­spir­acy of Ja­panese-Amer­i­can sabo­teurs.

  • The like­li­hood of the ob­ser­va­tion “no visi­ble sab­o­tage or any other type of es­pi­onage”, given that a Fifth Column ac­tu­ally ex­isted.

  • The like­li­hood of the ob­ser­va­tion “no visi­ble sab­o­tage from Ja­panese-Amer­i­cans”, in the pos­si­ble world where there is no such con­spir­acy.

As soon as we set up this prob­lem, we re­al­ize that, what­ever the prob­a­bil­ity of “no sab­o­tage” be­ing ob­served if there is a con­spir­acy, the like­li­hood of ob­serv­ing “no sab­o­tage” if there isn’t a con­spir­acy must be even higher. This means that the like­li­hood ra­tio:

$$\frac{\mathbb P(\neg \text{sabotage} | \text {conspiracy})}{\mathbb P(\neg \text {sabotage} | \neg \text {conspiracy})}$$

…must be less than 1, and ac­cord­ingly:

$$ \frac{\mathbb P(\text {conspiracy} | \neg \text{sabotage})}{\mathbb P(\neg \text {conspiracy} | \neg \text{sabotage})} = \frac{\mathbb P(\text {conspiracy})}{\mathbb P(\neg \text {conspiracy})} \cdot \frac{\mathbb P(\neg \text{sabotage} | \text {conspiracy})}{\mathbb P(\neg \text {sabotage} | \neg \text {conspiracy})} < \frac{\mathbb P(\text {conspiracy})}{\mathbb P(\neg \text {conspiracy})} $$

Ob­serv­ing the to­tal ab­sence of any sab­o­tage can only de­crease our es­ti­mate that there’s a Ja­panese-Amer­i­can Fifth Column, not in­crease it. (It definitely shouldn’t be “the most om­i­nous” sign that con­vinces us “more than any other fac­tor” that the Fifth Column ex­ists.)

Again, what matters is not the exact likelihood of observing no sabotage given that a Fifth Column actually exists. As soon as we set up the Bayesian problem, we can see there’s something qualitatively wrong with Earl Warren’s reasoning.
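The qualitative point can be spot-checked numerically. In the sketch below every number is hypothetical, since nobody at the hearing supplied actual probabilities; the only constraint is that "no sabotage" is less likely under a conspiracy than without one. The function and variable names are made up for illustration.

```python
def posterior_odds(prior_odds, p_quiet_if_conspiracy, p_quiet_if_none):
    """Odds of conspiracy vs. no conspiracy after observing no sabotage."""
    return prior_odds * (p_quiet_if_conspiracy / p_quiet_if_none)

# All numbers below are hypothetical; only their ordering matters.
checked = 0
for prior_odds in (0.01, 0.5, 1.0, 10.0):
    for p_quiet_if_none in (0.9, 0.99, 0.999):
        for p_quiet_if_conspiracy in (0.1, 0.5, 0.89):
            # The likelihood ratio is below 1 in every case tried here.
            assert p_quiet_if_conspiracy < p_quiet_if_none
            post = posterior_odds(prior_odds, p_quiet_if_conspiracy, p_quiet_if_none)
            assert post < prior_odds
            checked += 1

print(f"posterior odds fell below prior odds in all {checked} cases")
```

However the specific numbers are chosen, a likelihood ratio below 1 can only push the odds of a conspiracy down.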

Fur­ther reading

This has been a very brief and high-speed pre­sen­ta­tion of Bayes and Bayesi­anism. It should go with­out say­ing that a vast liter­a­ture, nay, a uni­verse of liter­a­ture, ex­ists on Bayesian statis­ti­cal meth­ods and Bayesian episte­mol­ogy and Bayesian al­gorithms in ma­chine learn­ing. Stay­ing in­side Ar­bital, you might be in­ter­ested in mov­ing on to read:

More on the tech­ni­cal side of Bayes’ rule

More on in­tu­itive im­pli­ca­tions of Bayes’ rule

