# Bayes' rule: Log-odds form

The odds form of Bayes’ rule states that the prior odds times the likelihood ratio equals the posterior odds. We can take the log of both sides of this equation, yielding an equivalent equation which uses addition instead of multiplication.

Letting $$H_i$$ and $$H_j$$ denote hypotheses and $$e$$ denote evidence, the log-odds form of Bayes’ rule states:

$$\log \left ( \dfrac {\mathbb P(H_i\mid e)} {\mathbb P(H_j\mid e)} \right ) = \log \left ( \dfrac {\mathbb P(H_i)} {\mathbb P(H_j)} \right ) + \log \left ( \dfrac {\mathbb P(e\mid H_i)} {\mathbb P(e\mid H_j)} \right ).$$

This can be numerically efficient when you’re carrying out many updates one after another. But a more important reason to think in log odds is to get a better grasp on the notion of ‘strength of evidence’.
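Below is a minimal Python sketch of the two forms. It assumes the pieces of evidence are independent, so their likelihood ratios simply chain together. The helper names `odds_update` and `log_odds_update` are hypothetical and exist only for illustration. Long chains of updates show one concrete numerical benefit: multiplying thousands of small ratios underflows to zero in floating point, but adding their logarithms does not.

```python
import math

def odds_update(prior_odds, likelihood_ratios):
    """Odds form: multiply the prior odds by every likelihood ratio."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

def log_odds_update(prior_log_odds, likelihood_ratios, base=2.0):
    """Log-odds form: add the log of every likelihood ratio to the prior log-odds."""
    return prior_log_odds + sum(math.log(lr, base) for lr in likelihood_ratios)

# 2,000 independent observations, each with a 1 : 2 likelihood ratio against H.
ratios = [0.5] * 2000

print(odds_update(1.0, ratios))      # 0.0: the product underflows to zero
print(log_odds_update(0.0, ratios))  # about -2000 bits: nothing is lost
```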

# Logarithms of likelihood ratios

Suppose you’re visiting your friends Andrew and Betty, who are a couple. They promised that one of them would pick you up from the airport when you arrive. You’re not sure which one is in fact going to pick you up (prior odds of 50:50), but you do know three things:

1. They have both a blue car and a red car. Andrew prefers to drive the blue car, Betty prefers to drive the red car, but the correlation is relatively weak. (Sometimes, which car they drive depends on which one their child is using.) Andrew is 2x as likely to drive the blue car as Betty.

2. Betty tends to honk the horn at you to get your attention. Andrew does this too, but less often. Betty is 4x as likely to honk as Andrew.

3. Andrew tends to run a little late (more often than Betty). Betty is 2x as likely to have the car already at the airport when you arrive.

All three observations are independent as far as you know (that is, you don’t think Betty’s any more or less likely to be late if she’s driving the blue car, and so on).

Let’s say we see a blue car, already at the airport, which honks.

The odds form of this calculation would be a $$(1 : 1)$$ prior for Betty vs. Andrew, times likelihood ratios of $$(1 : 2) \times (4 : 1) \times (2 : 1)$$ for the blue car, the honk, and the early arrival respectively, yielding posterior odds of $$(1 \times 4 \times 2 : 2 \times 1 \times 1) = (8 : 2) = (4 : 1)$$, so it’s $$\frac{4}{5} = 80\%$$ likely to be Betty.

Here’s the log odds form of the same calculation, using 1 bit to denote each factor of $$2$$ in belief or evidence:

• Prior belief in Betty of $$\log_2 (\frac{1}{1}) = 0$$ bits.

• Blue car: evidence of $$\log_2 (\frac{1}{2}) = {-1}$$ bit, that is, 1 bit against Betty.

• Honk: evidence of $$\log_2 (\frac{4}{1}) = {+2}$$ bits for Betty.

• Already at the airport: evidence of $$\log_2 (\frac{2}{1}) = {+1}$$ bit for Betty.

• Posterior belief of $$0 + {^-1} + {^+2} + {^+1} = {^+2}$$ bits that Betty is picking you up.

If your posterior belief is +2 bits, then your posterior odds are $$(2^{+2} : 1) = (4 : 1),$$ yielding a posterior probability of 80% that Betty is picking you up.
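Here is the same bookkeeping as a Python sketch. The three likelihood ratios from the example are written as Betty : Andrew. The dictionary layout and the variable names are one illustrative choice.

```python
import math

# Likelihood ratios (Betty : Andrew) for each observation in the example.
evidence = {
    "blue car": 1 / 2,
    "honks": 4 / 1,
    "already at the airport": 2 / 1,
}

prior_bits = math.log2(1 / 1)   # 50:50 prior, so 0 bits
posterior_bits = prior_bits
for observation, ratio in evidence.items():
    bits = math.log2(ratio)     # one bit per factor of 2
    posterior_bits += bits
    print(f"{observation}: {bits:+.0f} bits")

posterior_odds = 2 ** posterior_bits              # Betty : Andrew, as x : 1
p_betty = posterior_odds / (1 + posterior_odds)   # convert odds to a probability
print(f"posterior: {posterior_bits:+.0f} bits, {posterior_odds:.0f} : 1, P(Betty) = {p_betty:.0%}")
```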

Evidence and belief represented this way are additive, which can make them an easier fit for intuitions about “strength of credence” (how strongly you believe a hypothesis) and “strength of evidence” (how far an observation should shift that belief). We’ll soon develop this point in further depth.

# The log-odds line

Imagine you start out thinking that the hypothesis $$H$$ is just as likely as $$\lnot H,$$ its negation. Then you get five separate independent $$2 : 1$$ updates in favor of $$H.$$ What happens to your probabilities?

Your odds (for $$H$$) go from $$(1 : 1)$$ to $$(2 : 1)$$ to $$(4 : 1)$$ to $$(8 : 1)$$ to $$(16 : 1)$$ to $$(32 : 1).$$

Thus, your probabilities go from $$\frac{1}{2} = 50\%$$ to $$\frac{2}{3} \approx 67\%$$ to $$\frac{4}{5} = 80\%$$ to $$\frac{8}{9} \approx 89\%$$ to $$\frac{16}{17} \approx 94\%$$ to $$\frac{32}{33} \approx 97\%.$$
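This short sketch reproduces the sequence above with Python's standard `fractions` module. It also records how much each update moves the probability. The step sizes shrink at every update, so the probabilities crowd toward 1.

```python
from fractions import Fraction

odds = Fraction(1, 1)           # start at 1 : 1 for H
probabilities = []
for _ in range(6):              # the starting point plus five updates
    probabilities.append(odds / (1 + odds))   # odds o : 1 -> probability o / (o + 1)
    odds *= 2                                  # one more independent 2 : 1 update

steps = [b - a for a, b in zip(probabilities, probabilities[1:])]
for p in probabilities:
    print(p, f"{float(p):.0%}")
print("step sizes:", [str(s) for s in steps])
```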

If we plot these changing probabilities on a line that goes from 0 to 1, we observe that the probabilities approach 1 but never get there. They just keep stepping across a fraction of the remaining distance, eventually getting all scrunched up near the right end.

If we instead convert the probabilities into log-odds, the story is much nicer. 50% probability becomes 0 bits of credence, and every independent $$(2 : 1)$$ observation in favor of $$H$$ shifts belief by one unit along the line.

(As for what happens when we approach the end of the line: there isn’t one! 0% probability becomes $$-\infty$$ bits of credence and 100% probability becomes $$+\infty$$ bits of credence. This un-scrunching of the interval $$(0,1)$$ into the entire real line is done by an application of the inverse logistic function.)
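This minimal sketch converts in both directions, with log-odds measured in bits (base 2). `prob_to_bits` is a base-2 logit, and `bits_to_prob` is its inverse, the base-2 logistic function. Both names are hypothetical helpers, not a library API. In the five-update sequence above, each 2 : 1 update moves the credence by exactly one bit.

```python
import math

def prob_to_bits(p):
    """Base-2 logit: log2(p / (1 - p)). Probabilities 0 and 1 map to -inf and +inf."""
    if p == 0:
        return -math.inf
    if p == 1:
        return math.inf
    return math.log2(p / (1 - p))

def bits_to_prob(bits):
    """Base-2 logistic function, arranged so 2.0 ** x never overflows."""
    if bits >= 0:
        return 1 / (1 + 2.0 ** -bits)
    odds = 2.0 ** bits
    return odds / (1 + odds)

sequence = [1/2, 2/3, 4/5, 8/9, 16/17, 32/33]
print([round(prob_to_bits(p), 6) for p in sequence])
print(prob_to_bits(0.0), prob_to_bits(1.0))
```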

# Intuitions about the log-odds line

There are a number of ways in which this infinite log-odds line is a better place to anchor your intuitions about “belief” than the usual 0-1 probability interval. For example:

• Evidence you are twice as likely to see if the hypothesis is true as if it is false is $${+1}$$ bit of evidence and a $${^+1}$$-bit update, regardless of how confident or unconfident you were to start with. The strength of new evidence, and the distance we update, shouldn’t depend on our prior belief.

• If your credence in something is 0 bits (neither positive nor negative belief), then you think the odds are 1:1.

• On the log-odds line, the distance between $$0.01$$ and $$0.000001$$ is much greater than the distance between $$0.11$$ and $$0.100001,$$ even though both gaps are $$0.009999$$ on the 0-1 probability line.

To expand on the final point: on the 0-1 probability line, the difference between 0.01 (a 1% chance) and 0.000001 (a 1 in a million chance) is the same as the difference between 0.11 and 0.100001, and roughly the same as the difference between 11% and 10%. This doesn’t match our sense for the intuitive strength of a claim. The difference between “1 in 100!” and “1 in a million!” feels like a far bigger jump than the difference between “11% probability” and “a hair over 10% probability.”

On the log-odds line, a 1 in 100 credibility is about $${^-2}$$ orders of magnitude, and a “1 in a million” credibility is about $${^-6}$$ orders of magnitude. (For probabilities this small, the odds are very close to the probabilities themselves, so taking $$\log_{10}$$ of either gives nearly the same answer.) The distance between them is $$\log_{10}(10^{-6}) - \log_{10}(10^{-2}) = {^-4}$$ magnitudes, or roughly $${^-13.3}$$ bits. On the other hand, going from 11% to 10% is $$\log_{10}(\frac{0.10}{0.90}) - \log_{10}(\frac{0.11}{0.89}) \approx {^-0.954} - ({^-0.908}) \approx {^-0.046}$$ magnitudes, or about $${^-0.153}$$ bits.
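The sketch below checks the arithmetic above. It uses log-odds, not log-probabilities, at every point, so the first gap comes out at about −4.004 magnitudes rather than exactly −4. The helper names `log10_odds` and `gaps` are chosen for illustration.

```python
import math

def log10_odds(p):
    """Log-odds in orders of magnitude (base 10)."""
    return math.log10(p / (1 - p))

def gaps(a, b):
    """Distance from a to b on the 0-1 line, in magnitudes, and in bits."""
    magnitudes = log10_odds(b) - log10_odds(a)
    return b - a, magnitudes, magnitudes / math.log10(2)

for a, b in [(0.01, 0.000001), (0.11, 0.100001), (0.11, 0.10)]:
    linear, mags, bits = gaps(a, b)
    print(f"{a} -> {b}: {linear:+.6f} on the 0-1 line, "
          f"{mags:+.3f} magnitudes, {bits:+.2f} bits")
```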

The log-odds line doesn’t compress the vast differences available near the ends of the probability spectrum. Instead, it exhibits a “belief bar” carrying on indefinitely in both directions: every time you see evidence with a likelihood ratio of $$2 : 1,$$ it adds one more bit of credibility.

The Weber-Fechner law says that most human sensory perceptions are logarithmic, in the sense that a factor-of-2 intensity change feels like around the same amount of increase no matter where you are on the scale. Doubling the physical intensity of a sound feels to a human like around the same amount of change in that sound whether the initial sound was 40 decibels or 60 decibels. That’s why there’s a logarithmic decibel scale of sound intensities in the first place!

Thus the log-odds form should be, in a certain sense, the most intuitive variant of Bayes’ rule to use: just add the evidence-strength to the belief-strength! That holds if you can make your feelings of evidence-strength and belief-strength proportional to the logarithms of ratios.

Finally, the log-odds representation gives us an even easier way to see how extraordinary claims require extraordinary evidence. If your prior belief in $$H$$ is −30 bits, and you see evidence on the order of +5 bits for $$H$$, then you’re going to wind up with −25 bits of belief in $$H$$. That means you still think it’s far less likely than the alternatives.
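Here is a quick numerical version of that example. It converts bits back to probabilities with the same base-2 convention used earlier. `bits_to_prob` is a hypothetical name for the inverse of the log-odds conversion.

```python
def bits_to_prob(bits):
    """Probability corresponding to a credence of `bits` (log2 odds)."""
    odds = 2.0 ** bits
    return odds / (1 + odds)

prior_bits = -30
evidence_bits = +5
posterior_bits = prior_bits + evidence_bits   # -25 bits

print(f"prior:     {prior_bits} bits, P = {bits_to_prob(prior_bits):.2e}")
print(f"posterior: {posterior_bits} bits, P = {bits_to_prob(posterior_bits):.2e}")
# Reaching even 50% (0 bits) would take +30 bits of evidence,
# i.e. a likelihood ratio of 2**30 : 1.
print(f"needed to reach 50%: {-prior_bits} bits, ratio {2 ** -prior_bits:,} : 1")
```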

# Example: Blue oysters

Consider the blue oyster example problem:

You’re collecting exotic oysters in Nantucket, and there are two different bays from which you could harvest oysters.

• In both bays, 11% of the oysters contain valuable pearls and 89% are empty.

• In the first bay, 4% of the pearl-containing oysters are blue, and 8% of the non-pearl-containing oysters are blue.

• In the second bay, 13% of the pearl-containing oysters are blue, and 26% of the non-pearl-containing oysters are blue.

Would you rather have a blue oyster from the first bay or the second bay? Well, we first note that the likelihood ratio that “blue oyster” provides for “full vs. empty” is $$1 : 2$$ in either case ($$4\% : 8\%$$ in the first bay, $$13\% : 26\%$$ in the second), so both kinds of blue oyster are equally valuable. (Take a moment to reflect on how non-obvious this would have seemed before learning about Bayes’ rule!)

But what’s the chance of (either kind of) a blue oyster containing a pearl? Hint: this would be a good time to convert your credences into bits (factors of 2).

89% is around 8 times as much as 11%, so we start out with $${^-3}$$ bits of belief (3 bits against) that a random oyster contains a pearl.

Full oysters are $$\frac{1}{2}$$ as likely to be blue as empty oysters, so seeing that an oyster is blue is $${^-1}$$ bit of evidence, a further 1 bit against it containing a pearl.

Posterior belief should be around $${^-4}$$ bits, or $$(1 : 16)$$ against, which is a probability of $$\frac{1}{17}$$… so a bit more than 5% (1/20) maybe? (Indeed, $$\frac{1}{17} \approx 5.88\%$$; the exact posterior, $$\frac{11}{189},$$ is about 5.82%.)
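The sketch below checks the mental arithmetic against the exact answer. It uses Python's `fractions` module so the bay percentages stay exact. Rounding the prior to a whole number of bits mirrors the “around 8 times” approximation in the text.

```python
import math
from fractions import Fraction

prior_odds = Fraction(11, 89)                       # pearl : empty

# Likelihood ratio of "blue" in each bay: P(blue | pearl) / P(blue | empty).
lr_bay1 = Fraction(4, 100) / Fraction(8, 100)
lr_bay2 = Fraction(13, 100) / Fraction(26, 100)
print(lr_bay1, lr_bay2)                             # both 1/2

# Mental arithmetic in bits: about -3 for the prior, -1 for "blue".
approx_bits = round(math.log2(float(prior_odds))) + math.log2(float(lr_bay1))
approx_odds = 2 ** approx_bits
approx_prob = approx_odds / (1 + approx_odds)       # 1/17

# Exact answer: multiply the odds, then convert to a probability.
exact_odds = prior_odds * lr_bay1                   # 11 : 178
exact_prob = exact_odds / (1 + exact_odds)          # 11/189
print(approx_bits, f"{approx_prob:.2%}", exact_prob, f"{float(exact_prob):.2%}")
```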

# Real-life example: HIV test

A study of Chinese blood donors (citation needed) found that roughly 1 in 100,000 of them had HIV (as determined by a very reliable gold-standard test). The non-gold-standard test used for initial screening had a sensitivity of 99.7% and a specificity of 99.8%. Respectively, these mean $$\mathbb P(\text{positive}\mid \text{HIV}) = 0.997$$ and $$\mathbb P(\text{negative}\mid \neg \text{HIV}) = 0.998$$, i.e., $$\mathbb P(\text{positive} \mid \neg \text{HIV}) = 0.002.$$

That is: the prior odds are about $$1 : 100{,}000$$ against HIV. A positive result in an initial screening favors HIV with a likelihood ratio of $$\frac{0.997}{0.002} \approx 500 : 1.$$

Using log base 10 (because those are easier to do in your head):

• The prior belief in HIV was about −5 magnitudes.

• The evidence was a tad less than +3 magnitudes strong, since 500 is less than 1,000 ($$\log_{10}(500) \approx 2.7$$).

So the posterior belief in HIV is a tad underneath −2 magnitudes (about $$-5 + 2.7 = -2.3$$ magnitudes, or odds of roughly $$1 : 200$$). That is less than a 1 in 100 chance of HIV.

Even though the screening had a $$500 : 1$$ likelihood ratio in favor of HIV, someone with a positive screening result really should not panic!
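Here is the same calculation as a Python sketch. It uses base-10 logarithms so the numbers line up with the “magnitudes” above. The variable names are illustrative. The figures are the sensitivity, specificity, and prevalence quoted in the text. The sketch also shows where the 500 : 1 comes from: the true-positive rate divided by the false-positive rate.

```python
import math

sensitivity = 0.997                    # P(positive | HIV)
specificity = 0.998                    # P(negative | no HIV)
false_positive_rate = 1 - specificity  # P(positive | no HIV), about 0.002

likelihood_ratio = sensitivity / false_positive_rate   # about 498.5, roughly 500 : 1
prior_odds = 1 / 100_000                               # about 1 : 100,000

prior_mag = math.log10(prior_odds)             # -5 magnitudes
evidence_mag = math.log10(likelihood_ratio)    # about +2.7 magnitudes
posterior_mag = prior_mag + evidence_mag       # about -2.3 magnitudes

posterior_odds = 10 ** posterior_mag
posterior_prob = posterior_odds / (1 + posterior_odds)
print(f"LR = {likelihood_ratio:.1f}, posterior = {posterior_mag:.2f} magnitudes, "
      f"P(HIV | positive) = {posterior_prob:.2%}")
```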

Admittedly, this setup had people being screened randomly, in a country where HIV is relatively rare. You’d need separate statistics for people who get tested for HIV because of specific worries or concerns, or for countries where HIV is highly prevalent. Nonetheless, two points are worth remembering for many illnesses X and observations Y: “only a tiny fraction of people have illness X,” and “preliminary observations Y may not have correspondingly tiny false positive rates.”

# Exposing infinite credences

The log-odds representation exposes the degree to which $$0$$ and $$1$$ are very unusual among the classical probabilities. For example, if you ever assign probability absolutely 0 or 1 to a hypothesis, then no amount of evidence can change your mind about it, ever.

On the log-odds line, credences range from $$-\infty$$ to $$+\infty,$$ with the infinite extremes corresponding to probability $$0$$ and $$1.$$ These can thereby be seen as “infinite credences”. That’s not to say that $$0$$ and $$1$$ probabilities should never be used. For an ideal reasoner, the probability $$\mathbb P(X) + \mathbb P(\lnot X)$$ should be 1 (where $$\lnot X$$ is the logical negation of $$X$$). (For us mere mortals, consider avoiding extreme probabilities even then.) Nevertheless, these infinite credences of 0 and 1 behave like ‘special objects’, qualitatively different from the ordinary credence spectrum. Consider the statement “After seeing a piece of strong evidence, my belief should never be exactly what it was previously.” It is false for infinite credences, just as “subtracting 1 from a number produces a lower number” is false if you insist on regarding infinity (such as $$\aleph_0$$) as a number.
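This small sketch shows why infinite credences are immune to evidence. IEEE floating-point infinities stand in for $$\pm\infty$$ bits. `prob_to_bits` is a hypothetical helper. The −20 bits of evidence, a likelihood ratio of about 1 : 1,000,000, is an arbitrary illustrative value.

```python
import math

def prob_to_bits(p):
    """Log2 odds; probabilities 0 and 1 become -inf and +inf bits."""
    if p == 0:
        return -math.inf
    if p == 1:
        return math.inf
    return math.log2(p / (1 - p))

strong_evidence_against = -20   # 20 bits against, a 1 : 2**20 likelihood ratio

for prior in (0.5, 0.999, 1.0):
    before = prob_to_bits(prior)
    after = before + strong_evidence_against
    print(f"prior P = {prior}: {before:.2f} bits -> {after:.2f} bits, changed: {after != before}")
```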

# Evidence in decibels

E.T. Jaynes, in Probability Theory: The Logic of Science (section 4.2), reports that measuring evidence in decibels makes it easier for humans to grasp and use.

If the evidence favors a hypothesis with a likelihood ratio of $$o$$, then the strength of that evidence in decibels is given by the formula $$e = 10\log_{10}(o)$$.

In this scheme, multiplying the likelihood ratio by 2 means adding approximately 3 dB, and multiplying it by 10 means adding exactly 10 dB.
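This minimal sketch shows the decibel conversion, with hypothetical helper names. Decibels are just a rescaled logarithm, so they still add. The three observations from the airport example (1 : 2, 4 : 1, 2 : 1) come to about +6 dB. That converts back to the 4 : 1 posterior odds found earlier.

```python
import math

def to_decibels(ratio):
    """Evidence in decibels: e = 10 * log10(ratio)."""
    return 10 * math.log10(ratio)

def from_decibels(db):
    """Inverse conversion back to a likelihood ratio (or odds)."""
    return 10 ** (db / 10)

print(f"{to_decibels(2):.2f} dB")     # a factor of 2 is about 3 dB
print(f"{to_decibels(10):.2f} dB")    # a factor of 10 is 10 dB
print(f"{to_decibels(500):.1f} dB")   # the 500 : 1 HIV screening ratio

airport_db = sum(to_decibels(r) for r in (1 / 2, 4, 2))
print(f"{airport_db:.2f} dB -> odds {from_decibels(airport_db):.2f} : 1")
```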

Jaynes reports that he first used decimal logarithms for their ease of calculation. With the advent of pocket calculators he tried switching to natural logarithms, but found decimal logarithms easier to grasp.
