Bayes' rule: Log-odds form

The odds form of Bayes’ rule states that the prior odds times the likelihood ratio equals the posterior odds. We can take the log of both sides of this equation, yielding an equivalent equation which uses addition instead of multiplication.

Let­ting \(H_i\) and \(H_j\) de­note hy­pothe­ses and \(e\) de­note ev­i­dence, the log-odds form of Bayes’ rule states:

$$ \log \left ( \dfrac {\mathbb P(H_i\mid e)} {\mathbb P(H_j\mid e)} \right ) = \log \left ( \dfrac {\mathbb P(H_i)} {\mathbb P(H_j)} \right ) + \log \left ( \dfrac {\mathbb P(e\mid H_i)} {\mathbb P(e\mid H_j)} \right ). $$

This can be numerically efficient when you’re carrying out lots of updates one after another. But a more important reason to think in log odds is to get a better grasp on the notion of ‘strength of evidence’.
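One concrete numerical benefit is floating-point range: a long chain of multiplied likelihood ratios can underflow to zero, while the equivalent sum of logarithms stays an ordinary number. The sketch below assumes a made-up scenario (2,000 independent pieces of evidence, each with likelihood ratio \(1 : 2\) against \(H_i\)) chosen only to trigger the underflow; the variable names are illustrative.

```python
import math

# Hypothetical scenario: 2,000 independent updates, each with likelihood ratio 1:2 against H_i.
likelihood_ratios = [0.5] * 2000

# Odds form: multiply. The true product 2**-2000 is far below the smallest
# positive double (about 5e-324), so the running product underflows to 0.0.
direct_odds = 1.0  # prior odds 1:1
for ratio in likelihood_ratios:
    direct_odds *= ratio

# Log-odds form: add log2 of each ratio (bits). The total stays representable.
log_odds_bits = 0.0  # prior odds 1:1 is 0 bits
for ratio in likelihood_ratios:
    log_odds_bits += math.log2(ratio)

print(direct_odds, log_odds_bits)  # 0.0 -2000.0
```

The log form loses nothing here: \(-2000\) bits still says exactly how strongly the evidence points against \(H_i\), whereas the multiplied odds have collapsed to a value indistinguishable from certainty.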

Log­a­r­ithms of like­li­hood ratios

Sup­pose you’re vis­it­ing your friends An­drew and Betty, who are a cou­ple. They promised that one of them would pick you up from the air­port when you ar­rive. You’re not sure which one is in fact go­ing to pick you up (prior odds of 50:50), but you do know three things:

  1. They have both a blue car and a red car. An­drew prefers to drive the blue car, Betty prefers to drive the red car, but the cor­re­la­tion is rel­a­tively weak. (Some­times, which car they drive de­pends on which one their child is us­ing.) An­drew is 2x as likely to drive the blue car as Betty.

  2. Betty tends to honk the horn at you to get your at­ten­tion. An­drew does this too, but less of­ten. Betty is 4x as likely to honk as An­drew.

  3. An­drew tends to run a lit­tle late (more of­ten than Betty). Betty is 2x as likely to have the car already at the air­port when you ar­rive.

All three ob­ser­va­tions are in­de­pen­dent as far as you know (that is, you don’t think Betty’s any more or less likely to be late if she’s driv­ing the blue car, and so on).

Let’s say we see a blue car, already at the air­port, which honks.

The odds form of this calculation would be a \((1 : 1)\) prior for Betty vs. Andrew, times likelihood ratios of \((1 : 2) \times (4 : 1) \times (2 : 1),\) yielding posterior odds of \((1 \times 4 \times 2 : 2 \times 1 \times 1) = (8 : 2) = (4 : 1)\), so it’s \(\frac{4}{5} = 80\%\) likely to be Betty.

Here’s the log odds form of the same calcu­la­tion, us­ing 1 bit to de­note each fac­tor of \(2\) in be­lief or ev­i­dence:

  • Prior be­lief in Betty of \(\log_2 (\frac{1}{1}) = 0\) bits.

  • Ev­i­dence of \(\log_2 (\frac{1}{2}) = {-1}\) bits against Betty.

  • Ev­i­dence of \(\log_2 (\frac{4}{1}) = {+2}\) bits for Betty.

  • Ev­i­dence of \(\log_2 (\frac{2}{1}) = {+1}\) bit for Betty.

  • Pos­te­rior be­lief of \(0 + {^-1} + {^+2} + {^+1} = {^+2}\) bits that Betty is pick­ing us up.

If your pos­te­rior be­lief is +2 bits, then your pos­te­rior odds are \((2^{+2} : 1) = (4 : 1),\) yield­ing a pos­te­rior prob­a­bil­ity of 80% that Betty is pick­ing you up.
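Both versions of the airport calculation can be checked side by side. This is a minimal sketch using exact fractions; the dictionary labels and variable names are illustrative, and the three ratios (Betty : Andrew) are the ones listed above.

```python
import math
from fractions import Fraction

# Likelihood ratios for Betty vs. Andrew, one per observation.
ratios = {
    "blue car": Fraction(1, 2),       # Andrew is 2x as likely to drive the blue car
    "honks": Fraction(4, 1),          # Betty is 4x as likely to honk
    "already there": Fraction(2, 1),  # Betty is 2x as likely to be there on time
}
prior_odds = Fraction(1, 1)  # 50:50

# Odds form: multiply the prior odds by each likelihood ratio.
posterior_odds = prior_odds
for r in ratios.values():
    posterior_odds *= r

# Log-odds form: add the log2 of each factor, giving bits.
posterior_bits = math.log2(prior_odds) + sum(math.log2(r) for r in ratios.values())

# Convert odds (Betty : Andrew) to the probability that it is Betty.
probability_betty = posterior_odds / (1 + posterior_odds)
print(posterior_odds, posterior_bits, probability_betty)  # 4 2.0 4/5
```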

Evidence and belief represented this way are additive, which can make them an easier fit for intuitions about “strength of credence” and “strength of evidence”; we’ll soon develop this point in further depth.

The log-odds line

Imag­ine you start out think­ing that the hy­poth­e­sis \(H\) is just as likely as \(\lnot H,\) its nega­tion. Then you get five sep­a­rate in­de­pen­dent \(2 : 1\) up­dates in fa­vor of \(H.\) What hap­pens to your prob­a­bil­ities?

Your odds (for \(H\)) go from \((1 : 1)\) to \((2 : 1)\) to \((4 : 1)\) to \((8 : 1)\) to \((16 : 1)\) to \((32 : 1).\)

Thus, your prob­a­bil­ities go from \(\frac{1}{2} = 50\%\) to \(\frac{2}{3} \approx 67\%\) to \(\frac{4}{5} = 80\%\) to \(\frac{8}{9} \approx 89\%\) to \(\frac{16}{17} \approx 94\%\) to \(\frac{32}{33} \approx 97\%.\)
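The sequence of odds and probabilities above can be reproduced with a short loop. This sketch assumes exact fractions so that the printed percentages are rounded only at the end; the names are illustrative.

```python
from fractions import Fraction

odds = Fraction(1, 1)  # 1:1 for H vs. not-H, i.e. 50%
history = []           # (odds for H, probability of H) after 0..5 updates
for _ in range(6):
    history.append((odds, odds / (1 + odds)))
    odds *= 2          # one more independent 2:1 update in favor of H

for odds_h, p in history:
    # prints 50%, 67%, 80%, 89%, 94%, 97% on successive lines
    print(f"{int(odds_h)}:1 -> {float(p):.0%}")
```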

Graph­i­cally rep­re­sent­ing these chang­ing prob­a­bil­ities on a line that goes from 0 to 1:

(Figure: the probabilities after 0, 3, and 5 updates, marked on a line from 0 to 1.)

We ob­serve that the prob­a­bil­ities ap­proach 1 but never get there — they just keep step­ping across a frac­tion of the re­main­ing dis­tance, even­tu­ally get­ting all scrunched up near the right end.

If we in­stead con­vert the prob­a­bil­ities into log-odds, the story is much nicer. 50% prob­a­bil­ity be­comes 0 bits of cre­dence, and ev­ery in­de­pen­dent \((2 : 1)\) ob­ser­va­tion in fa­vor of \(H\) shifts be­lief by one unit along the line.

(Figure: the log-odds line.)

(As for what happens when we approach the end of the line, there isn’t one! 0% probability becomes \(-\infty\) bits of credence and 100% probability becomes \(+\infty\) bits of credence. This un-scrunching of the interval \((0,1)\) into the entire real line is done by an application of the inverse logistic function.)
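The conversion between probabilities and bits of credence can be sketched as a pair of functions, assuming base-2 logarithms so that the unit is bits. The names `probability_to_bits` and `bits_to_probability` are hypothetical, chosen for this illustration; the first is the inverse logistic (logit) function in base 2, the second is its inverse, the base-2 logistic function.

```python
import math

def probability_to_bits(p: float) -> float:
    """Log-odds of probability p in bits: log2(p / (1 - p)).

    Probabilities 0 and 1 map to -inf and +inf, the two missing "ends" of the line.
    """
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return math.log2(p / (1.0 - p))


def bits_to_probability(bits: float) -> float:
    """Inverse of probability_to_bits: the logistic function in base 2."""
    # Branch on the sign so that 2.0 ** x is never asked to overflow.
    if bits >= 0:
        return 1.0 / (1.0 + 2.0 ** (-bits))
    odds = 2.0 ** bits
    return odds / (1.0 + odds)


for n in range(6):  # n independent 2:1 updates starting from 0 bits (50%)
    print(n, round(bits_to_probability(n), 3))
```

Each step of the loop moves exactly one unit along the log-odds line, while the corresponding probabilities bunch up toward 1.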

In­tu­itions about the log-odds line

There are a number of ways in which this infinite log-odds line is a better place to anchor your intuitions about “belief” than the usual \([0, 1]\) probability interval. For example:

  • Evidence you are twice as likely to see if the hypothesis is true as if it is false is \({+1}\) bit of evidence and a \({^+1}\)-bit update, regardless of how confident or unconfident you were to start with: the strength of new evidence, and the distance we update, shouldn’t depend on our prior belief.

  • If your credence in something is 0 bits (neither positive nor negative belief), then you think the odds are 1:1.

  • The dis­tance be­tween \(0.01\) and \(0.000001\) is much greater than the dis­tance be­tween \(0.11\) and \(0.100001.\)

To expand on the final point: on the 0-1 probability line, the difference between 0.01 (a 1% chance) and 0.000001 (a 1 in a million chance) is roughly the same as the distance between 11% and 10%. This doesn’t match our sense for the intuitive strength of a claim: The difference between “1 in 100!” and “1 in a million!” feels like a far bigger jump than the difference between “11% probability” and “a hair over 10% probability.”

On the log-odds line, a 1 in 100 credibility is roughly \({^-2}\) orders of magnitude, and a “1 in a million” credibility is roughly \({^-6}\) orders of magnitude. The difference between them is \(\log_{10}(10^{-6}) - \log_{10}(10^{-2}) = {^-4}\) magnitudes, or roughly \({^-13.3}\) bits. On the other hand, going from 11% to 10% is \(\log_{10}(\frac{0.10}{0.90}) - \log_{10}(\frac{0.11}{0.89}) \approx {^-0.954}-{^-0.908} \approx {^-0.046}\) magnitudes, or about \({^-0.154}\) bits.
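The two comparisons above can be recomputed directly. This sketch measures the 11%-to-10% step exactly on the log-odds line, and, like the text, approximates the small-probability comparison by taking \(\log_{10}\) of the probabilities themselves (for small \(p\), the odds \(p/(1-p)\) are close to \(p\)). The helper name `log_odds` is illustrative.

```python
import math

def log_odds(p: float, base: float) -> float:
    """Log-odds of probability p in the given base (10 -> magnitudes, 2 -> bits)."""
    return math.log(p / (1 - p), base)

# 11% -> 10%, measured exactly on the log-odds line.
mags = log_odds(0.10, 10) - log_odds(0.11, 10)
bits = log_odds(0.10, 2) - log_odds(0.11, 2)
print(f"{mags:.3f} magnitudes, {bits:.3f} bits")  # -0.046 magnitudes, -0.154 bits

# 1 in 100 -> 1 in a million, using the approximation log-odds ~ log10(p) for small p.
big = math.log10(1e-6) - math.log10(1e-2)
print(f"{big:.1f} magnitudes, {big * math.log2(10):.1f} bits")  # -4.0 magnitudes, -13.3 bits
```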

The log-odds line doesn’t com­press the vast differ­ences available near the ends of the prob­a­bil­ity spec­trum. In­stead, it ex­hibits a “be­lief bar” car­ry­ing on in­definitely in both di­rec­tions—ev­ery time you see ev­i­dence with a like­li­hood ra­tio of \(2 : 1,\) it adds one more bit of cred­i­bil­ity.

The Weber-Fechner law says that most human sensory perceptions are logarithmic, in the sense that a factor-of-2 intensity change feels like around the same amount of increase no matter where you are on the scale. Doubling the physical intensity of a sound feels to a human like around the same amount of change in that sound whether the initial sound was 40 decibels or 60 decibels. That’s why there’s a logarithmic decibel scale of sound intensities in the first place!

Thus the log-odds form should be, in a cer­tain sense, the most in­tu­itive var­i­ant of Bayes’ rule to use: Just add the ev­i­dence-strength to the be­lief-strength! If you can make your feel­ings of ev­i­dence-strength and be­lief-strength be pro­por­tional to the log­a­r­ithms of ra­tios, that is.

Fi­nally, the log-odds rep­re­sen­ta­tion gives us an even eas­ier way to see how ex­traor­di­nary claims re­quire ex­traor­di­nary ev­i­dence: If your prior be­lief in \(H\) is −30 bits, and you see ev­i­dence on the or­der of +5 bits for \(H\), then you’re go­ing to wind up with −25 bits of be­lief in \(H\), which means you still think it’s far less likely than the al­ter­na­tives.
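The arithmetic for this example is a single addition, followed by a conversion back to a probability to show how small the result still is. A minimal sketch, using only the −30 and +5 bit figures from the paragraph above:

```python
prior_bits = -30.0     # prior belief in H, in bits
evidence_bits = 5.0    # evidence for H, in bits
posterior_bits = prior_bits + evidence_bits  # -25.0

# Back to a probability: odds = 2**bits, probability = odds / (1 + odds).
odds = 2.0 ** posterior_bits
probability = odds / (1.0 + odds)
print(posterior_bits, f"{probability:.1e}")  # -25.0 3.0e-08
```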

Ex­am­ple: Blue oysters

Con­sider the blue oys­ter ex­am­ple prob­lem:

You’re col­lect­ing ex­otic oys­ters in Nan­tucket, and there are two differ­ent bays from which you could har­vest oys­ters.

  • In both bays, 11% of the oys­ters con­tain valuable pearls and 89% are empty.

  • In the first bay, 4% of the pearl-con­tain­ing oys­ters are blue, and 8% of the non-pearl-con­tain­ing oys­ters are blue.

  • In the sec­ond bay, 13% of the pearl-con­tain­ing oys­ters are blue, and 26% of the non-pearl-con­tain­ing oys­ters are blue.

Would you rather have a blue oyster from the first bay or the second bay? Well, we first note that the likelihood ratio from “blue oyster” to “full vs. empty” is \(1 : 2\) in either case, so both kinds of blue oyster are equally valuable. (Take a moment to reflect on how non-obvious this would have seemed before learning about Bayes’ rule!)

But what’s the chance of (ei­ther kind of) a blue oys­ter con­tain­ing a pearl? Hint: this would be a good time to con­vert your cre­dences into bits (fac­tors of 2).

89% is around 8 times as much as 11%, so we start out with \({^-3}\) bits of be­lief that a ran­dom oys­ter con­tains a pearl.

Full oysters are half as likely to be blue as empty oysters, so seeing that an oyster is blue is \({^-1}\) bit of evidence against it containing a pearl.

Posterior belief should be around \({^-4}\) bits, or odds of \((1 : 16)\), which is a probability of 1/17… so a bit more than 5% (1/20) maybe? (1/17 is actually 5.88%; the exact posterior, \(\frac{11}{189}\), is about 5.82%.)
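The quick estimate in bits and the exact odds-form answer can be compared directly. This sketch first checks that both bays give the same \(1 : 2\) likelihood ratio, then rounds the prior to −3 bits as the text does; the variable names are illustrative.

```python
import math
from fractions import Fraction

# P(blue | pearl) : P(blue | empty) in each bay; both reduce to 1:2.
bay1 = Fraction(4, 100) / Fraction(8, 100)
bay2 = Fraction(13, 100) / Fraction(26, 100)
assert bay1 == bay2 == Fraction(1, 2)
likelihood_ratio = bay1

# Mental-math version: 89/11 is about 8, so about -3 bits, plus -1 bit for "blue".
approx_bits = -3 + math.log2(likelihood_ratio)
approx_p = 1 / (1 + 2 ** -approx_bits)  # odds 1:16 -> probability 1/17

# Exact version in odds form: (11 : 89) * (1 : 2) = (11 : 178).
prior_odds = Fraction(11, 89)
posterior_odds = prior_odds * likelihood_ratio
exact_p = posterior_odds / (1 + posterior_odds)

print(approx_bits, f"{approx_p:.2%}", exact_p, f"{float(exact_p):.2%}")
# -4.0 5.88% 11/189 5.82%
```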

Real-life ex­am­ple: HIV test

A study of Chinese blood donors (citation needed) found that roughly 1 in 100,000 of them had HIV (as determined by a very reliable gold-standard test). The non-gold-standard test used for initial screening had a sensitivity of 99.7% and a specificity of 99.8%, meaning respectively that \(\mathbb P({positive}\mid {HIV}) = .997\) and \(\mathbb P({negative}\mid \neg {HIV}) = .998\), i.e., \(\mathbb P({positive} \mid \neg {HIV}) = .002.\)

That is: the prior odds are roughly \(1 : 100,000\) against HIV, and a positive result in an initial screening favors HIV with a likelihood ratio of about \(500 : 1\) (since \(.997 / .002 = 498.5\)).

Us­ing log base 10 (be­cause those are eas­ier to do in your head):

  • The prior be­lief in HIV was about −5 mag­ni­tudes.

  • The ev­i­dence was a tad less than +3 mag­ni­tudes strong, since 500 is less than 1,000. (\(\log_{10}(500) \approx 2.7\)).

So the pos­te­rior be­lief in HIV is a tad un­der­neath −2 mag­ni­tudes, i.e., less than a 1 in 100 chance of HIV.
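The same head calculation can be done with exact logarithms. This sketch follows the text in treating "1 in 100,000" as prior odds of \(1 : 100,000\), and uses the screening test's sensitivity and false positive rate to get the likelihood ratio; the variable names are illustrative.

```python
import math

sensitivity = 0.997          # P(positive | HIV)
false_positive_rate = 0.002  # P(positive | no HIV) = 1 - specificity

prior_mag = math.log10(1 / 100_000)                           # about -5 magnitudes
evidence_mag = math.log10(sensitivity / false_positive_rate)  # log10(498.5), a tad under 3
posterior_mag = prior_mag + evidence_mag

# Convert the posterior log-odds back to a probability.
posterior_odds = 10 ** posterior_mag
posterior_p = posterior_odds / (1 + posterior_odds)
print(f"{evidence_mag:.2f} {posterior_mag:.2f} {posterior_p:.2%}")  # 2.70 -2.30 0.50%
```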

Even though the screen­ing had a \(500 : 1\) like­li­hood ra­tio in fa­vor of HIV, some­one with a pos­i­tive screen­ing re­sult re­ally should not panic!

Ad­mit­tedly, this setup had peo­ple be­ing screened ran­domly, in a rel­a­tively non-AIDS-stricken coun­try. You’d need sep­a­rate statis­tics for peo­ple who are get­ting tested for HIV be­cause of spe­cific wor­ries or con­cerns, or in coun­tries where HIV is highly preva­lent. Nonethe­less, the points that “only a tiny frac­tion of peo­ple have ill­ness X” and that “pre­limi­nary ob­ser­va­tions Y may not have cor­re­spond­ingly tiny false pos­i­tive rates” are worth re­mem­ber­ing for many ill­nesses X and ob­ser­va­tions Y.

Ex­pos­ing in­finite credences

The log-odds rep­re­sen­ta­tion ex­poses the de­gree to which \(0\) and \(1\) are very un­usual among the clas­si­cal prob­a­bil­ities. For ex­am­ple, if you ever as­sign prob­a­bil­ity ab­solutely 0 or 1 to a hy­poth­e­sis, then no amount of ev­i­dence can change your mind about it, ever.

On the log-odds line, credences range from \(-\infty\) to \(+\infty,\) with the infinite extremes corresponding to probability \(0\) and \(1\), which can thereby be seen as “infinite credences”. That’s not to say that \(0\) and \(1\) probabilities should never be used. For an ideal reasoner, the sum \(\mathbb P(X) + \mathbb P(\lnot X)\) should be 1 (where \(\lnot X\) is the logical negation of \(X\)). (For us mere mortals, consider avoiding extreme probabilities even then.) Nevertheless, these infinite credences of 0 and 1 behave like ‘special objects’ with a qualitatively different behavior from the ordinary credence spectrum. Statements like “After seeing a piece of strong evidence, my belief should never be exactly what it was previously” are false for extreme credences, just as statements like “subtracting 1 from a number produces a lower number” are false if you insist on regarding infinity (for example \(\aleph_0\)) as a number.
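The "no amount of evidence can change your mind" behavior is easy to see numerically. This sketch applies Bayes' rule in probability form for a single hypothesis against its negation, \(p' = \frac{pL}{pL + (1 - p)}\) with \(L = \mathbb P(e \mid H) / \mathbb P(e \mid \lnot H)\); the function name `update` is illustrative.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Posterior P(H | e) from prior P(H) and likelihood ratio P(e|H) / P(e|not-H)."""
    numerator = prior * likelihood_ratio
    return numerator / (numerator + (1 - prior))

# Very strong evidence for H (1000:1) and against H (1:1000), applied to several priors.
for prior in (0.0, 0.001, 0.5, 1.0):
    print(prior, update(prior, 1000.0), update(prior, 0.001))
```

The ordinary priors move substantially in both directions, while the priors of exactly 0 and 1 come back unchanged for any finite, nonzero likelihood ratio.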

Ev­i­dence in decibels

E.T. Jaynes, in Probability Theory: The Logic of Science (section 4.2), reports that measuring evidence in decibels makes it easier for humans to grasp and use.

If a piece of evidence has a likelihood ratio of \(o\), then its strength in decibels is given by the formula \(e = 10\log_{10}(o)\).

In this scheme, mul­ti­ply­ing the like­li­hood ra­tio by 2 means ap­prox­i­mately adding 3dB. Mul­ti­ply­ing by 10 means adding 10dB.
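The decibel conversion is a one-line function; the sketch below (the name `decibels` is illustrative) also shows that multiplying likelihood ratios corresponds to adding decibels.

```python
import math

def decibels(likelihood_ratio: float) -> float:
    """Strength of evidence in decibels: e = 10 * log10(o)."""
    return 10 * math.log10(likelihood_ratio)

for o in (2, 10, 500, 0.5):
    print(f"{o}:1 -> {decibels(o):+.2f} dB")

# A 2:1 update followed by a 10:1 update is a 20:1 update: about 3.01 dB + 10 dB.
print(f"{decibels(2 * 10):.2f} dB")
```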

Jaynes reports that he first used decimal logarithms for their ease of calculation, and later tried switching to natural logarithms with the advent of pocket calculators, but decimal logarithms were found to be easier to grasp.
