Extraordinary claims require extraordinary evidence

Bayes’ rule tells us how strong a piece of evidence has to be in order to support a given hypothesis. This lets us see whether a piece of evidence is sufficient, or insufficient, to drive the probability of a hypothesis over 50%.

For example, consider the sparking widgets problem:

10% of widgets are bad and 90% are good. 4% of good widgets emit sparks, and 12% of bad widgets emit sparks. Can you calculate in your head what percentage of sparking widgets are bad?

The prior odds are 1 : 9 for bad widgets vs. good widgets.

12% of bad widgets and 4% of good widgets emit sparks, so that’s a likelihood ratio of 3 : 1 for sparking (bad widgets are three times as likely to emit sparks).

$$(1 : 9) \times (3 : 1) \ = \ (3 : 9) \ \cong \ (1 : 3)$$ posterior odds for bad vs. good sparking widgets. So 1/4 of sparking widgets are bad.
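The odds arithmetic above can be checked in a few lines of Python. This is just a sketch of the prior-times-likelihood-ratio rule; the variable names are my own:

```python
from fractions import Fraction

# Posterior odds = prior odds x likelihood ratio.
prior_odds = Fraction(1, 9)              # bad : good widgets
likelihood_ratio = Fraction(12, 4)       # P(sparks | bad) / P(sparks | good) = 3
posterior_odds = prior_odds * likelihood_ratio          # 1 : 3 for bad : good

# Convert odds of 1 : 3 into a probability of 1/(1+3).
posterior_prob = posterior_odds / (1 + posterior_odds)

print(posterior_odds)   # 1/3
print(posterior_prob)   # 1/4, i.e. 25% of sparking widgets are bad
```

Using `Fraction` keeps the arithmetic exact, so the 1 : 3 odds and the 25% answer fall out with no rounding.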

Bad widgets started out relatively rare: 1 in 10. We applied a test — looking for sparks — that was only 3 times more likely to fire on bad widgets than on good ones. The evidence was weaker than the prior improbability of the claim.

This doesn’t mean we toss out the evidence and ignore it. It does mean that, after updating on the observation of sparkiness, we only gave 25% posterior probability to the widget being bad — the probability didn’t go over 50%.

What would need to change to drive the probability of widget badness over 50%? We would need evidence with a more extreme likelihood ratio — more extreme than the 9 : 1 prior odds against badness. For example, if instead bad widgets were 50% likely to spark and good widgets were 5% likely to spark, the posterior odds would go to (10 : 9), or 53%.

In other words: for a previously implausible proposition $$X$$ to end up with a high posterior probability, the likelihood ratio for the new evidence favoring $$X$$ over its alternatives needs to be more extreme than the prior odds against $$X.$$

This is the quantitative argument behind the qualitative statement that “extraordinary claims require extraordinary evidence” (a claim popularized by Carl Sagan, which dates back to at least Pierre-Simon Laplace).

That is: an “extraordinary claim” is one with a low prior probability in advance of considering the evidence, and “extraordinary evidence” is evidence with an extreme likelihood ratio favoring the claim over its alternatives.

What makes evidence extraordinary?

The likelihood ratio is defined as:

$$\text{Likelihood ratio} = \dfrac{\text{Probability of seeing the evidence, assuming the claim is true}}{\text{Probability of seeing the evidence, assuming the claim is false}}$$

To obtain an extreme likelihood ratio, the bottom of the fraction has to be very low. The top of the fraction being very high doesn’t help much. If the top of the fraction is 99% and the bottom is 70%, that’s still not a very extreme ratio, and it doesn’t help much if the top is 99.9999% instead.
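A quick numeric check of this point — the helper function is hypothetical, written just to illustrate the definition above:

```python
def likelihood_ratio(p_if_true, p_if_false):
    # Strength of evidence: how much more likely the observation is
    # if the claim is true than if it is false.
    return p_if_true / p_if_false

print(likelihood_ratio(0.99, 0.70))      # ~1.41: weak, despite the high numerator
print(likelihood_ratio(0.999999, 0.70))  # ~1.43: a near-certain numerator barely helps
print(likelihood_ratio(0.50, 1e-9))      # 5e8: a tiny denominator is what matters
```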

So to get extremely strong evidence, we need to see an observation which is very improbable, given “business as usual,” but fairly likely according to the extraordinary claim. This observation would be deserving of the title, “extraordinary evidence”.

Example of an extraordinary claim and ordinary evidence: Bookcase aliens.

Consider the following hypothesis: What if there are Bookcase Aliens who teleport into our houses at night and drop off bookcases?

Bob offers the following evidence for this claim: “Last week, I visited my friend’s house, and there was a new bookcase there. If there were no bookcase aliens, I wouldn’t have expected that my friend would get a new bookcase. But if there are Bookcase Aliens, then the probability of my finding a new bookcase there was much higher. Therefore, my observation, ‘There is a new bookcase in my friend’s house,’ is strong evidence supporting the existence of Bookcase Aliens.”

In an intuitive sense, we have a notion that Bob’s evidence “There is a new bookcase in my friend’s house” is not as extraordinary as the claim “There are bookcase aliens” — that the evidence fails to lift the claim. Bayes’ rule makes this statement precise.

Bob is, in fact, correct that his observation, “There’s a new bookcase in my friend’s house”, is evidence favoring the Bookcase Aliens hypothesis. Depending on how long it’s been since Bob last visited that house, there might ceteris paribus be, say, a 1% chance that there would be a new bookcase there. On the other hand, the Bookcase Aliens hypothesis might assign, say, 50% probability that the Bookcase Aliens would target this particular house among others. If so, that’s a likelihood ratio of 50 : 1 favoring the Bookcase Aliens hypothesis.

However, a reasonable prior on Bookcase Aliens would assign this a very low prior probability given our other, previous observations of the world. Let’s be conservative and assign odds of just 1 : 1,000,000,000 against Bookcase Aliens. Then to raise our posterior belief in Bookcase Aliens to somewhere in the “pragmatically noticeable” range of 1 : 100, we’d need to see evidence with a cumulative likelihood ratio of 10,000,000 : 1 favoring the Bookcase Aliens. 50 : 1 won’t cut it.
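The odds arithmetic in that paragraph, as a sketch:

```python
# Prior odds of 1 : 1,000,000,000 against Bookcase Aliens; target posterior
# odds of 1 : 100.  The required likelihood ratio is the factor between them.
prior_odds = 1 / 1_000_000_000
target_odds = 1 / 100
required_lr = target_odds / prior_odds

print(f"{required_lr:,.0f}")  # 10,000,000 -- so 50 : 1 won't cut it
```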

What would need to change for the observation “There’s a new bookcase in my friend’s house” to be convincing evidence of Bookcase Aliens, compared to the alternative hypothesis of “business as usual”?

As suggested by the Bayesian interpretation of strength of evidence, what we need to see is an observation which is nigh-impossible if there are not bookcase aliens. We would have to believe that, conditional on “business as usual” being true, the likelihood of seeing a bookcase was on the order of 0.00000001%. That would then take the likelihood ratio, aka strength of evidence, into the rough vicinity of a billion to one favoring Bookcase Aliens over “business as usual”.

We would still need to consider whether there might be other alternative hypotheses besides Bookcase Aliens and “business as usual”, such as a human-operated Bookcase Conspiracy. But at least we wouldn’t be dealing with an observation that was so unsurprising (conditional on business as usual) as to be unable to support any kind of extraordinary claim.

However, if instead we suppose that Bookcase Aliens are allegedly 99.999999% probable to add a bookcase to Bob’s friend’s house, very little changes — the likelihood ratio becomes 99.999999% : 1%, or about 100 : 1, instead of 50 : 1. To obtain an extreme likelihood ratio, we mainly need a tiny denominator rather than a big numerator. In other words, “extraordinary evidence”.

What makes claims extraordinary?

An obvious next question is what makes a claim ‘extraordinary’ or ‘ordinary’. This is a deep separate topic, but as an example, consider the claim that the Earth is becoming warmer due to carbon dioxide being added to its atmosphere.

To evaluate the ordinariness or extraordinariness of this claim:

• We don’t ask whether the future consequences of this claim seem extreme or important.

• We don’t ask whether the policies that would be required to address the claim are very costly.

• We ask whether “carbon dioxide warms the atmosphere” or “carbon dioxide fails to warm the atmosphere” seems to conform better to the deep, causal generalizations we already have about carbon dioxide and heat.

• If we’ve already considered deep causal generalizations like those, we don’t ask about generalizations causally downstream of the deep causal ones we’ve already considered. (E.g., we don’t say, “But on every observed day for the last 200 years, the global temperature has stayed inside the following range; it would be ‘extraordinary’ to leave that range.”)

These tests suggest that “Large amounts of added carbon dioxide will incrementally warm Earth’s atmosphere” would have been an ‘ordinary’ claim in advance of trying to find any evidence for or against it — it’s just how you would expect a greenhouse gas to work, more or less. Thus, one is not entitled to demand a prediction made by this hypothesis that is wildly unlikely under any other hypothesis before believing it.

Incremental updating

A key feature of the Bookcase Aliens example is that a follower of Bayes’ rule acknowledges the observation of a new bookcase as being, locally, a single piece of evidence with a 50 : 1 likelihood ratio favoring Bookcase Aliens. The Bayesian doesn’t toss the observation out the window because it’s insufficient evidence; it just gets accumulated into the pool. If you visit house after house, and see new bookcase after new bookcase, the Bayesian slowly, incrementally, begins to wonder if something strange is going on, rather than dismissing each observation as ‘insufficient evidence’ and then forgetting it.
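A sketch of how those per-house 50 : 1 updates would accumulate against the 1 : 1,000,000,000 prior, using the numbers from the example above:

```python
odds = 1e-9        # prior odds for Bookcase Aliens
lr_per_house = 50  # likelihood ratio per observed new bookcase
houses = 0
# Keep multiplying in evidence until the "pragmatically noticeable"
# 1 : 100 threshold from the text is reached.
while odds < 1 / 100:
    odds *= lr_per_house
    houses += 1

print(houses)  # 5: a handful of independent 50 : 1 updates adds up fast
```

Each observation is individually far too weak, but because the updates multiply, even a billion-to-one prior is overcome after a few of them — which is exactly why the Bayesian keeps each one in the pool instead of discarding it.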

This stands in contrast to the instinctive way humans often behave, where, having concluded that they should not believe in Bookcase Aliens on the basis of the evidence in front of them, they discard that evidence entirely, denounce it, and say that it was never any evidence at all. (This is “treating arguments like soldiers” and acting like any evidence in favor of a proposition has to be “defeated.”)

The Bayesian just says “yes, that is evidence in favor of the claim, but it’s not quantitatively enough evidence.” This idiom also stands in contrast to the practice of treating any concession an opponent makes as a victory. If true claims are supposed to have all their arguments upheld, and false claims are supposed to have all their enemy arguments defeated, then a single undefeated claim of support stands as a proof of victory, no matter how strong or weak the evidence it provides. Not so with Bayesians — a Bayesian considers the bookcase observation to be locally a piece of evidence favoring Bookcase Aliens, just massively insufficient evidence.

Overriding evidence

If you think that a proposition has prior odds of 1 to $$10^{100}$$, and then somebody presents evidence with a likelihood ratio of $$10^{94}$$ to one favoring the proposition, you shouldn’t say, “Oh, I guess the posterior odds are 1 to a million.” You should instead question whether (a) you were wrong about the prior odds or (b) the evidence isn’t as strong as you assessed.

It’s not that hard to end up believing a hypothesis that had very low prior odds. For example, whenever you look at the exact pattern of 10 digits generated by a random number generator, you’re coming to believe a hypothesis that had prior odds on the order of ten billion to 1 against it.

But this should only happen with true hypotheses. It’s much rarer to find strong support for false hypotheses. Indeed, “strong evidence” is precisely “that sort of evidence we almost never see when the proposition turns out to be false”.

Imagine tossing a fair coin at most 300 times, and asking how often the sequence of heads and tails it generates along the way ever supports the false hypothesis “this coin comes up heads 3/4ths of the time” strongly over the true hypothesis “this coin is fair”. As you can verify by simulation, the sequence of coinflips will at some point support the false hypothesis at the 10 : 1 level on about 8% of runs; it will at some point support the false hypothesis at the 100 : 1 level on about 0.8% of runs; and it will at some point support the false hypothesis at the 1000 : 1 level on about 0.08% of runs. (Note that we are less and less likely to be more and more deceived.)
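The code originally linked here didn’t survive into this copy, so the following is a stand-in simulation of the experiment described (my own implementation; the exact percentages will wobble with the random seed and number of runs):

```python
import random

def fooled_fraction(threshold, runs=10_000, flips=300, seed=0):
    """Fraction of runs in which a fair coin's flips, at some point,
    support 'heads 3/4 of the time' over 'fair' at the given odds level."""
    rng = random.Random(seed)
    fooled = 0
    for _ in range(runs):
        lr = 1.0  # running likelihood ratio: P(flips | biased) / P(flips | fair)
        for _ in range(flips):
            if rng.random() < 0.5:   # heads: biased coin predicted it better
                lr *= 0.75 / 0.5
            else:                    # tails: biased coin predicted it worse
                lr *= 0.25 / 0.5
            if lr >= threshold:      # evidence crossed the threshold at some point
                fooled += 1
                break
    return fooled / runs

for level in (10, 100, 1000):
    # Roughly 8%, 0.8%, and 0.08% of runs, per the figures in the text.
    print(level, fooled_fraction(level))
```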

Seeing evidence with a strength of $$(10^{94} : 1)$$ / 94 orders of magnitude / 312 bits of evidence supporting a false hypothesis should only happen to you, on average, once every IT DIDN’T HAPPEN.

Witnessing an observation that truly has a $$10^{-94}$$ probability of occurring if the hypothesis is false, in a case where the hypothesis is in fact false, is something that will not happen to anyone even once over the expected lifetime of this universe.

So if you think that the prior odds for a coin being unfair are $$(1 : 10^{100})$$ against, and then you see the coin flipped 312 times and coming up heads each time… you do not say, “Well, my new posterior odds are $$(1 : 10^6)$$ against the coin being unfair.” You say, “I guess I was wrong about the prior odds being that low.”
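The numbers in this paragraph are consistent with the earlier ones: 312 straight heads, relative to a fair coin, is a likelihood ratio of $$2^{312} \approx 10^{94}$$ in favor of an always-heads coin. A quick sketch:

```python
import math

flips = 312
# Each head multiplies the likelihood ratio "always heads vs. fair" by
# 1 / 0.5 = 2, so 312 straight heads give a ratio of 2^312 overall --
# one bit of evidence per flip.
bits = flips
orders_of_magnitude = bits * math.log10(2)

print(round(orders_of_magnitude))  # 94, matching the 10^94 figure above
```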

Children:

• Extraordinary claims

What makes something an ‘extraordinary claim’ that requires extraordinary evidence?

Parents:

• Bayesian update

Bayesian updating: the ideal way to change probabilistic beliefs based on evidence.

• Possible inferential gap given just the pages I saw on my path to this one: the notion of “causally downstream” and the reason why “observed temperatures for the last 200 years” are causally downstream from “simple models of geophysics constructed to explain data about Earth and other planets”.

• Whoever wrote this knows what he is doing.

• Unlike the verbal incoherence of the previous commenter.

• I, in general, think things are clearer when real-world examples like this are given in the beginning, rather than after the abstract explanation. I think most people find the same thing.

• “Ceteris paribus” is an unusual Latin phrase in English. For clarity, a native English phrase may be better. Could go literal, changing “ceteris paribus be,” to “, all other conditions remaining the same,” or a bit more idiomatically “, normally, be,”.

• Be wary here.

We see on the next page (on log probability) that a plethora of small evidences sums to a very large number of bits.

In the bookcase aliens example, if you went to 312 houses and found that every one of them had a new bookcase, then by this approach, it’s time to reexamine the aliens hypothesis.

In practice, it’s simply not. Aliens are still just as unlikely as they were previously. New bookcases are now more likely.

It’s time to reexamine your 50 : 1 in-favor-of-aliens estimate for a new bookcase. It’s time to check whether there’s a really good door-to-door bookcase salesman offering ridiculous deals in the area. Or whether there are new tax incentives for people with more bookcases. Or a zillion other far more likely things than the false dichotomy of “either each person bought bookcases independently with odds of 50 : 1 against, or it’s bookcase aliens.”

The corollary of Doyle’s “Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth” is “make damn sure to eliminate all the probable stuff before gallivanting into the weeds of the infeasible”.

• This (the ignoring of cost) seems like a flaw in Bayesian analysis, and makes me think there’s probably some extension to it, which is being omitted here for simplicity, but which takes into account something like cost, value, or utility.

For example, the “cost” of a Bayesian filter deciding to show a salesman a spam email is far lower than the “cost” of the same filter deciding to prevent him from seeing an email from a million-dollar sales lead.

So, while the calculation of probabilities should not take cost into account, it feels like making decisions based on those probabilities should take cost into account.

For example: the chances of our getting wiped out in the near future by a natural disaster are low. Yet the potential consequences are dire, and the net costs per person of detection are low, or even negative. Therefore, we have a global near-earth-object detection network, a tsunami and quake detection network, fire watch towers, weather and climate monitors, disease tracking centers, and so on.

If this extension to Bayesian analysis exists, this seems a sensible place to link to it.

• I really have a hard time understanding the point of this section.

What difference is there between calculating the posterior given the evidence — thus updating the future prior — and questioning the prior “in the first place”? Isn’t this the whole point of the process? To examine the prior and question it in the case of extraordinary evidence?