Bayes’ rule tells us how strong a piece of evidence has to be in order to support a given hypothesis. This lets us see whether a piece of evidence is sufficient, or insufficient, to drive the probability of a hypothesis to over 50%.

For example, consider the sparking widgets problem:

10% of widgets are bad and 90% are good. 4% of good widgets emit sparks, and 12% of bad widgets emit sparks. Can you calculate in your head what percentage of sparking widgets are bad?

The prior odds are 1 : 9 for bad widgets vs. good widgets.

12% of bad widgets and 4% of good widgets emit sparks, so that’s a likelihood ratio of 3 : 1 for sparking (bad widgets are three times as likely to emit sparks).

$(1 : 9 ) \times (3 : 1) \ = \ (3 : 9) \ \cong \ (1 : 3)$ posterior odds for bad vs. good sparking widgets. So ¹⁄₄ of sparking widgets are bad.

Bad widgets started out relatively rare: 1 in 10. We applied a test — looking for sparks — that was only 3 times more likely to identify bad widgets as opposed to good ones. The evidence was weaker than the prior improbability of the claim.

This doesn’t mean we toss out the evidence and ignore it. It does mean that, after updating on the observation of sparkiness, we only gave 25% posterior probability to the widget being bad — the probability didn’t go over 50%.

What would need to change to drive the probability of widget badness to over 50%? We would need evidence with a likelihood ratio more extreme than the (9 : 1) prior odds against badness. For example, if instead bad widgets were 50% likely to spark and good widgets were 5% likely to spark, the likelihood ratio would be (10 : 1) and the posterior odds would go to (10 : 9), or about 53%.
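Both widget calculations can be checked with a short sketch. The function names (`posterior_odds`, `odds_to_probability`) are illustrative, not from the text; odds of a : b are stored as the single fraction a/b for bad vs. good.

```python
from fractions import Fraction


def posterior_odds(prior_odds: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds: Fraction) -> Fraction:
    """Convert odds a : b (stored as a/b) into the probability a / (a + b)."""
    return odds / (1 + odds)


prior = Fraction(1, 9)  # 1 : 9 for bad vs. good widgets

# Original test: 12% of bad widgets spark vs. 4% of good widgets (3 : 1).
weak = odds_to_probability(posterior_odds(prior, Fraction(12, 4)))

# Stronger test: 50% of bad widgets spark vs. 5% of good widgets (10 : 1).
strong = odds_to_probability(posterior_odds(prior, Fraction(50, 5)))

print(weak, float(weak))      # 1/4 0.25
print(strong, float(strong))  # 10/19, roughly 0.526
```

Using exact fractions rather than floats keeps the odds readable and avoids rounding noise in such small examples.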

In other words: For a previously implausible proposition $X$ to end up with a high posterior probability, the likelihood ratio for the new evidence favoring $X$ over its alternatives needs to be more extreme than the prior odds against $X.$

This is the quantitative argument behind the qualitative statement that “extraordinary claims require extraordinary evidence” (a claim popularized by Carl Sagan, which dates back to at least Pierre-Simon Laplace).

That is: an “extraordinary claim” is one with a low prior probability in advance of considering the evidence, and “extraordinary evidence” is evidence with an extreme likelihood ratio favoring the claim over its alternatives.

What makes evidence extraordinary?

The likelihood ratio is defined as:

$$\text{Likelihood ratio} = \dfrac{\text{Probability of seeing the evidence, assuming the claim is true}}{\text{Probability of seeing the evidence, assuming the claim is false}}$$

To obtain an extreme likelihood ratio, the bottom of the fraction has to be very low. The top of the fraction being very high doesn’t help much. If the top of the fraction is 99% and the bottom is 70%, that’s still not a very extreme ratio, and it doesn’t help much if the top is 99.9999% instead.

So to get extremely strong evidence, we need to see an observation which is very improbable, given “business as usual,” but fairly likely according to the extraordinary claim. This observation would be deserving of the title, “extraordinary evidence”.
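To make the asymmetry concrete, compare the two ratios mentioned above with one where the denominator shrinks instead. The 0.01% denominator in the last case is an illustrative value, not a figure from the text.

$$\frac{0.99}{0.70} \approx 1.41, \qquad \frac{0.999999}{0.70} \approx 1.43, \qquad \frac{0.99}{0.0001} = 9900$$

Pushing the numerator from 99% to 99.9999% barely moves the ratio, while shrinking the denominator changes it by orders of magnitude.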

Example of an extraordinary claim and ordinary evidence: Bookcase aliens.

Consider the following hypothesis: What if there are Bookcase Aliens who teleport into our houses at night and drop off bookcases?

Bob offers the following evidence for this claim: “Last week, I visited my friend’s house, and there was a new bookcase there. If there were no bookcase aliens, I wouldn’t have expected that my friend would get a new bookcase. But if there are Bookcase Aliens, then the probability of my finding a new bookcase there was much higher. Therefore, my observation, ‘There is a new bookcase in my friend’s house,’ is strong evidence supporting the existence of Bookcase Aliens.”

In an intuitive sense, we have a notion that Bob’s evidence “There is a new bookcase in my friend’s house” is not as extraordinary as the claim “There are bookcase aliens”, and so the evidence fails to lift the claim. Bayes’ rule makes this statement precise.

Bob is, in fact, correct that his observation, “There’s a new bookcase in my friend’s house”, is indeed evidence favoring the Bookcase Aliens. Depending on how long it’s been since Bob last visited that house, there might ceteris paribus be, say, a 1% chance that there would be a new bookcase there. On the other hand, the Bookcase Aliens hypothesis might assign, say, 50% probability that the Bookcase Aliens would target this particular house among others. If so, that’s a likelihood ratio of 50:1 favoring the Bookcase Aliens hypothesis.

However, a reasonable prior on Bookcase Aliens would assign this a very low prior probability given our other, previous observations of the world. Let’s be conservative and assign odds of just 1 : 1,000,000,000 against Bookcase Aliens. Then to raise our posterior belief in Bookcase Aliens to somewhere in the “pragmatically noticeable” range of 1 : 100, we’d need to see evidence with a cumulative likelihood ratio of 10,000,000 : 1 favoring the Bookcase Aliens. 50 : 1 won’t cut it.
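The arithmetic in this paragraph can be sketched as follows. The variable names are illustrative; the numbers are the ones given in the text.

```python
from fractions import Fraction

prior_odds = Fraction(1, 1_000_000_000)  # prior odds from the text
target_odds = Fraction(1, 100)           # the "pragmatically noticeable" range
bookcase_lr = Fraction(50, 1)            # Bob's observation, 50 : 1

# Likelihood ratio needed to move from the prior odds to the target odds.
required_lr = target_odds / prior_odds

# Where Bob's single observation actually leaves us.
odds_after_bob = prior_odds * bookcase_lr

print(required_lr)     # 10000000
print(odds_after_bob)  # 1/20000000
```

After Bob's observation the odds are still about 1 : 20,000,000, a factor of 200,000 short of the target.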

What would need to change for the observation “There’s a new bookcase in my friend’s house” to be convincing evidence of Bookcase Aliens, compared to the alternative hypothesis of “business as usual”?

As suggested by the Bayesian interpretation of strength of evidence, what we need to see is an observation which is nigh-impossible if there are not bookcase aliens. We would have to believe that, conditional on “business as usual” being true, the likelihood of seeing a bookcase was on the order of 0.00000001%. That would then take the likelihood ratio, aka strength of evidence, into the rough vicinity of a billion to one favoring Bookcase Aliens over “business as usual”.

We would still need to consider whether there might be other alternative hypotheses besides Bookcase Aliens and “business as usual”, such as a human-operated Bookcase Conspiracy. But at least we wouldn’t be dealing with an observation that was so unsurprising (conditional on business as usual) as to be unable to support any kind of extraordinary claim.

However, if instead we suppose that Bookcase Aliens are allegedly 99.999999% probable to add a bookcase to Bob’s friend’s house, very little changes: the likelihood ratio is 99.999999% : 1%, or roughly 100 : 1 instead of 50 : 1. To obtain an extreme likelihood ratio, we mainly need a tiny denominator rather than a big numerator. In other words, “extraordinary evidence”.

What makes claims extraordinary?

An obvious next question is what makes a claim ‘extraordinary’ or ‘ordinary’. This is a deep separate topic, but as an example, consider the claim that the Earth is becoming warmer due to carbon dioxide being added to its atmosphere.

To evaluate the ordinariness or extraordinariness of this claim:

We don’t ask whether the future consequences of this claim seem extreme or important.
We don’t ask whether the policies that would be required to address the claim are very costly.
We ask whether “carbon dioxide warms the atmosphere” or “carbon dioxide fails to warm the atmosphere” seems to conform better to the deep, causal generalizations we already have about carbon dioxide and heat.
If we’ve already considered the deep causal generalizations like those, we don’t ask about generalizations causally downstream of the deep causal ones we’ve already considered. (E.g., we don’t say, “But on every observed day for the last 200 years, the global temperature has stayed inside the following range; it would be ‘extraordinary’ to leave that range.”)

These tests suggest that “Large amounts of added carbon dioxide will incrementally warm Earth’s atmosphere” would have been an ‘ordinary’ claim in advance of trying to find any evidence for or against it—it’s just how you would expect a greenhouse gas to work, more or less. Thus, one is not entitled to demand a prediction made by this hypothesis that is wildly unlikely under any other hypothesis before believing it.

Incremental updating

A key feature of the Bookcase Aliens example is that a follower of Bayes’ rule acknowledges the observation of a new bookcase as being, locally, a single piece of evidence with a 50 : 1 likelihood ratio favoring Bookcase Aliens. The Bayesian doesn’t toss the observation out the window because it’s insufficient evidence; it just gets accumulated into the pool. If you visit house after house, and see new bookcase after new bookcase, the Bayesian slowly, incrementally, begins to wonder if something strange is going on, rather than dismissing each observation as ‘insufficient evidence’ and then forgetting it.
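A rough sketch of this accumulation, using the prior odds and the 50 : 1 ratio from the Bookcase Aliens example. It rests on a strong assumption that the text does not make: that each new bookcase is independent evidence, so the likelihood ratios simply multiply. The function name `odds_after` is illustrative.

```python
from fractions import Fraction


def odds_after(n_observations: int,
               prior_odds: Fraction = Fraction(1, 1_000_000_000),
               lr_per_observation: Fraction = Fraction(50, 1)) -> Fraction:
    """Posterior odds after n observations.

    ASSUMPTION: the observations are conditionally independent given each
    hypothesis, so their likelihood ratios multiply.
    """
    return prior_odds * lr_per_observation ** n_observations


# Print the running odds (as a float) after 0 through 6 new bookcases.
for n in range(7):
    print(n, float(odds_after(n)))
```

Under that assumption, the fifth new bookcase pushes the odds past 1 : 100 and the sixth past even odds. If the bookcases share a common mundane cause, the observations are not independent and the real per-observation ratio would be much weaker.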

This stands in contrast to the instinctive way humans often behave, where, having concluded that they should not believe in Bookcase Aliens on the basis of the evidence in front of them, they discard that evidence entirely, denounce it, and say that it was never any evidence at all. (This is “treating arguments like soldiers” and acting like any evidence in favor of a proposition has to be “defeated.”)

The Bayesian just says “yes, that is evidence in favor of the claim, but it’s not quantitatively enough evidence.” This idiom also stands in contrast to the practice of treating any concession an opponent makes as a victory. If true claims are supposed to have all their arguments upheld and false claims are supposed to have all their enemy arguments defeated, then a single undefeated claim of support stands as a proof of victory, no matter how strong or weak the evidence that it provides. Not so with Bayesians — a Bayesian considers the bookcase observation to be locally a piece of evidence favoring Bookcase Aliens, just massively insufficient evidence.

Overriding evidence

If you think that a proposition has prior odds of 1 to $10^{100}$, and then somebody presents evidence with a likelihood ratio of $10^{94}$ to one favoring the proposition, you shouldn’t say, “Oh, I guess the posterior odds are 1 to a million.” You should instead question whether either (a) you were wrong about the prior odds or (b) the evidence isn’t as strong as you assessed.

It’s not that hard to end up believing a hypothesis that had very low prior odds. For example, whenever you look at the exact pattern of 10 digits generated by a random number generator, you’re coming to believe a hypothesis that had prior odds on the order of ten billion to 1 against it.

But this should only happen with true hypotheses. It’s much rarer to find strong support for false hypotheses. Indeed, “strong evidence” is precisely “that sort of evidence we almost never see, when the proposition turns out to be false”.

Imagine tossing a fair coin at most 300 times, and asking how often the sequence of heads and tails that it generates along the way ever supports the false hypothesis “this coin comes up heads 3/4ths of the time” strongly over the true hypothesis “this coin is fair”. As you can verify by simulation, the sequence of coinflips will at some point support the false hypothesis at the 10 : 1 level on about 8% of runs; it will at some point support the false hypothesis at the 100 : 1 level on about 0.8% of runs, and it will at some point support the false hypothesis at the 1000 : 1 level on about 0.08% of runs. (Note that we are less and less likely to be more and more deceived.)
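A minimal Monte Carlo sketch of this experiment follows. It is not the author's own code; the function names, the number of runs, and the seed are arbitrary choices. It tracks the running log likelihood ratio for “heads 3/4ths of the time” over “fair” and records the peak reached in each run.

```python
import math
import random


def max_log10_lr(n_flips: int, rng: random.Random) -> float:
    """Flip a fair coin n_flips times and return the largest log10 likelihood
    ratio (heads-3/4 hypothesis over fair-coin hypothesis) reached at any point."""
    log_lr_heads = math.log10(0.75 / 0.5)  # each head favors the biased hypothesis
    log_lr_tails = math.log10(0.25 / 0.5)  # each tail favors the fair hypothesis
    log_lr = 0.0
    best = 0.0
    for _ in range(n_flips):
        log_lr += log_lr_heads if rng.random() < 0.5 else log_lr_tails
        best = max(best, log_lr)
    return best


def fraction_misled(n_runs: int = 20_000, n_flips: int = 300, seed: int = 0) -> dict:
    """Estimate how often the false hypothesis is ever favored at each level."""
    rng = random.Random(seed)
    peaks = [max_log10_lr(n_flips, rng) for _ in range(n_runs)]
    return {level: sum(p >= math.log10(level) for p in peaks) / n_runs
            for level in (10, 100, 1000)}


results = fraction_misled()
for level, frac in results.items():
    print(f"{level} : 1 reached on {frac:.2%} of runs")
```

The estimates should land near the figures quoted above, though the exact values depend on the seed and the number of runs, and the 1000 : 1 estimate is noisy at this sample size.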

Seeing evidence with a strength of $(10^{94} : 1)$ / 94 orders of magnitude / 312 bits of evidence supporting a false hypothesis should only happen to you, on average, once every IT DIDN’T HAPPEN.

Witnessing an observation that truly has a $10^{-94}$ probability of occurring if the hypothesis is false, in a case where the hypothesis is in fact false, is something that will not happen to anyone even once over the expected lifetime of this universe.

So if you think that the prior odds for a coin being unfair are $(1 : 10^{100})$ against, and then you see the coin flipped 312 times and coming up heads each time… you do not say, “Well, my new posterior odds are $(1 : 10^6)$ against the coin being unfair.” You say, “I guess I was wrong about the prior odds being that low.”
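The figure of 312 bits, and the choice of 312 flips, can be checked with a line of arithmetic. This assumes, as the text does not state explicitly, that the “unfair” hypothesis is a coin that always lands heads, so each observed head carries a 2 : 1 likelihood ratio (one bit) over the fair coin.

$$\log_2\!\left(10^{94}\right) = 94 \log_2 10 \approx 94 \times 3.32 \approx 312, \qquad 2^{312} \approx 8.3 \times 10^{93} \approx 10^{94}$$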

Conor Duggan 28 Mar 2021 0:18 UTC
This seems less clear than saying “greater.” Am I missing something?
Patrick LaVictoire 3 Mar 2016 23:46 UTC
Possible inferential gap given just the pages I saw on my path to this one: the notion of “causally downstream” and the reason why “observed temperatures for the last 200 years” are causally downstream from “simple models of geophysics constructed to explain data about Earth and other planets”.
Hunter Meriwether 22 May 2016 0:05 UTC
Whoever wrote this knows what he is doing.
Hunter Meriwether 22 May 2016 0:14 UTC
Unlike the verbal incoherence of the previous commenter.
Adam Zerner 8 Dec 2016 3:54 UTC
I, in general, think things are clearer when real world examples like this are given in the beginning, rather than after the abstract explanation. I think most people find the same thing.
Alan De Smet 12 Dec 2016 22:42 UTC
“ceteris paribus” is an unusual Latin phrase in English. For clarity, a native English phrase may be better. Could go literal, changing “ceteris paribus be,” to “, all other conditions remaining the same,” or a bit more idiomatically “, normally, be,”.
Dewi Morgan 25 Sep 2018 14:04 UTC
Be wary here.

We see on the next page (log probability) that a plethora of small evidences sums to a very large number of bits.

In the bookcase aliens example, if you went to 312 houses and found that every one of them had a new bookcase, then by this approach, it’s time to reexamine the aliens hypothesis.

In practice, it’s just simply not. Aliens are still just as unlikely as they were previously. New bookcases are now more likely.

It’s time to reexamine your 50:1 in favor of aliens estimate for a new bookcase. It’s time to check whether there’s a really good door-to-door bookcase salesman offering ridiculous deals in the area. Or whether there are new tax incentives for people with more bookcases. Or a zillion other far more likely things than the false dichotomy of “either each person bought bookcases independently with odds of 50:1 against, or it’s bookcase aliens.”

The corollary of Doyle’s “Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth” is “make damn sure to eliminate all the probable stuff, before gallivanting into the weeds of the infeasible”.
Dewi Morgan 25 Sep 2018 19:00 UTC
This (the ignoring of cost) seems like a flaw to Bayesian analysis, and makes me think there’s probably some extension to it, which is being omitted here for simplicity, but which takes into account something like cost, value, or utility.

For example, the “cost” of a bayesian filter deciding to show a salesman a spam email is far lower than the “cost” of the same filter deciding to prevent them from seeing an email from a million-dollar sales lead.

So, while the calculation of probabilities should not take into account cost, it feels like the making of decisions based on those probabilities should take cost into account.

For example: the chances of our getting wiped out in the near future by a natural disaster are low. Yet, the potential consequences are dire, and the net costs per person of detection are low, or even negative. Therefore, we have a global near-earth-object detection network, a tsunami and quake detection network, fire watch towers, weather and climate monitors, disease tracking centers, and so on.

If this extension to Bayesian analysis exists, this seems a sensible place to link to it.
Eyal Roth 18 Mar 2019 13:06 UTC
I really have a hard time understanding the point of this section.

What difference is there between calculating the posterior given a piece of evidence (thus updating the future prior) and questioning the prior “in the first place”? Isn’t this the whole point of the process: to examine the prior and question it in case of extraordinary evidence?
