Probability interpretations: Examples

Betting on one-time events

Consider evaluating, in June of 2016, the question: “What is the probability of Hillary Clinton winning the 2016 US presidential election?”

On the propensity view, Hillary has some fundamental chance of winning the election. To ask about the probability is to ask about this objective chance. If we see a prediction market in which prices move after each new poll — so that it says 60% one day, and 80% a week later — then clearly the prediction market isn’t giving us very strong information about this objective chance, since it doesn’t seem very likely that Clinton’s real chance of winning is swinging so rapidly.

On the frequentist view, we cannot formally or rigorously say anything about the 2016 presidential election, because it only happens once. We can’t observe a frequency with which Clinton wins presidential elections. A frequentist might concede that they would cheerfully buy for $1 a ticket that pays $20 if Clinton wins, considering this a favorable bet in an informal sense, while insisting that this sort of reasoning isn’t sufficiently rigorous, and therefore isn’t suitable for being included in science journals.

On the subjective view, saying that Hillary has an 80% chance of winning the election summarizes our knowledge about the election or our state of uncertainty given what we currently know. It makes sense for the prediction market prices to change in response to new polls, because our current state of knowledge is changing.

A coin with an unknown bias

Suppose we have a coin, weighted so that it lands heads somewhere between 0% and 100% of the time, but we don’t know the coin’s actual bias.

The coin is then flipped three times where we can see it. It comes up heads twice, and tails once: HHT.

The coin is then flipped again, where nobody can see it yet. An honest and trustworthy experimenter lets you spin a wheel-of-gambling-odds (the point of spinning the wheel is to reduce the worry that the experimenter might know more about the coin than you, and be offering you a deliberately rigged bet), and the wheel lands on (2 : 1). The experimenter asks if you’d enter into a gamble where you win $2 if the unseen coin flip is tails, and pay $1 if the unseen coin flip is heads.

On a propensity view, the coin has some objective probability between 0 and 1 of being heads, but we just don’t know what this probability is. Seeing HHT tells us that the coin isn’t all-heads or all-tails, but we’re still just guessing — we don’t really know the answer, and can’t say whether the bet is a fair bet.

On a frequentist view, the coin would (if flipped repeatedly) produce some long-run frequency \(f\) of heads that is between 0 and 1. If we kept flipping the coin long enough, the actual proportion \(p\) of observed heads is guaranteed to approach \(f\) arbitrarily closely, eventually. We can’t say that the next coin flip is guaranteed to be H or T, but we can make an objectively true statement that \(p\) will approach \(f\) to within any given \(\epsilon\) if we continue to flip the coin long enough.
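This convergence of the observed proportion toward the long-run frequency can be illustrated by simulation (a minimal sketch: the bias of 0.75 and the function name are our own choices for illustration):

```python
import random

random.seed(0)
f = 0.75  # a hypothetical true long-run frequency of heads

def observed_proportion(n_flips):
    """Flip the biased coin n_flips times and return the proportion of heads."""
    heads = sum(random.random() < f for _ in range(n_flips))
    return heads / n_flips

# The observed proportion p drifts toward f as the number of flips grows.
for n in (10, 1000, 100000):
    print(n, observed_proportion(n))
```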

To decide whether or not to take the bet, a frequentist might try to apply an unbiased estimator to the data we have so far. An “unbiased estimator” is a rule for taking an observation and producing an estimate \(e\) of \(f\), such that the expected value of \(e\) is \(f\). In other words, a frequentist wants a rule such that, if the hidden bias of the coin were in fact to yield 75% heads, and we repeated many times the operation of flipping the coin a few times and then asking a new frequentist to estimate the coin’s bias using this rule, then the average value of the estimated bias would be 0.75. This is an objective property of the estimation rule. We can’t hope for a rule that will always, in any particular case, yield the true \(f\) from just a few coin flips; but we can have a rule whose average estimate, over many repetitions of the experiment, provably equals \(f\).

In this case, a simple unbiased estimator is to guess that the coin’s bias \(f\) is equal to the observed proportion of heads, or \(\frac{2}{3}\). In other words, if we repeat this experiment many many times, and whenever we see \(k\) heads in 3 tosses we guess that the coin’s bias is \(\frac{k}{3}\), then this rule is an unbiased estimator. This estimator says that a bet of $2 vs. $1 is fair, meaning that it doesn’t yield an expected profit, so we have no reason to take the bet.
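The “unbiased” property can be checked by simulation (a sketch under an assumed hidden bias of 0.75; the helper name is ours):

```python
import random

random.seed(0)
f = 0.75  # hypothetical hidden bias of the coin

def estimate(n_flips=3):
    """One run of the experiment: the observed proportion of heads in n_flips tosses."""
    heads = sum(random.random() < f for _ in range(n_flips))
    return heads / n_flips

# Repeating the three-flip experiment many times, the average estimate approaches f,
# which is what "unbiased" means -- even though any single estimate is crude.
trials = 100_000
avg = sum(estimate() for _ in range(trials)) / trials
print(avg)
```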

On a subjectivist view, we start out personally unsure of where the bias \(f\) lies within the interval \([0, 1]\). Unless we have some knowledge or suspicion leading us to think otherwise, the coin is just as likely to have a bias between 33% and 34% as to have a bias between 66% and 67%; there’s no reason to think it’s more likely to be in one range than the other.

Each coin flip we see is then evidence about the value of \(f,\) since a flip H happens with different probabilities depending on the value of \(f,\) and we update our beliefs about \(f\) using Bayes’ rule. For example, H is twice as likely if \(f=\frac{2}{3}\) as if \(f=\frac{1}{3},\) so by Bayes’ rule we should now think \(f\) is twice as likely to lie near \(\frac{2}{3}\) as it is to lie near \(\frac{1}{3}\).
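This update can be sketched numerically with a grid approximation of the posterior over \(f\) (the 1000-point grid and the uniform prior are choices made here for illustration, not part of the original argument):

```python
# Grid-approximate the posterior over the coin's bias f, starting from a uniform prior.
N = 1000
grid = [(i + 0.5) / N for i in range(N)]   # candidate values of f
posterior = [1.0 / N] * N                  # uniform prior over the grid

def update(dist, flip):
    """Multiply by the likelihood of one flip ('H' or 'T') and renormalize."""
    unnorm = [p * (f if flip == 'H' else 1 - f) for p, f in zip(dist, grid)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

posterior = update(posterior, 'H')

# After a single H, the posterior near f = 2/3 is about twice the posterior near f = 1/3.
ratio = posterior[int(2 * N / 3)] / posterior[int(N / 3)]
print(ratio)
```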

When we start with a uniform prior, observe multiple flips of a coin with an unknown bias, see M heads and N tails, and then try to estimate the odds of the next flip coming up heads, the result is Laplace’s Rule of Succession, which estimates odds of (M + 1) : (N + 1) for heads vs. tails, i.e., a probability of \(\frac{M + 1}{M + N + 2}\) that the next flip is heads.

In this case, after observing HHT, we estimate odds of 2 : 3 for tails vs. heads on the next flip. This makes a gamble that wins $2 on tails and loses $1 on heads a profitable gamble in expectation, so we take the bet.
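The arithmetic above can be checked with exact fractions (a small sketch; `laplace_heads` is our own helper name):

```python
from fractions import Fraction

def laplace_heads(M, N):
    """Laplace's Rule of Succession: P(next flip is heads) after M heads and N tails."""
    return Fraction(M + 1, M + N + 2)

p_heads = laplace_heads(2, 1)   # after observing HHT
p_tails = 1 - p_heads
print(p_heads, p_tails)         # 3/5 2/5

# Expected value of the gamble: win $2 on tails, lose $1 on heads.
ev = 2 * p_tails - 1 * p_heads
print(ev)                       # 1/5, so the bet is profitable in expectation
```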

Our choice of a uniform prior over \(f\) was a little dubious — it’s the obvious way to express total ignorance about the bias of the coin, but obviousness isn’t everything. (For example, maybe we actually believe that a fair coin is more likely than a coin biased 50.0000023% towards heads.) However, all the reasoning after the choice of prior was rigorous according to the laws of probability theory, which is the only method of manipulating quantified uncertainty that obeys obvious-seeming rules about how subjective uncertainty should behave.

Probability that the 98,765th decimal digit of \(\pi\) is \(0\)

What is the probability that the 98,765th digit in the decimal expansion of \(\pi\) is \(0\)?

The propensity and frequentist views regard as nonsense the notion that we could talk about the probability of a mathematical fact. Either the 98,765th decimal digit of \(\pi\) is \(0\) or it’s not. If we’re running repeated experiments with a random number generator, and looking at different digits of \(\pi,\) then it might make sense to say that the random number generator has a 10% probability of picking numbers whose corresponding decimal digit of \(\pi\) is \(0\). But if we’re just picking a non-random number like 98,765, there’s no sense in which we could say that the 98,765th digit of \(\pi\) has a 10% propensity to be \(0\), or that this digit is \(0\) with 10% frequency in the long run.

The subjectivist considers probabilities to just refer to their own uncertainty. So if a subjectivist has picked the number 98,765 without yet knowing the corresponding digit of \(\pi,\) and hasn’t made any observation that is known to them to be entangled with the 98,765th digit of \(\pi,\) and they’re pretty sure their friend hasn’t yet looked up the 98,765th digit of \(\pi\) either, and their friend offers a whimsical gamble that costs $1 if the digit is non-zero and pays $20 if the digit is zero, the Bayesian takes the bet.
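The subjectivist’s calculation can be made explicit (a sketch: the 1-in-10 credence follows from treating all ten digits as equally plausible, as in the preceding paragraph):

```python
from fractions import Fraction

# The subjectivist's credence that the unknown digit is 0: with no information
# favoring any digit, each of 0-9 gets probability 1/10.
p_zero = Fraction(1, 10)

# The whimsical gamble: pay $1 if the digit is non-zero, receive $20 if it is zero.
ev = 20 * p_zero - 1 * (1 - p_zero)
print(ev)   # 11/10, a positive expectation, so the subjectivist takes the bet
```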

Note that this demonstrates a difference between the subjectivist interpretation of “probability” and Bayesian probability theory. A perfect Bayesian reasoner that knows the rules of logic and the definition of \(\pi\) must, by the axioms of probability theory, assign probability either 0 or 1 to the claim “the 98,765th digit of \(\pi\) is a \(0\)” (depending on whether or not it is). This is one of the reasons why perfect Bayesian reasoning is intractable. A subjectivist who is not a perfect Bayesian nevertheless claims that they are personally uncertain about the value of the 98,765th digit of \(\pi.\) Formalizing the rules of subjective probabilities about mathematical facts (in the way that probability theory formalized the rules for manipulating subjective probabilities about empirical facts, such as which way a coin came up) is an open problem; this is known as the problem of logical uncertainty.