Probability interpretations: Examples

Bet­ting on one-time events

Con­sider eval­u­at­ing, in June of 2016, the ques­tion: “What is the prob­a­bil­ity of Hillary Clin­ton win­ning the 2016 US pres­i­den­tial elec­tion?”

On the propen­sity view, Hillary has some fun­da­men­tal chance of win­ning the elec­tion. To ask about the prob­a­bil­ity is to ask about this ob­jec­tive chance. If we see a pre­dic­tion mar­ket in which prices move af­ter each new poll — so that it says 60% one day, and 80% a week later — then clearly the pre­dic­tion mar­ket isn’t giv­ing us very strong in­for­ma­tion about this ob­jec­tive chance, since it doesn’t seem very likely that Clin­ton’s real chance of win­ning is swing­ing so rapidly.

On the fre­quen­tist view, we can­not for­mally or rigor­ously say any­thing about the 2016 pres­i­den­tial elec­tion, be­cause it only hap­pens once. We can’t ob­serve a fre­quency with which Clin­ton wins pres­i­den­tial elec­tions. A fre­quen­tist might con­cede that they would cheer­fully buy for $1 a ticket that pays $20 if Clin­ton wins, con­sid­er­ing this a fa­vor­able bet in an in­for­mal sense, while in­sist­ing that this sort of rea­son­ing isn’t suffi­ciently rigor­ous, and there­fore isn’t suit­able for be­ing in­cluded in sci­ence jour­nals.

On the sub­jec­tive view, say­ing that Hillary has an 80% chance of win­ning the elec­tion sum­ma­rizes our knowl­edge about the elec­tion or our state of un­cer­tainty given what we cur­rently know. It makes sense for the pre­dic­tion mar­ket prices to change in re­sponse to new polls, be­cause our cur­rent state of knowl­edge is chang­ing.

A coin with an un­known bias

Sup­pose we have a coin, weighted so that it lands heads some­where be­tween 0% and 100% of the time, but we don’t know the coin’s ac­tual bias.

The coin is then flipped three times where we can see it. It comes up heads twice, and tails once: HHT.

The coin is then flipped again, where nobody can see it yet. An honest and trustworthy experimenter lets you spin a wheel-of-gambling-odds, and the wheel lands on (2 : 1). (The reason for spinning the wheel-of-gambling-odds is to reduce the worry that the experimenter might know more about the coin than you, and be offering you a deliberately rigged bet.) The experimenter asks if you’d enter into a gamble where you win $2 if the unseen coin flip is tails, and pay $1 if the unseen coin flip is heads.
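
As a reference point for the views that follow, the gamble’s expected payoff (in dollars) can be written in terms of whatever probability one assigns to heads on the unseen flip. The symbol \(q\) for that probability is introduced here only for illustration and is not part of the original setup:

\[ \mathbb{E}[\text{payoff}] = 2\,(1 - q) - 1 \cdot q = 2 - 3q \]

Under this sketch the gamble breaks even at \(q = \frac{2}{3}\), is favorable when \(q < \frac{2}{3}\), and is unfavorable when \(q > \frac{2}{3}\). The interpretations differ on whether, and how, a value of \(q\) can be assigned at all.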

On a propen­sity view, the coin has some ob­jec­tive prob­a­bil­ity be­tween 0 and 1 of be­ing heads, but we just don’t know what this prob­a­bil­ity is. See­ing HHT tells us that the coin isn’t all-heads or all-tails, but we’re still just guess­ing — we don’t re­ally know the an­swer, and can’t say whether the bet is a fair bet.

On a fre­quen­tist view, the coin would (if flipped re­peat­edly) pro­duce some long-run fre­quency \(f\) of heads that is be­tween 0 and 1. If we kept flip­ping the coin long enough, the ac­tual pro­por­tion \(p\) of ob­served heads is guaran­teed to ap­proach \(f\) ar­bi­trar­ily closely, even­tu­ally. We can’t say that the next coin flip is guaran­teed to be H or T, but we can make an ob­jec­tively true state­ment that \(p\) will ap­proach \(f\) to within ep­silon if we con­tinue to flip the coin long enough.
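
The convergence claim can be illustrated, though not proven, with a small simulation. The sketch below assumes a hypothetical true bias of 0.75 and uses Python’s pseudo-random generator as a stand-in for the coin; the function name `running_proportion` and the fixed seed are illustrative choices only.

```python
import random


def running_proportion(f, n_flips, seed=0):
    """Simulate n_flips of a coin with heads-probability f and
    return the observed proportion of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(1 for _ in range(n_flips) if rng.random() < f)
    return heads / n_flips


f = 0.75  # hypothetical true long-run frequency (an assumption)
for n in (10, 1_000, 100_000):
    # The observed proportion tends to sit closer to f as n grows.
    print(n, running_proportion(f, n))
```

With only ten flips the observed proportion can sit far from 0.75; with a hundred thousand flips it typically lands within a fraction of a percentage point.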

To de­cide whether or not to take the bet, a fre­quen­tist might try to ap­ply an un­bi­ased es­ti­ma­tor to the data we have so far. An “un­bi­ased es­ti­ma­tor” is a rule for tak­ing an ob­ser­va­tion and pro­duc­ing an es­ti­mate \(e\) of \(f\), such that the ex­pected value of \(e\) is \(f\). In other words, a fre­quen­tist wants a rule such that, if the hid­den bias of the coin was in fact to yield 75% heads, and we re­peat many times the op­er­a­tion of flip­ping the coin a few times and then ask­ing a new fre­quen­tist to es­ti­mate the coin’s bias us­ing this rule, the av­er­age value of the es­ti­mated bias will be 0.75. This is a prop­erty of the es­ti­ma­tion rule which is ob­jec­tive. We can’t hope for a rule that will always, in any par­tic­u­lar case, yield the true \(f\) from just a few coin flips; but we can have a rule which will prov­ably have an av­er­age es­ti­mate of \(f\), if the ex­per­i­ment is re­peated many times.
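
Unbiasedness can be checked exactly rather than by repeated experiment. The sketch below takes one candidate rule, "guess the observed proportion of heads," and computes its expected value under a binomial model of the flips. The function name `expected_estimate` and the example bias of 3/4 are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def expected_estimate(f, n):
    """Exact expected value of the estimate e = (heads / n)
    when each of n independent flips lands heads with probability f."""
    total = Fraction(0)
    for k in range(n + 1):
        # Probability of seeing exactly k heads in n flips.
        prob_k = comb(n, k) * f**k * (1 - f) ** (n - k)
        # The rule's estimate when k heads are observed.
        total += prob_k * Fraction(k, n)
    return total


f = Fraction(3, 4)  # hypothetical hidden bias: 75% heads
print(expected_estimate(f, 3))  # prints 3/4
```

Using `Fraction` keeps the arithmetic exact, so the result equals the assumed bias rather than merely approximating it.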

In this case, a simple unbiased estimator is to guess that the coin’s bias \(f\) is equal to the observed proportion of heads, or \(\frac{2}{3}\). In other words, if we repeat this experiment many many times, and whenever we see \(k\) heads in 3 tosses we guess that the coin’s bias is \(\frac{k}{3}\), then this rule definitely is an unbiased estimator. This estimator says that a bet of $2 vs. $1 is fair (winning $2 with probability \(\frac{1}{3}\) exactly offsets losing $1 with probability \(\frac{2}{3}\)), meaning that it doesn’t yield an expected profit, so we have no reason to take the bet.

On a subjectivist view, we start out personally unsure of where the bias \(f\) lies within the interval [0, 1]. Unless we have any knowledge or suspicion leading us to think otherwise, the coin is just as likely to have a bias between 33% and 34% as to have a bias between 66% and 67%; there’s no reason to think it’s more likely to be in one range or the other.

Each coin flip we see is then evidence about the value of \(f\), since a flip H happens with different probabilities depending on the different values of \(f\), and we update our beliefs about \(f\) using Bayes’ rule. For example, H is twice as likely if \(f=\frac{2}{3}\) as if \(f=\frac{1}{3}\), so by Bayes’ rule, after seeing one H we should think \(f\) is twice as likely to lie near \(\frac{2}{3}\) as it is to lie near \(\frac{1}{3}\).

When we start with a uniform prior, observe multiple flips of a coin with an unknown bias, see M heads and N tails, and then try to estimate the odds of the next flip coming up heads, the result is Laplace’s Rule of Succession, which estimates odds of (M + 1) : (N + 1) for heads vs. tails, for a probability of \(\frac{M + 1}{M + N + 2}\) of heads. In this case, after observing HHT, we estimate odds of 2 : 3 for tails vs. heads on the next flip. This makes a gamble that wins $2 on tails and loses $1 on heads a profitable gamble in expectation, so we take the bet.

Our choice of a uniform prior over \(f\) was a little dubious: it’s the obvious way to express total ignorance about the bias of the coin, but obviousness isn’t everything. (For example, maybe we actually believe that a fair coin is more likely than a coin biased 50.0000023% towards heads.) However, all the reasoning after the choice of prior was rigorous according to the laws of probability theory, which is the only method of manipulating quantified uncertainty that obeys obvious-seeming rules about how subjective uncertainty should behave.

Probability that the 98,765th decimal digit of \(\pi\) is \(0\)

What is the probability that the 98,765th digit in the decimal expansion of \(\pi\) is \(0\)?

The propensity and frequentist views regard as nonsense the notion that we could talk about the probability of a mathematical fact. Either the 98,765th decimal digit of \(\pi\) is \(0\) or it’s not. If we’re running repeated experiments with a random number generator, and looking at different digits of \(\pi\), then it might make sense to say that the random number generator has a 10% probability of picking numbers whose corresponding decimal digit of \(\pi\) is \(0\). But if we’re just picking a non-random number like 98,765, there’s no sense in which we could say that the 98,765th digit of \(\pi\) has a 10% propensity to be \(0\), or that this digit is \(0\) with 10% frequency in the long run.

The subjectivist considers probabilities to just refer to their own uncertainty. So if a subjectivist has picked the number 98,765 without yet knowing the corresponding digit of \(\pi\), and hasn’t made any observation that is known to them to be entangled with the 98,765th digit of \(\pi\), and they’re pretty sure their friend hasn’t yet looked up the 98,765th digit of \(\pi\) either, and their friend offers a whimsical gamble that costs $1 if the digit is non-zero and pays $20 if the digit is zero, the Bayesian takes the bet.

Note that this demonstrates a difference between the subjectivist interpretation of “probability” and Bayesian probability theory. A perfect Bayesian reasoner that knows the rules of logic and the definition of \(\pi\) must, by the axioms of probability theory, assign probability either 0 or 1 to the claim “the 98,765th digit of \(\pi\) is a \(0\)” (depending on whether or not it is). This is one of the reasons why perfect Bayesian reasoning is intractable. A subjectivist that is not a perfect Bayesian nevertheless claims that they are personally uncertain about the value of the 98,765th digit of \(\pi\).

Formalizing the rules of subjective probabilities about mathematical facts (in the way that probability theory formalized the rules for manipulating subjective probabilities about empirical facts, such as which way a coin came up) is an open problem; this is known as the problem of logical uncertainty.
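
Returning to the bets discussed in this section, the arithmetic behind each verdict can be sketched in a few lines. The helper names `laplace_p_heads` and `bet_expected_value` are hypothetical, and the 1/10 credence for the digit of \(\pi\) is an assumption reflecting the subjectivist’s stated uncertainty, not a fact about the digit.

```python
from fractions import Fraction


def laplace_p_heads(heads, tails):
    """Laplace's Rule of Succession: probability of heads on the next
    flip after seeing the given counts, starting from a uniform prior."""
    return Fraction(heads + 1, heads + tails + 2)


def bet_expected_value(p_win, win_amount, loss_amount):
    """Expected payoff of a bet that wins win_amount with probability
    p_win and otherwise loses loss_amount."""
    return p_win * win_amount - (1 - p_win) * loss_amount


# Coin bet after HHT: win $2 on tails, lose $1 on heads.
p_heads = laplace_p_heads(2, 1)                              # 3/5
coin_ev_subjectivist = bet_expected_value(1 - p_heads, 2, 1)

# Same bet using the observed-proportion estimate f = 2/3.
coin_ev_estimator = bet_expected_value(1 - Fraction(2, 3), 2, 1)

# Digit-of-pi bet: win $20 if the digit is 0, lose $1 otherwise,
# with an assumed credence of 1/10 that the digit is 0.
pi_ev = bet_expected_value(Fraction(1, 10), 20, 1)

print(coin_ev_subjectivist, coin_ev_estimator, pi_ev)  # prints 1/5 0 11/10
```

The coin bet is worth 20 cents in expectation under Laplace’s rule and exactly zero under the observed-proportion estimate, which matches the two different decisions described above.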

