Probability interpretations: Examples

Bet­ting on one-time events

Con­sider eval­u­at­ing, in June of 2016, the ques­tion: “What is the prob­a­bil­ity of Hillary Clin­ton win­ning the 2016 US pres­i­den­tial elec­tion?”

On the propensity view, Clinton has some fundamental chance of winning the election. To ask about the probability is to ask about this objective chance. If we see a prediction market in which prices move after each new poll (so that it says 60% one day, and 80% a week later), then clearly the prediction market isn’t giving us very strong information about this objective chance, since it doesn’t seem very likely that Clinton’s real chance of winning is swinging so rapidly.

On the fre­quen­tist view, we can­not for­mally or rigor­ously say any­thing about the 2016 pres­i­den­tial elec­tion, be­cause it only hap­pens once. We can’t ob­serve a fre­quency with which Clin­ton wins pres­i­den­tial elec­tions. A fre­quen­tist might con­cede that they would cheer­fully buy for $1 a ticket that pays $20 if Clin­ton wins, con­sid­er­ing this a fa­vor­able bet in an in­for­mal sense, while in­sist­ing that this sort of rea­son­ing isn’t suffi­ciently rigor­ous, and there­fore isn’t suit­able for be­ing in­cluded in sci­ence jour­nals.

On the subjective view, saying that Clinton has an 80% chance of winning the election summarizes our knowledge about the election, or our state of uncertainty given what we currently know. It makes sense for the prediction market prices to change in response to new polls, because our current state of knowledge is changing.

A coin with an un­known bias

Sup­pose we have a coin, weighted so that it lands heads some­where be­tween 0% and 100% of the time, but we don’t know the coin’s ac­tual bias.

The coin is then flipped three times where we can see it. It comes up heads twice, and tails once: HHT.

The coin is then flipped again, where nobody can see it yet. An honest and trustworthy experimenter lets you spin a wheel-of-gambling-odds, and the wheel lands on (2 : 1). (The reason for spinning the wheel-of-gambling-odds is to reduce the worry that the experimenter might know more about the coin than you, and be offering you a deliberately rigged bet.) The experimenter asks if you’d enter into a gamble where you win $2 if the unseen coin flip is tails, and pay $1 if the unseen coin flip is heads.

On a propen­sity view, the coin has some ob­jec­tive prob­a­bil­ity be­tween 0 and 1 of be­ing heads, but we just don’t know what this prob­a­bil­ity is. See­ing HHT tells us that the coin isn’t all-heads or all-tails, but we’re still just guess­ing — we don’t re­ally know the an­swer, and can’t say whether the bet is a fair bet.

On a fre­quen­tist view, the coin would (if flipped re­peat­edly) pro­duce some long-run fre­quency \(f\) of heads that is be­tween 0 and 1. If we kept flip­ping the coin long enough, the ac­tual pro­por­tion \(p\) of ob­served heads is guaran­teed to ap­proach \(f\) ar­bi­trar­ily closely, even­tu­ally. We can’t say that the next coin flip is guaran­teed to be H or T, but we can make an ob­jec­tively true state­ment that \(p\) will ap­proach \(f\) to within ep­silon if we con­tinue to flip the coin long enough.
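The convergence claim can be illustrated with a small simulation. This is only a sketch: the hidden bias of 0.75, the seed, and the function name `running_proportion` are assumptions chosen for illustration, and a finite simulation can suggest the long-run behavior but cannot prove it.

```python
import random

def running_proportion(f, n_flips, seed=0):
    """Simulate n_flips of a coin with heads-probability f.

    Returns the observed proportion of heads. The seed is fixed only
    so that the illustration is reproducible.
    """
    rng = random.Random(seed)
    heads = sum(rng.random() < f for _ in range(n_flips))
    return heads / n_flips

f = 0.75  # hypothetical hidden bias, unknown to the observer
for n in (10, 1_000, 100_000):
    # The observed proportion tends to settle near f as n grows.
    print(n, running_proportion(f, n))
```

With few flips the observed proportion can sit far from \(f\); with many flips it typically lands close to it.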

To de­cide whether or not to take the bet, a fre­quen­tist might try to ap­ply an un­bi­ased es­ti­ma­tor to the data we have so far. An “un­bi­ased es­ti­ma­tor” is a rule for tak­ing an ob­ser­va­tion and pro­duc­ing an es­ti­mate \(e\) of \(f\), such that the ex­pected value of \(e\) is \(f\). In other words, a fre­quen­tist wants a rule such that, if the hid­den bias of the coin was in fact to yield 75% heads, and we re­peat many times the op­er­a­tion of flip­ping the coin a few times and then ask­ing a new fre­quen­tist to es­ti­mate the coin’s bias us­ing this rule, the av­er­age value of the es­ti­mated bias will be 0.75. This is a prop­erty of the es­ti­ma­tion rule which is ob­jec­tive. We can’t hope for a rule that will always, in any par­tic­u­lar case, yield the true \(f\) from just a few coin flips; but we can have a rule which will prov­ably have an av­er­age es­ti­mate of \(f\), if the ex­per­i­ment is re­peated many times.

In this case, a simple unbiased estimator is to guess that the coin’s bias \(f\) is equal to the observed proportion of heads, or \(\frac{2}{3}\). In other words, if we repeat this experiment many many times, and whenever we see \(k\) heads in 3 tosses we guess that the coin’s bias is \(\frac{k}{3}\), then this rule definitely is an unbiased estimator. This estimator says that a bet of $2 vs. $1 is fair (a \(\frac{1}{3}\) chance of winning $2 exactly offsets a \(\frac{2}{3}\) chance of losing $1), meaning that it doesn’t yield an expected profit, so we have no reason to take the bet.
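Both claims in this paragraph can be checked with exact arithmetic. The sketch below assumes the number of heads in 3 tosses follows a binomial distribution (independent flips with a fixed bias); the function names and the variable `k` for the count of heads are illustrative choices, not part of the original argument.

```python
from fractions import Fraction
from math import comb

def expected_estimate(f, n=3):
    """Exact expected value of the estimate k/n, where k ~ Binomial(n, f)."""
    return sum(
        comb(n, k) * f**k * (1 - f) ** (n - k) * Fraction(k, n)
        for k in range(n + 1)
    )

def expected_profit(p_heads, win_on_tails=2, loss_on_heads=1):
    """Expected profit of the bet in the text, given a probability of heads."""
    return (1 - p_heads) * win_on_tails - p_heads * loss_on_heads

# Unbiasedness: if the true bias is 75% heads, the average estimate is 3/4.
true_bias = Fraction(3, 4)
print(expected_estimate(true_bias))  # 3/4

# Fairness: plugging the estimate 2/3 into the bet gives zero expected profit.
estimate = Fraction(2, 3)
print(expected_profit(estimate))  # 0
```

Using `Fraction` rather than floats keeps the comparison exact, so the equality with the true bias is not blurred by rounding.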

On a subjectivist view, we start out personally unsure of where the bias \(f\) lies within the interval \([0, 1]\). Unless we have some knowledge or suspicion leading us to think otherwise, the coin is just as likely to have a bias between 33% and 34% as to have a bias between 66% and 67%; there’s no reason to think it’s more likely to be in one range or the other.

Each coin flip we see is then evidence about the value of \(f\), since a flip H happens with different probabilities depending on the different values of \(f\), and we update our beliefs about \(f\) using Bayes’ rule. For example, H is twice as likely if \(f=\frac{2}{3}\) as if \(f=\frac{1}{3}\), so by Bayes’ rule, after observing an H we should think \(f\) is twice as likely to lie near \(\frac{2}{3}\) as it is to lie near \(\frac{1}{3}\).
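The update can be sketched numerically by replacing the continuous interval with a grid of candidate biases. The grid of 31 evenly spaced values and the helper name `update` are assumptions made for illustration; the real prior in the text is continuous.

```python
from fractions import Fraction

def update(prior, flip):
    """One Bayes update of a {bias: probability} table on a flip 'H' or 'T'."""
    # Likelihood of the flip under each candidate bias f.
    unnorm = {f: p * (f if flip == "H" else 1 - f) for f, p in prior.items()}
    total = sum(unnorm.values())
    return {f: w / total for f, w in unnorm.items()}

# Discretized uniform prior over candidate biases 0, 1/30, ..., 1.
grid = [Fraction(i, 30) for i in range(31)]
prior = {f: Fraction(1, len(grid)) for f in grid}

posterior = update(prior, "H")
ratio = posterior[Fraction(2, 3)] / posterior[Fraction(1, 3)]
print(ratio)  # 2
```

Because the prior gave both candidate values equal weight, the posterior ratio equals the likelihood ratio of the observed flip.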

When we start with a uniform prior, observe multiple flips of a coin with an unknown bias, see M heads and N tails, and then try to estimate the odds of the next flip coming up heads, the result is Laplace’s Rule of Succession, which estimates odds of (M + 1) : (N + 1) for heads vs. tails, for a probability of heads of \(\frac{M + 1}{M + N + 2}.\)
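For readers who want to see where the rule comes from, here is a brief sketch of the standard derivation. With a uniform prior, the posterior density over \(f\) after M heads and N tails is proportional to \(f^{M}(1-f)^{N}\), and the probability of heads on the next flip is the posterior average of \(f\). Using the identity \(\int_0^1 f^{a}(1-f)^{b}\,df = \frac{a!\,b!}{(a+b+1)!}\):

\[
P(\text{heads} \mid M, N) = \frac{\int_0^1 f \cdot f^{M}(1-f)^{N}\,df}{\int_0^1 f^{M}(1-f)^{N}\,df} = \frac{(M+1)!\,N!\,/\,(M+N+2)!}{M!\,N!\,/\,(M+N+1)!} = \frac{M+1}{M+N+2}.
\]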

In this case, af­ter ob­serv­ing HHT, we es­ti­mate odds of 2 : 3 for tails vs. heads on the next flip. This makes a gam­ble that wins $2 on tails and loses $1 on heads a prof­itable gam­ble in ex­pec­ta­tion, so we take the bet.
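The arithmetic behind this decision is short enough to check directly. The sketch below uses illustrative names (`rule_of_succession`, `expected_profit`) and simply applies the rule to the HHT observation and the $2 vs. $1 bet described above.

```python
from fractions import Fraction

def rule_of_succession(heads, tails):
    """Probability of heads on the next flip, starting from a uniform prior."""
    return Fraction(heads + 1, heads + tails + 2)

p_heads = rule_of_succession(2, 1)  # HHT: two heads, one tail
p_tails = 1 - p_heads

# The bet wins $2 on tails and loses $1 on heads.
expected_profit = p_tails * 2 - p_heads * 1

print(p_heads, p_tails)   # 3/5 2/5
print(expected_profit)    # 1/5
```

An expected profit of a fifth of a dollar per bet is what makes the gamble favorable here, in contrast to the zero expected profit under the frequentist point estimate of \(\frac{2}{3}\).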

Our choice of a uniform prior over \(f\) was a lit­tle du­bi­ous — it’s the ob­vi­ous way to ex­press to­tal ig­no­rance about the bias of the coin, but ob­vi­ous­ness isn’t ev­ery­thing. (For ex­am­ple, maybe we ac­tu­ally be­lieve that a fair coin is more likely than a coin bi­ased 50.0000023% to­wards heads.) How­ever, all the rea­son­ing af­ter the choice of prior was rigor­ous ac­cord­ing to the laws of prob­a­bil­ity the­ory, which is the only method of ma­nipu­lat­ing quan­tified un­cer­tainty that obeys ob­vi­ous-seem­ing rules about how sub­jec­tive un­cer­tainty should be­have.

Probability that the 98,765th decimal digit of \(\pi\) is \(0\)

What is the prob­a­bil­ity that the 98,765th digit in the dec­i­mal ex­pan­sion of \(\pi\) is \(0\)?

The propen­sity and fre­quen­tist views re­gard as non­sense the no­tion that we could talk about the prob­a­bil­ity of a math­e­mat­i­cal fact. Either the 98,765th dec­i­mal digit of \(\pi\) is \(0\) or it’s not. If we’re run­ning re­peated ex­per­i­ments with a ran­dom num­ber gen­er­a­tor, and look­ing at differ­ent digits of \(\pi,\) then it might make sense to say that the ran­dom num­ber gen­er­a­tor has a 10% prob­a­bil­ity of pick­ing num­bers whose cor­re­spond­ing dec­i­mal digit of \(\pi\) is \(0\). But if we’re just pick­ing a non-ran­dom num­ber like 98,765, there’s no sense in which we could say that the 98,765th digit of \(\pi\) has a 10% propen­sity to be \(0\), or that this digit is \(0\) with 10% fre­quency in the long run.

The subjectivist considers probabilities to just refer to their own uncertainty. So if a subjectivist has picked the number 98,765 without yet knowing the corresponding digit of \(\pi\), and hasn’t made any observation that is known to them to be entangled with the 98,765th digit of \(\pi\), and they’re pretty sure their friend hasn’t yet looked up the 98,765th digit of \(\pi\) either, and their friend offers a whimsical gamble that costs $1 if the digit is non-zero and pays $20 if the digit is zero, the subjectivist takes the bet.
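To make the reasoning explicit, suppose (as an assumption, since the text does not state the number) that the subjectivist treats each of the ten digits as equally plausible and so assigns a credence of \(\frac{1}{10}\) to the digit being \(0\). The expected profit of the gamble, in dollars, is then

\[
\frac{1}{10} \cdot 20 \;-\; \frac{9}{10} \cdot 1 \;=\; 1.10,
\]

which is positive, so the gamble is favorable under that credence.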

Note that this demonstrates a difference between the subjectivist interpretation of “probability” and Bayesian probability theory. A perfect Bayesian reasoner that knows the rules of logic and the definition of \(\pi\) must, by the axioms of probability theory, assign probability either 0 or 1 to the claim “the 98,765th digit of \(\pi\) is a \(0\)” (depending on whether or not it is). This is one of the reasons why perfect Bayesian reasoning is intractable. A subjectivist who is not a perfect Bayesian nevertheless claims that they are personally uncertain about the value of the 98,765th digit of \(\pi\). Formalizing the rules of subjective probabilities about mathematical facts (in the way that probability theory formalized the rules for manipulating subjective probabilities about empirical facts, such as which way a coin came up) is an open problem; this is known as the problem of logical uncertainty.