Likelihood

Consider a piece of evidence \(e,\) such as “Mr. Boddy was shot.” We might have a number of different hypotheses that explain this evidence, including \(H_S\) = “Miss Scarlett killed him”, \(H_M\) = “Colonel Mustard killed him”, and so on.

Each of those hypotheses assigns a different probability to the evidence. For example, imagine that if Miss Scarlett were the killer, there’s a 20% chance she would use a gun, and an 80% chance she’d use some other weapon. In this case, the “Miss Scarlett” hypothesis assigns a likelihood of 20% to \(e.\)

When reasoning about different hypotheses using a probability distribution \(\mathbb P\), the likelihood of evidence \(e\) given hypothesis \(H_i\) is often written using the conditional probability \(\mathbb P(e \mid H_i).\) When reporting likelihoods of many different hypotheses at once, it is common to use a likelihood function, sometimes written \(\mathcal L_e(H_i)\).
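As a concrete sketch (in Python, using the made-up numbers from this article’s example), a likelihood function is just a mapping from each hypothesis to the probability it assigns to the fixed evidence \(e\):

```python
# Sketch of a likelihood function for the fixed evidence e = "Mr. Boddy was shot".
# The numbers are the illustrative ones used in this article, not real data.
P_e_given_H = {
    "Scarlett": 0.20,  # if Miss Scarlett were the killer, 20% chance of a shooting
    "Mustard": 0.40,   # Colonel Mustard's figure, introduced in the next example
}

def L_e(hypothesis):
    """Likelihood function L_e(H_i) = P(e | H_i)."""
    return P_e_given_H[hypothesis]

print(L_e("Scarlett"))  # 0.2
print(L_e("Mustard"))   # 0.4
```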

Relative likelihoods measure the degree of support that a piece of evidence \(e\) provides for different hypotheses. For example, let’s say that if Colonel Mustard were the killer, there’s a 40% chance he would use a gun. Then the absolute likelihoods of \(H_S\) and \(H_M\) are 20% and 40%, for relative likelihoods of (1 : 2). This says that the evidence \(e\) supports \(H_M\) twice as much as it supports \(H_S,\) and that the amount of support would have been the same if the absolute likelihoods were 2% and 4% instead.
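A quick sketch of why only the ratio matters: rescaling all the likelihoods by the same factor leaves the relative support unchanged.

```python
# Relative likelihoods depend only on the ratio of the absolute likelihoods.
def relative(likelihoods):
    """Rescale a dict of likelihoods so they sum to 1, exposing the ratio of support."""
    total = sum(likelihoods.values())
    return {h: p / total for h, p in likelihoods.items()}

print(relative({"Scarlett": 0.20, "Mustard": 0.40}))  # {'Scarlett': 0.333..., 'Mustard': 0.667...}
print(relative({"Scarlett": 0.02, "Mustard": 0.04}))  # same 1 : 2 split
```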

According to Bayes’ rule, relative likelihoods are the appropriate tool for measuring the strength of a given piece of evidence. Relative likelihoods are one of two key constituents of belief in Bayesian reasoning, the other being prior probabilities.
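To see how the two constituents combine, here is a minimal sketch of a Bayes’ rule update. The equal priors are an assumption made purely for illustration; the likelihoods are the ones from the example above.

```python
# Bayes' rule: posterior(H_i) is proportional to prior(H_i) * P(e | H_i).
priors = {"Scarlett": 0.5, "Mustard": 0.5}         # assumed 50/50 priors (illustrative only)
likelihoods = {"Scarlett": 0.20, "Mustard": 0.40}  # P(e | H_i) from the example above

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posteriors = {h: p / total for h, p in unnormalized.items()}

print(posteriors)  # {'Scarlett': 0.333..., 'Mustard': 0.666...}
```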

While absolute likelihoods aren’t necessary when updating beliefs by Bayes’ rule, they are useful when checking for confusion. For example, say you have a coin and only two hypotheses about how it works: \(H_{0.3}\) = “the coin is random and comes up heads 30% of the time”, and \(H_{0.9}\) = “the coin is random and comes up heads 90% of the time.” Now let’s say you toss the coin 100 times, and observe the data HTHTHTHTHTHTHTHT… (alternating heads and tails). The relative likelihoods strongly favor \(H_{0.3},\) because it was less wrong. However, the absolute likelihood of \(H_{0.3}\) will be much lower than the likelihood \(H_{0.3}\) would typically assign to data it had actually generated, and this deficit is a hint that \(H_{0.3}\) isn’t right. (For more on this idea, see Strictly confused.)
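Here is a sketch of the arithmetic behind this coin example, working in log-likelihoods to avoid underflow. “Lower than expected” is cashed out here, as one illustrative choice, by comparing \(H_{0.3}\)’s actual score against the score it would expect on data it had generated itself.

```python
import math

# 100 alternating flips means 50 heads and 50 tails; both hypotheses treat
# flips as independent, so the likelihood depends only on these counts.
heads, tails = 50, 50

def log_likelihood(p_heads):
    """Log of P(data | a coin that lands heads with probability p_heads)."""
    return heads * math.log(p_heads) + tails * math.log(1 - p_heads)

ll_03 = log_likelihood(0.3)  # ≈ -78.0
ll_09 = log_likelihood(0.9)  # ≈ -120.4
print(ll_03 - ll_09)         # ≈ 42.4: relative likelihoods favor H_0.3 by a factor of about e^42

# Absolute check: compare H_0.3's actual score with the score it would expect
# on 100 flips it generated itself (100 times the per-flip expected log-probability).
expected_ll_03 = 100 * (0.3 * math.log(0.3) + 0.7 * math.log(0.7))  # ≈ -61.1
print(ll_03, expected_ll_03)  # far below expectation: a hint that H_0.3 is confused
```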

Parents:

  • Bayesian reasoning

    A probability-theory-based view of the world; a coherent way of changing probabilistic beliefs based on evidence.