Likelihood

Consider a piece of evidence \(e,\) such as “Mr. Boddy was shot.” We might have a number of different hypotheses that explain this evidence, including \(H_S\) = “Miss Scarlett killed him”, \(H_M\) = “Colonel Mustard killed him”, and so on.

Each of those hypotheses assigns a different probability to the evidence. For example, imagine that if Miss Scarlett were the killer, there’s a 20% chance she would use a gun, and an 80% chance she’d use some other weapon. In this case, the “Miss Scarlett” hypothesis assigns a likelihood of 20% to \(e.\)

When reasoning about different hypotheses using a probability distribution \(\mathbb P\), the likelihood of evidence \(e\) given hypothesis \(H_i\) is often written using the conditional probability \(\mathbb P(e \mid H_i).\) When reporting likelihoods of many different hypotheses at once, it is common to use a likelihood function, sometimes written \(\mathcal L_e(H_i)\).
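As a concrete sketch (in Python, using the made-up numbers from this article’s example), a likelihood function is just a mapping from each hypothesis to the probability it assigns to the fixed evidence \(e\):

```python
# Sketch of a likelihood function for the fixed evidence e = "Mr. Boddy was shot".
# The numbers are the illustrative ones used in this article, not real data.
P_e_given_H = {
    "Scarlett": 0.20,  # if Miss Scarlett were the killer, 20% chance of a shooting
    "Mustard": 0.40,   # Colonel Mustard's figure, introduced in the next example
}

def L_e(hypothesis):
    """Likelihood function L_e(H_i) = P(e | H_i)."""
    return P_e_given_H[hypothesis]

print(L_e("Scarlett"))  # 0.2
print(L_e("Mustard"))   # 0.4
```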

Relative likelihoods measure the degree of support that a piece of evidence \(e\) provides for different hypotheses. For example, let’s say that if Colonel Mustard were the killer, there’s a 40% chance he would use a gun. Then the absolute likelihoods of \(H_S\) and \(H_M\) are 20% and 40%, for relative likelihoods of (1 : 2). This says that the evidence \(e\) supports \(H_M\) twice as much as it supports \(H_S,\) and that the amount of support would have been the same if the absolute likelihoods were 2% and 4% instead.
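A quick sketch of why only the ratio matters: rescaling all the likelihoods by the same factor leaves the relative support unchanged.

```python
# Relative likelihoods depend only on the ratio of the absolute likelihoods.
def relative(likelihoods):
    """Rescale a dict of likelihoods so they sum to 1, exposing the ratio of support."""
    total = sum(likelihoods.values())
    return {h: p / total for h, p in likelihoods.items()}

print(relative({"Scarlett": 0.20, "Mustard": 0.40}))  # {'Scarlett': 0.333..., 'Mustard': 0.667...}
print(relative({"Scarlett": 0.02, "Mustard": 0.04}))  # same 1 : 2 split
```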

According to Bayes’ rule, relative likelihoods are the appropriate tool for measuring the strength of a given piece of evidence. Relative likelihoods are one of two key constituents of belief in Bayesian reasoning, the other being prior probabilities.
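To see how the two constituents combine, here is a minimal sketch of a Bayes’ rule update. The equal priors are an assumption made purely for illustration; the likelihoods are the ones from the example above.

```python
# Bayes' rule: posterior(H_i) is proportional to prior(H_i) * P(e | H_i).
priors = {"Scarlett": 0.5, "Mustard": 0.5}         # assumed 50/50 priors (illustrative only)
likelihoods = {"Scarlett": 0.20, "Mustard": 0.40}  # P(e | H_i) from the example above

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posteriors = {h: p / total for h, p in unnormalized.items()}

print(posteriors)  # {'Scarlett': 0.333..., 'Mustard': 0.666...}
```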

While absolute likelihoods aren’t necessary when updating beliefs by Bayes’ rule, they are useful when checking for confusion. For example, say you have a coin and only two hypotheses about how it works: \(H_{0.3}\) = “the coin is random and comes up heads 30% of the time”, and \(H_{0.9}\) = “the coin is random and comes up heads 90% of the time.” Now let’s say you toss the coin 100 times, and observe the data HTHTHTHTHTHTHTHT… (alternating heads and tails). The relative likelihoods strongly favor \(H_{0.3},\) because it was less wrong. However, the absolute likelihood of \(H_{0.3}\) will be much lower than the likelihood \(H_{0.3}\) would typically assign to data it had actually generated, and this deficit is a hint that \(H_{0.3}\) isn’t right. (For more on this idea, see Strictly confused.)
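Here is a sketch of the arithmetic behind this coin example, working in log-likelihoods to avoid underflow. “Lower than expected” is cashed out here, as one illustrative choice, by comparing \(H_{0.3}\)’s actual score against the score it would expect on data it had generated itself.

```python
import math

# 100 alternating flips means 50 heads and 50 tails; both hypotheses treat
# flips as independent, so the likelihood depends only on these counts.
heads, tails = 50, 50

def log_likelihood(p_heads):
    """Log of P(data | a coin that lands heads with probability p_heads)."""
    return heads * math.log(p_heads) + tails * math.log(1 - p_heads)

ll_03 = log_likelihood(0.3)  # ≈ -78.0
ll_09 = log_likelihood(0.9)  # ≈ -120.4
print(ll_03 - ll_09)         # ≈ 42.4: relative likelihoods favor H_0.3 by a factor of about e^42

# Absolute check: compare H_0.3's actual score with the score it would expect
# on 100 flips it generated itself (100 times the per-flip expected log-probability).
expected_ll_03 = 100 * (0.3 * math.log(0.3) + 0.7 * math.log(0.7))  # ≈ -61.1
print(ll_03, expected_ll_03)  # far below expectation: a hint that H_0.3 is confused
```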

Parents:

  • Bayesian reasoning

    A probability-theory-based view of the world; a coherent way of changing probabilistic beliefs based on evidence.