Frequency diagrams: A first look at Bayes

Bayesian rea­son­ing is about how to re­vise our be­liefs in the light of ev­i­dence.

We’ll start by con­sid­er­ing one sce­nario in which the strength of the ev­i­dence has clear num­bers at­tached.

(Don’t worry if you don’t know how to solve the fol­low­ing prob­lem. We’ll see shortly how to solve it.)

Suppose you are a nurse screening a set of students for a sickness called Diseasitis. (Note: literally “inflammation of the disease”.)

  • You know, from past pop­u­la­tion stud­ies, that around 20% of the stu­dents will have Dise­a­sitis at this time of year.

You are test­ing for Dise­a­sitis us­ing a color-chang­ing tongue de­pres­sor, which usu­ally turns black if the stu­dent has Dise­a­sitis.

  • Among pa­tients with Dise­a­sitis, 90% turn the tongue de­pres­sor black.

  • How­ever, the tongue de­pres­sor is not perfect, and also turns black 30% of the time for healthy stu­dents.

One of your stu­dents comes into the office, takes the test, and turns the tongue de­pres­sor black. What is the prob­a­bil­ity that they have Dise­a­sitis?

(If you think you see how to do it, you can try to solve this problem before continuing. To quickly check your answer, compare it with the answer stated just below; the derivation will be given shortly.)

Answer: The probability that a student with a blackened tongue depressor has Diseasitis is 3/7, roughly 43%.

This prob­lem can be solved a hard way or a clever easy way. We’ll walk through the hard way first.

First, we imagine a population of 100 students, of whom 20 have Diseasitis and 80 don’t. (Note: Multiple studies show that thinking about concrete numbers such as “20 out of 100 students” or “200 out of 1000 students” is more likely to produce correct spontaneous reasoning on these problems than thinking about percentages like “20% of students.” E.g. “Probabilistic reasoning in clinical medicine” by David M. Eddy (1982).)

[Diagram: prior frequency]

90% of sick stu­dents turn their tongue de­pres­sor black, and 30% of healthy stu­dents turn the tongue de­pres­sor black. So we see black tongue de­pres­sors on 90% * 20 = 18 sick stu­dents, and 30% * 80 = 24 healthy stu­dents.
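The head counts in this step can be reproduced with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and exact fractions are used so that the counts come out as whole numbers.

```python
from fractions import Fraction

# Numbers taken from the scenario; the names are illustrative only.
POPULATION = 100
PRIOR_SICK = Fraction(20, 100)          # 20% of students have Diseasitis
P_BLACK_IF_SICK = Fraction(90, 100)     # 90% of sick students turn it black
P_BLACK_IF_HEALTHY = Fraction(30, 100)  # 30% of healthy students turn it black

sick = PRIOR_SICK * POPULATION                 # 20 sick students
healthy = POPULATION - sick                    # 80 healthy students
sick_black = P_BLACK_IF_SICK * sick            # 18 sick students test black
healthy_black = P_BLACK_IF_HEALTHY * healthy   # 24 healthy students test black

print(f"sick: {sick}, healthy: {healthy}")
print(f"black and sick: {sick_black}, black and healthy: {healthy_black}")
```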

[Diagram: posterior frequency]

What’s the prob­a­bil­ity that a stu­dent with a black tongue de­pres­sor has Dise­a­sitis? From the di­a­gram, there are 18 sick stu­dents with black tongue de­pres­sors. 18 + 24 = 42 stu­dents in to­tal turned their tongue de­pres­sors black. Imag­ine reach­ing into a bag of all the stu­dents with black tongue de­pres­sors, and pul­ling out one of those stu­dents at ran­dom; what’s the chance a stu­dent like that is sick?

[Diagram: conditional probability]

The final answer is that a patient with a black tongue depressor has an 18/42 = 3/7 ≈ 43% probability of being sick.
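The “bag of students” picture can be checked in two ways: exactly, by dividing the counts, and approximately, by drawing students from the bag at random many times. The sketch below does both. The function name, the number of draws, and the fixed seed are arbitrary choices made for this illustration.

```python
import random
from fractions import Fraction

# Counts of students with black tongue depressors, from the text.
SICK_BLACK = 18
HEALTHY_BLACK = 24

# Exact answer: sick students as a share of everyone who tested black.
exact = Fraction(SICK_BLACK, SICK_BLACK + HEALTHY_BLACK)  # reduces to 3/7

# The "bag" holding every student with a black tongue depressor.
bag = ["sick"] * SICK_BLACK + ["healthy"] * HEALTHY_BLACK


def estimate_by_sampling(draws: int, seed: int = 0) -> float:
    """Draw students from the bag with replacement; return the fraction sick."""
    rng = random.Random(seed)  # fixed seed only so the run is repeatable
    hits = sum(rng.choice(bag) == "sick" for _ in range(draws))
    return hits / draws


print(exact, float(exact))
print(estimate_by_sampling(100_000))
```

With enough draws, the sampled fraction should settle close to the exact value of 3/7.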

Many med­i­cal stu­dents have at first found this an­swer counter-in­tu­itive: The test cor­rectly de­tects Dise­a­sitis 90% of the time! If the test comes back pos­i­tive, why is it still less than 50% likely that the pa­tient has Dise­a­sitis? Well, the test also in­cor­rectly “de­tects” Dise­a­sitis 30% of the time in a healthy pa­tient, and we start out with lots more healthy pa­tients than sick pa­tients.
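The same calculation can be written with probabilities instead of head counts. The notation P(sick | black) is not used elsewhere on this page and is introduced here only as an illustration:

```latex
\[
P(\text{sick} \mid \text{black})
  = \frac{0.90 \times 0.20}{0.90 \times 0.20 + 0.30 \times 0.80}
  = \frac{0.18}{0.18 + 0.24}
  = \frac{3}{7} \approx 43\%
\]
```

The false-positive term (0.24) in the denominator is larger than the true-positive term (0.18), which is why the answer stays below 50%.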

The test does provide some evidence in favor of the patient being sick. The probability of a patient being sick goes from 20% before the test, to 43% after we see the tongue depressor turn black. But this isn’t conclusive, and we need to perform further tests, maybe more expensive ones.

If you feel like you understand this problem setup, consider trying to answer the following question before proceeding: What’s the probability that a student who does not turn the tongue depressor black (that is, a student with a negative test result) has Diseasitis? Again, we start out with 20% sick and 80% healthy students. 70% of healthy students will get a negative test result, and only 10% of sick students will get a negative test result.

Imagine 20 sick students and 80 healthy students. 10% * 20 = 2 sick students have negative test results. 70% * 80 = 56 healthy students have negative test results. Among the 2 + 56 = 58 total students with negative test results, 2 students are sick students with negative test results. So 2/58 = 1/29 ≈ 3.4% of students with negative test results have Diseasitis.
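Since the positive and negative cases follow the same recipe, the recipe can be wrapped in one small function. This is a sketch under the page's own numbers. The function name `posterior_sick` and its parameter names are hypothetical, and the population size of 100 is just the convenient choice used above.

```python
from fractions import Fraction


def posterior_sick(prior_sick, p_result_if_sick, p_result_if_healthy,
                   population=100):
    """Share of sick students among those showing a given test result."""
    sick = prior_sick * population
    healthy = population - sick
    sick_with_result = p_result_if_sick * sick
    healthy_with_result = p_result_if_healthy * healthy
    return sick_with_result / (sick_with_result + healthy_with_result)


prior = Fraction(20, 100)

# Negative result: 10% of sick and 70% of healthy students.
negative = posterior_sick(prior, Fraction(10, 100), Fraction(70, 100))

# Positive (black) result: 90% of sick and 30% of healthy students.
positive = posterior_sick(prior, Fraction(90, 100), Fraction(30, 100))

print(negative, float(negative))
print(positive, float(positive))
```

The population size cancels out of the final ratio, so any other value would give the same fractions.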

For a more clever way to perform the same calculation, see Waterfall diagrams and relative odds.
