# Waterfall diagrams and relative odds

Imag­ine a wa­ter­fall with two streams of wa­ter at the top, a red stream and a blue stream. Th­ese streams sep­a­rately ap­proach the top of the wa­ter­fall, with some of the wa­ter from both streams be­ing di­verted along the way, and the re­main­ing wa­ter fal­ling into a shared pool be­low.

Sup­pose that:

• At the top of the wa­ter­fall, 20 gal­lons/​sec­ond of red wa­ter are flow­ing down, and 80 gal­lons/​sec­ond of blue wa­ter are com­ing down.

• 90% of the red wa­ter makes it to the bot­tom.

• 30% of the blue wa­ter makes it to the bot­tom.

Of the purplish water that makes it to the pool at the bottom, how much was originally from the red stream and how much was originally from the blue stream?

If you have read “Frequency diagrams: A first look at Bayes”, this is structurally identical to the Diseasitis problem from there:

• 20% of the pa­tients in the screen­ing pop­u­la­tion start out with Dise­a­sitis.

• Among pa­tients with Dise­a­sitis, 90% turn the tongue de­pres­sor black.

• 30% of the patients without Diseasitis will also turn the tongue depressor black.

If you have not read “Frequency diagrams: A first look at Bayes”, this is structurally similar to the following problem, of the kind medical students might encounter:

You are a nurse screen­ing 100 pa­tients for Dise­a­sitis, us­ing a tongue de­pres­sor which usu­ally turns black for pa­tients who have the sick­ness.

• 20% of the pa­tients in the screen­ing pop­u­la­tion start out with Dise­a­sitis.

• Among pa­tients with Dise­a­sitis, 90% turn the tongue de­pres­sor black (true pos­i­tives).

• How­ever, 30% of the pa­tients with­out Dise­a­sitis will also turn the tongue de­pres­sor black (false pos­i­tives).

What is the chance that a patient with a blackened tongue depressor has Diseasitis?

The 20% of sick patients are analogous to the 20 gallons/second of red water; the 80% of healthy patients are analogous to the 80 gallons/second of blue water.

The 90% of the sick pa­tients turn­ing the tongue de­pres­sor black is analo­gous to 90% of the red wa­ter mak­ing it to the bot­tom of the wa­ter­fall. 30% of the healthy pa­tients turn­ing the tongue de­pres­sor black is analo­gous to 30% of the blue wa­ter mak­ing it to the bot­tom pool.

There­fore, the ques­tion “what por­tion of wa­ter in the fi­nal pool came from the red stream?” has the same an­swer as the ques­tion “what por­tion of pa­tients that turn the tongue de­pres­sor black are sick with Dise­a­sitis?”

Now for the faster way of answering that question, compared with counting patients as in “Frequency diagrams: A first look at Bayes”.

We start with 4 times as much blue wa­ter as red wa­ter at the top of the wa­ter­fall.

Then each molecule of red wa­ter is 90% likely to make it to the shared pool, and each molecule of blue wa­ter is 30% likely to make it to the pool. (90% of red wa­ter and 30% of blue wa­ter make it to the bot­tom.) So each molecule of red wa­ter is 3 times as likely (0.90 /​ 0.30 = 3) as a molecule of blue wa­ter to make it to the bot­tom.

So we mul­ti­ply prior pro­por­tions of $$1 : 4$$ for red vs. blue by rel­a­tive like­li­hoods of $$3 : 1$$ and end up with fi­nal pro­por­tions of $$(1 \cdot 3) : (4 \cdot 1) = 3 : 4$$, mean­ing that the bot­tom pool has 3 parts of red wa­ter to 4 parts of blue wa­ter.

To con­vert these rel­a­tive pro­por­tions into an ab­solute prob­a­bil­ity that a ran­dom wa­ter molecule at the bot­tom is red, we calcu­late 3 /​ (3 + 4) to see that 3/​7ths (roughly 43%) of the wa­ter in the shared pool came from the red stream.
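The conversion step can be written out as a short worked equation. This is only a sketch of the arithmetic already described: the labels "red" and "blue" stand for the 3 parts of red water and 4 parts of blue water in the pool.

$$\frac{\text{red}}{\text{red} + \text{blue}} = \frac{3}{3 + 4} = \frac{3}{7} \approx 0.43$$

The 3 on top counts only the red parts; the 7 on the bottom counts all the parts in the pool, red and blue together.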

This proportion is the same as the 18 : 24 ratio of sick patients with positive results to healthy patients with positive results that we would get by thinking about 100 patients, since 18 : 24 reduces to 3 : 4.
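As a rough check, the same numbers can be worked through in gallons per second. This is a minimal Python sketch; the variable names are illustrative and do not come from the original text.

```python
from fractions import Fraction

# Flow rates at the top of the waterfall, in gallons/second (from the text).
red_top, blue_top = 20, 80

# Fraction of each stream that reaches the shared pool (from the text).
red_through, blue_through = Fraction(90, 100), Fraction(30, 100)

red_pool = red_top * red_through     # red water reaching the pool
blue_pool = blue_top * blue_through  # blue water reaching the pool

# Share of the pool that started out in the red stream.
share_red = red_pool / (red_pool + blue_pool)

print(red_pool, blue_pool, share_red)  # 18 24 3/7
```

Exact fractions are used instead of floats so that the 18 : 24 ratio and the 3/7 result show up without rounding noise.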

That is, to solve the Dise­a­sitis prob­lem in your head, you could con­vert this word prob­lem:

20% of the pa­tients in a screen­ing pop­u­la­tion have Dise­a­sitis. 90% of the pa­tients with Dise­a­sitis turn the tongue de­pres­sor black, and 30% of the pa­tients with­out Dise­a­sitis turn the tongue de­pres­sor black. Given that a pa­tient turned their tongue de­pres­sor black, what is the prob­a­bil­ity that they have Dise­a­sitis?

Into this calcu­la­tion:

Okay, so the ini­tial odds are (20% : 80%) = (1 : 4), and the like­li­hoods are (90% : 30%) = (3 : 1). Mul­ti­ply­ing those ra­tios gives fi­nal odds of (3 : 4), which con­verts to a prob­a­bil­ity of 3/​7ths.

(You might not be able to convert 3/7 to 43% in your head, but you might be able to eyeball that it was a chunk less than 50%.)
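The mental calculation above can also be sketched in code. The helper names `update_odds` and `odds_to_probability` are hypothetical, chosen for illustration, and the sketch assumes the odds are given as whole numbers (here, percentages).

```python
from fractions import Fraction
from math import gcd

def update_odds(prior, likelihoods):
    """Multiply prior odds by relative likelihoods, term by term, then reduce."""
    a = prior[0] * likelihoods[0]
    b = prior[1] * likelihoods[1]
    g = gcd(a, b)
    return (a // g, b // g)

def odds_to_probability(odds):
    """Convert odds (a : b) into the probability a / (a + b)."""
    return Fraction(odds[0], odds[0] + odds[1])

# Prior odds (20% : 80%) for sick vs. healthy,
# likelihoods (90% : 30%) of a blackened tongue depressor.
posterior = update_odds((20, 80), (90, 30))

print(posterior)                       # (3, 4)
print(odds_to_probability(posterior))  # 3/7
```

Reducing by the greatest common divisor mirrors the step of simplifying (20% : 80%) to (1 : 4); the final answer is the same whether or not the odds are simplified first.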

You can try do­ing a similar calcu­la­tion for this prob­lem:

• 90% of wid­gets are good and 10% are bad.

• 12% of bad wid­gets emit sparks.

• Only 4% of good wid­gets emit sparks.

What per­centage of spark­ing wid­gets are bad? If you are suffi­ciently com­fortable with the setup, try do­ing this prob­lem en­tirely in your head.

(You might try vi­su­al­iz­ing a wa­ter­fall with good and bad wid­gets at the top, and only spark­ing wid­gets mak­ing it to the bot­tom pool.)


• There’s (1 : 9) bad vs. good wid­gets.

• Bad vs. good wid­gets have a (12 : 4) rel­a­tive like­li­hood to spark.

• This sim­plifies to (1 : 9) x (3 : 1) = (3 : 9) = (1 : 3), 1 bad spark­ing wid­get for ev­ery 3 good spark­ing wid­gets.

• Which converts to a probability of 1/(1+3) = 1/4 = 25%; that is, 25% of sparking widgets are bad.

Seeing sparks didn’t make us “believe the widget is bad”; the probability only went to 25%, which is less than 50/50. But this doesn’t mean we say, “I still believe this widget is good!” and toss out the evidence and ignore it. A bad widget is relatively more likely to emit sparks, and therefore seeing this evidence should cause us to think it relatively more likely that the widget is a bad one, even if the probability hasn’t yet gone over 50%. We increase our probability from 10% to 25%.
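The widget answer can be cross-checked by counting widgets directly instead of multiplying odds. The batch size of 1000 below is an assumption, picked only so that every count comes out as a whole number.

```python
from fractions import Fraction

total = 1000  # hypothetical batch size, not from the original problem

bad = total * 10 // 100    # 10% of widgets are bad
good = total * 90 // 100   # 90% of widgets are good

bad_sparking = bad * 12 // 100   # 12% of bad widgets emit sparks
good_sparking = good * 4 // 100  # 4% of good widgets emit sparks

# Share of sparking widgets that are bad.
share_bad = Fraction(bad_sparking, bad_sparking + good_sparking)

print(bad_sparking, good_sparking, share_bad)  # 12 36 1/4
```

The counts 12 : 36 reduce to the same (1 : 3) odds obtained by multiplying (1 : 9) by (3 : 1).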

Waterfalls are one way of visualizing the odds form of Bayes’ rule, which states that the prior odds times the likelihood ratio equals the posterior odds. In turn, this rule can be seen as formalizing the notion of “the strength of evidence” or “how much a piece of evidence should make us update our beliefs”. This more general form is covered in “Introduction to Bayes’ rule: Odds form”.
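Written as a formula, the statement that prior odds times likelihood ratio equals posterior odds might look like the following. The symbols are assumptions introduced here for illustration: $$H_1$$ and $$H_2$$ are the two hypotheses (red vs. blue, or sick vs. healthy) and $$e$$ is the evidence observed (reaching the pool, or a blackened tongue depressor).

$$\frac{\mathbb{P}(H_1)}{\mathbb{P}(H_2)} \times \frac{\mathbb{P}(e \mid H_1)}{\mathbb{P}(e \mid H_2)} = \frac{\mathbb{P}(H_1 \mid e)}{\mathbb{P}(H_2 \mid e)}$$

With the waterfall numbers, this reads:

$$\frac{0.20}{0.80} \times \frac{0.90}{0.30} = \frac{1}{4} \times \frac{3}{1} = \frac{3}{4}$$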


• I think iso­mor­phic is too ad­vanced vo­cab­u­lary to be as­sumed for Math 1. Would this be a good op­por­tu­nity to use a popover with the defi­ni­tion?

• Agree. Could be replaced with “similar” or “similar in form”. The sentence could also be changed to say something like “This problem is just like . . .”

• Do we want cita­tion needed norms on Ar­bital?

(At a higher level, do we want read­ers to be able to flag por­tions of a page with a va­ri­ety of la­bels, such as, un­clear, ap­pears to be fac­tu­ally in­cor­rect, con­tra­dic­tory, etc?)

• This text is out of sync with the graphic—the pic ac­tu­ally shows black tongue de­pres­sors.

• I liked this ex­pla­na­tion. In par­tic­u­lar, the ob­vi­ous hard way vs sneaky easy way con­trast caught my at­ten­tion.

Per­haps that could even serve as an in­tro­duc­tory mo­ti­vat­ing sen­tence? (e.g. “In this post we’ll ex­plore an ob­vi­ous hard way and also a sneaky easy way to do calcu­la­tions us­ing Bayes’s Rule.”)

• Wording seems less clear than it could be here. What does it mean to say it “produces better problem-solving”? What about something like:

“. . . that participants arrive at the correct answer more often when the problem is presented in terms of frequencies (20 patients) rather than probabilities (20% of patients).”

• This sentence should be written above the previous paragraph: 18/24 is 3/4, not 3/7.

• It should be clar­ified that “the bot­tom” here refers to the pool.

• I think it’d be clearer to have two differ­ent head­ers. The way it’s set up right now, I didn’t ini­tially see that this one ar­ti­cle is talk­ing about two differ­ent (but re­lated) ap­proaches.

• Ah, in­sight­ful! I hadn’t seen forms of Bayes’ Rule other than the prob­a­bil­ity form be­fore to­day, and this is very helpful (well, per­haps I had seen them but it hasn’t “hit me” un­til now).

I like that this is em­pha­sized. To fur­ther em­pha­size, I think a for­mula should be added as a block level el­e­ment un­der­neath.

• 90% of the red wa­ter makes it to the shared pool. 30% of the blue wa­ter makes it to the shared pool.

• Ques­tion of in­ter­est.

• An­swer of in­ter­est.

• How it converts to 3/7ths is unclear.

• I don’t understand how the waterfall concept helps illustrate the “odds form”: the amount of each type of water reaching the pool is still expressed as a probability rather than jointly being expressed as the likelihood ratio. The fact that these likelihoods don’t matter, only their ratio, was the critical conceptual blockage for me.

• “Likely” refers to prob­a­bil­ity, and yet the point of this es­say is to ex­plain prob­a­bil­ity. There­fore, the use of “likely” is, in a sense, cir­cu­lar rea­son­ing. After all, what does “likely” mean? It’s not ex­plained here. It sug­gests an out­come fre­quency of sorts and so this state­ment and oth­ers like it is an at­tempt to ar­rive at an out­come fre­quency (equiv­a­lent to the pro­por­tions of red and blue wa­ter that make it down through) by refer­ring to an­other out­come fre­quency; thus the cir­cu­lar­ity.

Bet­ter to stick with the pro­por­tions them­selves by ex­plain­ing that, how­ever much red wa­ter makes it down through, there will be three times as much of it as there is blue wa­ter that makes it down through. Say that some frac­tion, f, of the blue wa­ter molecules makes it down through; then for ev­ery 100 molecules of wa­ter, f x 80 blue molecules make it down through and 3f x 20 red molecules make it down through, mak­ing for pro­por­tions of 60f red to 80f blue. Scal­ing down those pro­por­tions by di­vid­ing both by f, we get 60:80, which can be fur­ther scaled down to 3:4.

Note that the factor of 3, i.e. the “likelihood ratio” (by which the initial proportions of 20:80 are multiplied) is explicit in the previous paragraph. (It’s in the statement, “3f x 20 red molecules make it down through”.) Putting it another way, the previous paragraph makes it clear that multiplying by 3 will give the same final proportions (“posterior odds”) as will, in taking a frequency approach, multiplying 20 by 0.9 and 80 by 0.3, since the latter proportions can be scaled by dividing each by 0.3: (0.9/0.3 x 20):(0.3/0.3 x 80) = (3 x 20):(1 x 80) = 60:80 = 3:4.

• has to be 18:42. 42 is the sum of 18 and 24 ( these are the pro­por­tions of wa­ter).

• I’m failing to grasp how the prob­a­bil­ity con­ver­sion works and so some fur­ther ex­pla­na­tion may be needed

• The inverse of multiplication is division. To the mathematically steadfast this is completely obvious, but I wager this is exactly the point where most non-mathematically inclined people will become confused and give up, or will simply read on without absorbing the whole message. Maybe make this mathematical step clearer?

• I can follow the calculation of Diseasitis; that’s standard math that I learned in school. What I have trouble following is how you get to the “absolute probability” of 3 / (3 + 4). I think the “3 + 4” are the 3 parts red water and 4 parts blue water, but where does the other 3 come from? Wait … is that again the 3 parts red? So 3 parts of 7 parts in all? Hm … I think I have solved my question ;-)