Log as the change in the cost of communicating

We have interpreted logarithms both as a generalization of the notion of “length” and as digit exchange rates. In both cases, multiplying the input to the logarithm base 10 by a factor of 10 caused the output to go up by one: Multiplying a number by 10 makes it one digit longer, and if a digit wheel is worth $1, then a 1000-digit is worth exactly $1 more than a 100-digit, because you can build a 1000-digit out of a 100-digit and a 10-digit. Thus, by symmetry, dividing the input to the logarithm base 10 by 10 makes the output go down by one: If you divide a number by 10, it gets one digit shorter; and any \(n\)-digit is worth $1 more than an \(\frac{n}{10}\)-digit, because you can build an \(n\)-digit out of an \(\frac{n}{10}\)-digit and a 10-digit.

This strongly implies that \(\log_{10}(\frac{1}{10})\) should equal \(-1\). If a 1000-digit costs $3, and a 100-digit costs $2, and a 10-digit costs $1, and a 1-digit is worthless, then, extrapolating the pattern, a \(\frac{1}{10}\)-digit should cost \(-\$1.\) But what does that mean? What sort of digit is worth negative money? Can we give this extrapolation a physical intuition?
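Before building that intuition, we can at least confirm that the arithmetic pattern holds. A quick sketch in Python, treating a digit’s fair price (in dollars, at $1 per digit wheel) as the base-10 log of its size, per the pricing above:

```python
import math

# Fair price of a b-digit, in dollars, at $1 per digit wheel: log10(b).
assert math.log10(1000) == 3   # a 1000-digit costs $3
assert math.log10(100) == 2    # a 100-digit costs $2
assert math.log10(10) == 1     # a 10-digit costs $1
assert math.log10(1) == 0      # a 1-digit is worthless
assert math.log10(1/10) == -1  # extrapolating: a 1/10-digit costs -$1
```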

Yes, we can, by thinking in terms of how difficult it is to communicate information. Let’s say that you and I are in separate rooms, connected only by a conveyor belt, upon which I can place physical objects like coins, dice, and digit wheels that you can read. Let’s imagine also that a third party is going to show me the whodunit of a game of Clue, then let me put some objects on the conveyor belt, send those objects into your room, and then ask you for the information. If you can reproduce it successfully, then we both win a lot of money. However, I have to pay for every object that I put on the conveyor belt, using the fair prices.

Consider how much I have to pay to tell you the result of a Clue game. The “whodunit” in a Clue game consists of three pieces of information:

  1. The name of the murderer, which is one of: Miss Scarlett, Professor Plum, Mrs. Peacock, Reverend Green, Colonel Mustard, or Mrs. White.

  2. The room in which the murder occurred, which is one of: the kitchen, the ballroom, the conservatory, the dining room, the cellar, the billiard room, the library, the lounge, the hall, or the study.

  3. The murder weapon, which is one of: the candlestick, the dagger, the lead pipe, poison, the revolver, the rope, or the wrench.

Thus, a typical whodunit might look like “Professor Plum, in the conservatory, with the revolver.” That sentence is 55 characters long (counting spaces and punctuation), so one way for me to transmit the message would be to purchase fifty-five 29-digits (each capable of holding any one of 26 letters, or a space, or a comma, or a period), and send you that sentence directly. However, that might be a bit excessive, as there are in fact only \(6 \cdot 10 \cdot 7 = 420\) different possibilities (six possible murderers, ten possible locations, seven possible weapons). As such, I only actually need to buy a 6-digit, a 10-digit, and a 7-digit. Equivalently, I could purchase a single 420-digit (if such things are on sale). We have to agree in advance what the digits mean (for example, “the 6-digit corresponds to the murderer, in the order listed above; the 10-digit corresponds to the room, in the order listed above; the 7-digit corresponds to the weapon, in the order listed above”), but assuming we do, I can get away with much less than fifty-five 29-digits.
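To make this concrete, here is a hypothetical sketch in Python of one such convention. The list orderings follow the text; the function names and the packing scheme (a standard mixed-radix encoding) are my own illustration:

```python
# A hypothetical convention for the whodunit message. The orderings below
# follow the lists in the text; encode/decode pack the three component
# digits (base 6, 10, and 7) into a single base-420 digit, and back.
SUSPECTS = ["Miss Scarlett", "Professor Plum", "Mrs. Peacock",
            "Reverend Green", "Colonel Mustard", "Mrs. White"]
ROOMS = ["kitchen", "ballroom", "conservatory", "dining room", "cellar",
         "billiard room", "library", "lounge", "hall", "study"]
WEAPONS = ["candlestick", "dagger", "lead pipe", "poison",
           "revolver", "rope", "wrench"]

def encode(suspect, room, weapon):
    """Pack a 6-digit, a 10-digit, and a 7-digit into one 420-digit."""
    s, r, w = SUSPECTS.index(suspect), ROOMS.index(room), WEAPONS.index(weapon)
    return (s * 10 + r) * 7 + w          # a value in range(420)

def decode(n):
    """Unpack a 420-digit back into the three component digits."""
    n, w = divmod(n, 7)
    s, r = divmod(n, 10)
    return SUSPECTS[s], ROOMS[r], WEAPONS[w]

msg = encode("Professor Plum", "conservatory", "revolver")
assert 0 <= msg < 420
assert decode(msg) == ("Professor Plum", "conservatory", "revolver")
```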

Exercise: If the only storage devices on sale are coins, how many do I need to buy to communicate the whodunit?

Nine. Eight coins give you only \(2^8 = 256\) possibilities, and we need at least 420; nine give \(2^9 = 512.\)

Exercise: If the only storage devices on sale are dice, how many do I need to buy?

Four. \(6^3 = 216 < 420 < 1296 = 6^4.\)

Exercise: If I have to choose between all coins or all dice, which should I choose, at the fair prices?

The coins. Four dice cost as much as \(4 \cdot \log_2(6) \approx 10.34\) coins, and we can do the job with nine coins instead.

Exercise: If I can mix coins, dice, and digit wheels, what’s the cheapest way to communicate the whodunit?

One coin and three dice can hold \(2 \cdot 6^3 = 432 \ge 420\) possibilities, letting you send the message at a cost of only \(\log_2(2) + 3 \cdot \log_2(6) \approx 8.75\) coins.
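All three exercise answers can be checked with a few lines of arithmetic; a sketch in Python:

```python
import math

POSSIBILITIES = 6 * 10 * 7          # 420 possible whodunits

# Coins only: smallest k with 2**k >= 420.
coins_needed = math.ceil(math.log2(POSSIBILITIES))   # 2**8 = 256 is too few, so 9
# Dice only: smallest k with 6**k >= 420.
dice_needed = math.ceil(math.log(POSSIBILITIES, 6))  # 6**3 = 216 is too few, so 4
# Those four dice, priced in coins:
dice_cost = dice_needed * math.log2(6)               # about 10.34 coins
# Mixed: one coin plus three dice hold 2 * 6**3 = 432 >= 420 possibilities.
mixed_cost = 1 + 3 * math.log2(6)                    # about 8.75 coins

assert coins_needed == 9 and dice_needed == 4
assert mixed_cost < coins_needed < dice_cost         # mixing beats either alone
```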

Now, consider what happens when the third party tells me “Actually, in order to win, you also have to communicate the name of my favorite Clue suspect, which is Colonel Mustard. I already told the person in the other room that you need to communicate two suspects, and that you’ll communicate my favorite Clue suspect second. I didn’t tell them who my favorite Clue suspect was, though.”

Now, the space of possible messages has gone up by a factor of six: There are 420 possible whodunits, and each can be paired with one of six possible “favorite suspects,” for a total of 2520 possible messages. How does this impact my cost of communicating with you? My cost goes up by 1 die (\(= \log_2(6)\) coins \(= \log_{10}(6)\) digit wheels). When the space of possibilities goes up by a factor of 6, my costs of communication (measured, say, in coins) go up by \(\log_2(6).\)

Now let’s say that the third party comes back into the room and tells me “Actually, I gave the person in the other room a logic puzzle that told them which room the murder happened in; they solved it, and now they know that the murder happened in the conservatory.”

This reduces the space of possible messages I need to send by a factor of 10. Now that both you and I know that the murder happened in the conservatory, I only need to transmit the murderer, the weapon, and the favorite suspect: one of \(2520 / 10 = 252\) possibilities. The space of possibilities was cut to a tenth of its former size, and my cost of communicating dropped by 1 digit wheel (\(= \log_6(10)\) dice \(= \log_2(10)\) coins).
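Both cost changes are easy to verify numerically, measuring everything in coins (base-2 digits); a sketch in Python:

```python
import math

# Cost (in coins) of pinning down one of N possibilities: log2(N).
base_cost = math.log2(420)       # the original whodunit

# The favorite suspect multiplies the message space by 6: 420 -> 2520.
grown = math.log2(420 * 6)
assert math.isclose(grown - base_cost, math.log2(6))   # cost rose by one die's worth

# Learning the room divides the space by 10: 2520 -> 252.
shrunk = math.log2(2520 / 10)
assert math.isclose(grown - shrunk, math.log2(10))     # cost fell by one digit wheel's worth
```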

On this interpretation, logarithms measure how much it costs to transmit information, in terms of some “base” medium (such as coins, dice, or digit wheels). Every time the space of possibilities increases by a factor of \(n\), my communication costs increase by \(\log_2(n)\) coins. Every time the space of possibilities decreases by a factor of \(n\), my communication costs drop by \(\log_2(n)\) coins.

This is the physical interpretation of logarithms that you can put your weight on: \(\log_b(x)\) measures how much more or less costly it will be to send a message (in terms of \(b\)-digits) when the space of possible messages changes by a factor of \(x\). Paired with a physical interpretation of fractional digits, it can explain most of the basic properties of the logarithm:

  1. \(\log_b(1) = 0,\) because increasing (or decreasing) the space of possible messages by a factor of 1 doesn’t affect your communication costs at all.

  2. \(\log_b(b) = 1,\) because increasing the space of possible messages by a factor of \(b\) will increase your communication costs by exactly one \(b\)-digit.

  3. \(\log_b\left(\frac{1}{b}\right) = -1,\) because decreasing the space of possible messages by a factor of \(b\) saves you one \(b\)-digit worth of communication costs.

  4. \(\log_b(x\cdot y) = \log_b(x) + \log_b(y),\) because if \(n = x \cdot y\) then one \(n\)-digit is exactly large enough to store one \(x\)-message and one \(y\)-message. Thus, when communicating, an \(x\cdot y\)-digit is worth the same amount as one \(x\)-digit plus one \(y\)-digit.

  5. \(\log_b(x^n) = n \cdot \log_b(x),\) because \(n\) \(x\)-digits can be used to emulate one \(x^n\)-digit.
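These properties can all be spot-checked numerically. A sketch in Python, using the change-of-base formula \(\log_b(x) = \ln(x) / \ln(b)\); the sample values of \(b\), \(x\), \(y\), and \(n\) below are arbitrary:

```python
import math

def log(b, x):
    """log base b of x, via the change-of-base formula."""
    return math.log(x) / math.log(b)

b, x, y, n = 7.0, 3.0, 5.0, 4
assert log(b, 1) == 0                                      # factor of 1: no change
assert math.isclose(log(b, b), 1)                          # factor of b: one more b-digit
assert math.isclose(log(b, 1 / b), -1)                     # factor of 1/b: one fewer b-digit
assert math.isclose(log(b, x * y), log(b, x) + log(b, y))  # digits add
assert math.isclose(log(b, x ** n), n * log(b, x))         # n x-digits emulate an x^n-digit
```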

You might be thinking to yourself:

Wait, what does it mean for the space of possible messages to go up or down by a factor of \(x\)? This isn’t always clear. What if you’re really good at guessing people’s favorite suspects? For that matter, what if we haven’t established a convention like “0 = Miss Scarlett; 1 = Professor Plum; …”? If I see an observation, the amount by which it changes the space of possible messages is subjective; it depends on my beliefs, on the beliefs of the person I’m communicating with, and on the conventions that we set up beforehand. How do you actually formalize this idea?

Those are great questions. Down that path lies information theory, a field which measures communication costs using logarithms, and which lets us formalize (and quantify) ideas such as the amount of information carried by a message (to a given observer). See the information theory tutorial for more on this subject.

With regard to logarithms, the key idea here is an interpretation of what \(\log_b(x)\) is “really doing.” Given an input like “how many possible messages are there,” such that your costs go up by 1 unit every time the input space increases by a factor of \(b\), \(\log_b(x)\) measures the change in cost when the input space increases by a factor of \(x\). As we will see next, this idea generalizes beyond the domain of “set of possible messages vs. cost of communicating,” to any scenario where some measure \(\mu\) increases by \(1\) every time some object scales by a factor of \(b\), in which case \(\log_b(x)\) measures the change in \(\mu\) when the object scales by a factor of \(x\). This is the defining characteristic of logarithms, and now that we have some solid physical interpretations of what it means, we’re ready to start exploring logarithms in the abstract.