Log as the change in the cost of communicating

Under both of the earlier interpretations of logarithms, as a generalization of the notion of “length” and as digit exchange rates, multiplying the input to the logarithm base 10 by a factor of 10 caused the output to go up by one. Multiplying a number by 10 makes it one digit longer. If a digit wheel is worth $1, then a 1000-digit is worth exactly $1 more than a 100-digit, because you can build a 1000-digit out of a 100-digit and a 10-digit. By symmetry, dividing an input to the logarithm base 10 by 10 makes the output go down by one: if you divide a number by 10, it gets one digit shorter; and any \(n\)-digit is worth $1 more than a \(\frac{n}{10}\)-digit, because you can build an \(n\)-digit out of a \(\frac{n}{10}\)-digit and a 10-digit.

This strongly implies that \(\log_{10}(\frac{1}{10})\) should equal \(-1\). If a 1000-digit costs $3, and a 100-digit costs $2, and a 10-digit costs $1, and a 1-digit is worthless, then, extrapolating the pattern, a \(\frac{1}{10}\)-digit should cost \(-\$1.\) But what does that mean? What sort of digit is worth negative money? Can we give this extrapolation a physical intuition?

Yes, we can, by thinking in terms of how difficult it is to communicate information. Let’s say that you and I are in separate rooms, connected only by a conveyor belt, upon which I can place physical objects like coins, dice, and digit wheels that you can read. Let’s imagine also that a third party is going to show me the whodunit of a game of Clue, and then let me put some objects on the conveyor belt, and then send those objects into your room, and then ask you for the information. If you can reproduce it successfully, then we both win a lot of money. However, I have to pay for every object that I put on the conveyor belt, using the fair prices.

Consider how much I have to pay to tell you the result of a Clue game. The “whodunit” in a Clue game consists of three pieces of information:

  1. The name of the murderer, which is one of: Miss Scarlett, Professor Plum, Mrs. Peacock, Reverend Green, Colonel Mustard, or Mrs. White.

  2. The room in which the murder occurred, which is either the kitchen, the ballroom, the conservatory, the dining room, the cellar, the billiard room, the library, the lounge, the hall, or the study.

  3. The murder weapon, which is either the candlestick, the dagger, the lead pipe, poison, the revolver, the rope, or the wrench.

Thus, a typical whodunit might look like “Professor Plum, in the conservatory, with the revolver.” That sentence is 55 characters long, so one way for me to transmit the message would be to purchase fifty-five 29-digits (each capable of holding any one of 26 letters, or a space, or a comma, or a period), and send you that sentence directly. However, that would be excessive, as there are in fact only \(6 \cdot 10 \cdot 7 = 420\) different possibilities (six possible murderers, ten possible locations, seven possible weapons). As such, I only actually need to buy a 6-digit, a 10-digit, and a 7-digit. Equivalently, I could purchase a single 420-digit (if such things are on sale). We have to agree in advance what the digits mean (for example, “the 6-digit corresponds to the murderer, in the order listed above; the 10-digit corresponds to the room, in the order listed above; the 7-digit corresponds to the weapon, in the order listed above”), but assuming we do, I can get away with much less than fifty-five 29-digits.
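To get a feel for the size of the saving, here is a minimal Python sketch that prices both encodings in coins. It assumes the fair price of an \(n\)-digit is \(\log_2(n)\) coins, as in the digit exchange rate interpretation; the helper name `cost_in_coins` is made up for this illustration.

```python
import math

# Sizes of the three components of a whodunit, as listed above.
murderers, rooms, weapons = 6, 10, 7
possibilities = murderers * rooms * weapons  # 420

def cost_in_coins(n: float) -> float:
    """Fair price of one n-digit, measured in coins (hypothetical helper)."""
    return math.log2(n)

# Naive encoding: one 29-digit for each of the 55 characters in the sentence.
naive = 55 * cost_in_coins(29)

# Compact encoding: one 6-digit, one 10-digit, and one 7-digit...
separate = cost_in_coins(murderers) + cost_in_coins(rooms) + cost_in_coins(weapons)
# ...which is worth the same as a single 420-digit.
combined = cost_in_coins(possibilities)

print(f"full sentence:         {naive:.1f} coins")
print(f"6-, 10-, and 7-digit:  {separate:.2f} coins")
print(f"single 420-digit:      {combined:.2f} coins")
```

The last two prices agree because the logarithm of a product is the sum of the logarithms, which is exactly why the three small digits and the one large digit are interchangeable.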

Exercise: If the only storage devices on sale are coins, how many do I need to buy to communicate the whodunit?

Nine. Eight coins only get you \(2^8 = 256\) possibilities, and we need at least 420.

Exercise: If the only storage devices on sale are dice, how many do I need to buy?

Four. \(6^3 < 420 < 6^4.\)

Exercise: If I have to choose between all coins or all dice, which should I choose, at the fair prices?

The coins. Four dice cost as much as \(\log_2(6) \cdot 4 \approx 10.34\) coins, and we can do the job with nine coins instead.

Exercise: If I can mix coins, dice, and digit wheels, what’s the cheapest way to communicate the whodunit?

One coin and three dice give \(2 \cdot 6^3 = 432 \ge 420\) possibilities, so they let you send the message at a cost of only \(\log_2(2) + 3\cdot \log_2(6) \approx 8.75\) coins.
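The answers to the four exercises above can be checked mechanically. The following Python sketch is one way to do it, assuming fair prices of \(\log_2(n)\) coins per \(n\)-digit and assuming that only coins, dice, and 10-sided digit wheels are on sale; the names `TARGET` and `units_needed` are hypothetical, and the brute-force search bounds are chosen by hand so that they comfortably include the nine-coin solution.

```python
import math
from itertools import product

TARGET = 420  # number of possible whodunits

def units_needed(base: int, target: int) -> int:
    """Smallest k such that base**k >= target, using integer arithmetic only."""
    k, capacity = 0, 1
    while capacity < target:
        capacity *= base
        k += 1
    return k

coins = units_needed(2, TARGET)           # all-coins answer
dice = units_needed(6, TARGET)            # all-dice answer
dice_cost_in_coins = dice * math.log2(6)  # fair price of those dice, in coins

# Brute-force search over mixes of coins (2), dice (6), and digit wheels (10).
best = None
for c, d, w in product(range(10), range(5), range(4)):
    if 2**c * 6**d * 10**w >= TARGET:
        cost = c + d * math.log2(6) + w * math.log2(10)
        if best is None or cost < best[0]:
            best = (cost, c, d, w)

print(f"coins only: {coins}")
print(f"dice only:  {dice} (worth {dice_cost_in_coins:.2f} coins)")
print(f"best mix:   {best[1]} coins, {best[2]} dice, {best[3]} wheels "
      f"(worth {best[0]:.2f} coins)")
```

The search minimizes price rather than the number of objects, which is why it can prefer a mix that wastes as little capacity as possible over the all-coins purchase.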

Now, consider what happens when the third party tells me “Actually, in order to win, you also have to communicate the name of my favorite Clue suspect, which is Colonel Mustard. I already told the person in the other room that you need to communicate two suspects, and that you’ll communicate my favorite Clue suspect second. I didn’t tell them who my favorite Clue suspect was, though.”

Now, the space of possible messages has gone up by a factor of six: There are 420 possible whodunits, and each can be paired with one of six possible “favorite suspects,” for a total of 2520 possible messages. How does this impact my cost of communicating with you? My cost goes up by 1 die (\(= \log_2(6)\) coins \(= \log_{10}(6)\) digit wheels). When the space of possibilities goes up by a factor of 6, my costs of communication (measured, say, in coins) go up by \(\log_2(6).\)

Now let’s say that the third party comes back into the room and tells me “Actually, I gave the person in the other room a logic puzzle that told them which room the murder happened in; they solved it, and now they know that the murder happened in the conservatory.”

This reduces the space of possible messages I need to send, by a factor of 10. Now that both you and I know that the murder happened in the conservatory, I only need to transmit the murderer, the weapon, and the favorite suspect — one of 252 possibilities. The space of possibilities was cut into a tenth of its former size, and my cost of communicating dropped by 1 digit wheel (\(= \log_6(10)\) dice \(= \log_2(10)\) coins).
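The two changes above can be traced numerically. This is a small Python sketch, assuming the fair price of sending one of \(n\) possible messages is \(\log_b(n)\) \(b\)-digits; the function name `cost` and the variable names are invented for the illustration.

```python
import math

def cost(possibilities: int, base: float = 2) -> float:
    """Price of sending one of `possibilities` messages, in base-digits."""
    return math.log(possibilities, base)

whodunit = 6 * 10 * 7              # 420 possible whodunits
with_favorite = whodunit * 6       # 2520: each paired with a favorite suspect
room_known = with_favorite // 10   # 252: the room no longer needs sending

# Going from 420 to 2520 messages: the cost rises by one die.
rise_in_coins = cost(with_favorite) - cost(whodunit)
rise_in_dice = cost(with_favorite, 6) - cost(whodunit, 6)

# Going from 2520 to 252 messages: the cost falls by one digit wheel.
drop_in_coins = cost(room_known) - cost(with_favorite)
drop_in_wheels = cost(room_known, 10) - cost(with_favorite, 10)

print(f"rise: {rise_in_coins:.3f} coins = {rise_in_dice:.3f} dice")
print(f"drop: {drop_in_coins:.3f} coins = {drop_in_wheels:.3f} digit wheels")
```

Note that the change in cost depends only on the factor by which the space grew or shrank, not on how large the space was to begin with.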

On this interpretation, logarithms are measuring how much it costs to transmit information, in terms of some “base” medium (such as coins, dice, or digit wheels). Every time the space of possibilities increases by a factor of \(n\), my communication costs increase by \(\log_2(n)\) coins. Every time the space of possibilities decreases by a factor of \(n\), my communication costs drop by \(\log_2(n)\) coins.

This is the physical interpretation of logarithms that you can put your weight on: \(\log_b(x)\) measures how much more or less costly it will be to send a message (in terms of \(b\)-digits) when the space of possible messages changes by a factor of \(x\). Paired with a physical interpretation of fractional digits, it can explain most of the basic properties of the logarithm:

  1. \(\log_b(1) = 0,\) because increasing (or decreasing) the space of possible messages by a factor of 1 doesn’t affect your communication costs at all.

  2. \(\log_b(b) = 1,\) because increasing the space of possible messages by a factor of \(b\) will increase your communication costs by exactly one \(b\)-digit.

  3. \(\log_b\left(\frac{1}{b}\right) = -1,\) because decreasing the space of possible messages by a factor of \(b\) saves you one \(b\)-digit worth of communication costs.

  4. \(\log_b(x\cdot y) = \log_b(x) + \log_b(y),\) because if \(n = x \cdot y\) then one \(n\)-digit is exactly large enough to store one \(x\)-message and one \(y\)-message. Thus, when communicating, an \(x\cdot y\)-digit is worth the same amount as one \(x\)-digit plus one \(y\)-digit.

  5. \(\log_b(x^n) = n \cdot \log_b(x),\) because \(n\) \(x\)-digits can be used to emulate one \(x^n\)-digit.
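The five properties above can be spot-checked numerically. The sketch below does so in Python for one arbitrary choice of values (\(b = 10\), \(x = 6\), \(y = 7\), \(n = 3\)); these values and the helper `log` are assumptions made for the illustration, and a numeric check at one point is of course not a proof.

```python
import math

def log(b: float, x: float) -> float:
    """log base b of x (hypothetical helper wrapping math.log)."""
    return math.log(x, b)

b, x, y, n = 10, 6, 7, 3  # arbitrary sample values

# Each entry compares the two sides of one property, up to rounding error.
checks = {
    "log_b(1) = 0": math.isclose(log(b, 1), 0.0, abs_tol=1e-12),
    "log_b(b) = 1": math.isclose(log(b, b), 1.0),
    "log_b(1/b) = -1": math.isclose(log(b, 1 / b), -1.0),
    "log_b(x*y) = log_b(x) + log_b(y)": math.isclose(log(b, x * y), log(b, x) + log(b, y)),
    "log_b(x^n) = n * log_b(x)": math.isclose(log(b, x**n), n * log(b, x)),
}

for name, holds in checks.items():
    print(f"{name}: {holds}")
```

Comparisons use `math.isclose` rather than `==` because floating-point logarithms are only accurate up to rounding.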

You might be thinking to yourself:

Wait, what does it mean for the space of possible messages to go up or down by a factor of \(x\)? This isn’t always clear. What if you’re really good at guessing who people’s favorite suspect is? For that matter, what if we haven’t established a convention like “0 = Miss Scarlett; 1 = Professor Plum; …”? If I see an observation, the amount by which it changes the space of possible messages is subjective; it depends on my beliefs and on the beliefs of the person I’m communicating with and on the conventions that we set up beforehand. How do you actually formalize this idea?

Those are great questions. Down that path lies information theory, a field which measures communication costs using logarithms, and which lets us formalize (and quantify) ideas such as the amount of information carried by a message (to a given observer). See the information theory tutorial for more on this subject.

With regard to logarithms, the key idea here is an interpretation of what \(\log_b(x)\) is “really doing.” Given an input like “how many possible messages are there,” such that your costs go up by 1 unit every time the input space increases by a factor of \(b\), \(\log_b(x)\) measures the change in cost when the input space increases by a factor of \(x\). As we will see next, this idea generalizes beyond the domain of “set of possible messages vs cost of communicating,” to any scenario where some measure \(\mu\) increases by \(1\) every time some object scales by a factor of \(b\), in which case \(\log_b(x)\) measures the change in \(\mu\) when the object scales by a factor of \(x\). This is the defining characteristic of logarithms, and now that we have some solid physical interpretations of what it means, we’re ready to start exploring logarithms in the abstract.
