Log as generalized length

Here are a handful of examples of how the logarithm base 10 behaves. Can you spot the pattern?

$$ \begin{align} \log_{10}(2) &\ \approx 0.30 \\ \log_{10}(7) &\ \approx 0.85 \\ \log_{10}(22) &\ \approx 1.34 \\ \log_{10}(70) &\ \approx 1.85 \\ \log_{10}(139) &\ \approx 2.14 \\ \log_{10}(316) &\ \approx 2.50 \\ \log_{10}(123456) &\ \approx 5.09 \\ \log_{10}(654321) &\ \approx 5.82 \\ \log_{10}(123456789) &\ \approx 8.09 \\ \log_{10}(\underbrace{987654321}_\text{9 digits}) &\ \approx 8.99 \end{align} $$

Every time the input gets one digit longer, the output goes up by one. In other words, the output of the logarithm is roughly the length, measured in digits, of the input. (Why?)
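This pattern is easy to check with a quick sketch (Python here, purely illustrative; `math.log10` is the standard-library base-10 logarithm):

```python
import math

# The base-10 log of a number is roughly its length in digits:
# floor(log10(n)) + 1 equals the digit count for any positive integer n.
for n in [2, 7, 22, 70, 139, 316, 123456, 654321, 123456789, 987654321]:
    digits = len(str(n))
    assert math.floor(math.log10(n)) + 1 == digits
    print(f"{n:>9}: log10 ≈ {math.log10(n):.2f}, {digits} digits")
```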

Why is it the log base 10 (rather than, say, the log base 2) that roughly measures the length of a number? Because numbers are normally represented in decimal notation, where each new digit lets you write down ten times as many numbers. The logarithm base 2 would measure the length of a number if each digit only gave you the ability to write down twice as many numbers. In other words, the log base 2 of a number is roughly the length of that number when it’s represented in binary notation (where \(13\) is written \(\texttt{1101}\) and so on):

$$ \begin{align} \log_2(3) = \log_2(\texttt{11}) &\ \approx 1.58 \\ \log_2(7) = \log_2(\texttt{111}) &\ \approx 2.81 \\ \log_2(13) = \log_2(\texttt{1101}) &\ \approx 3.70 \\ \log_2(22) = \log_2(\texttt{10110}) &\ \approx 4.46 \\ \log_2(70) = \log_2(\texttt{1010001}) &\ \approx 6.13 \\ \log_2(139) = \log_2(\texttt{10001011}) &\ \approx 7.12 \\ \log_2(316) = \log_2(\texttt{1100101010}) &\ \approx 8.30 \\ \log_2(1000) = \log_2(\underbrace{\texttt{1111101000}}_\text{10 digits}) &\ \approx 9.97 \end{align} $$
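The same check works in binary (again a Python sketch; `format(n, "b")` gives the binary representation and `math.log2` the base-2 log):

```python
import math

# The base-2 log of a number is roughly its length in bits.
for n in [3, 7, 13, 22, 70, 139, 316, 1000]:
    bits = format(n, "b")  # e.g. 13 -> "1101"
    assert len(bits) == n.bit_length()
    print(f"{n:>4} = {bits:>10}: log2 ≈ {math.log2(n):.2f}, {len(bits)} bits")
```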

If you aren’t familiar with the idea of representing numbers in bases other than 10, and you want to learn more, see the number base tutorial.

Here’s an interactive visualization which shows the link between the length of a number expressed in base \(b\) and the logarithm base \(b\) of that number:

As you can see, if \(b\) is an integer greater than 1, then the logarithm base \(b\) of \(x\) is pretty close to the number of digits it takes to write \(x\) in base \(b.\)

Pretty close, but not exact. The most obvious difference is that the outputs of logarithms generally have a fractional portion: the logarithm of \(x\) always falls a little short of the length of \(x.\) This is because, insofar as logarithms act like the “length” function, they generalize the notion of length, making it continuous.

What does this fractional portion mean? Roughly speaking, logarithms measure not only how long a number is, but also how much that number is really using its digits. 12 and 97 are both two-digit numbers, but intuitively, 12 is “barely” two digits long, whereas 97 is “nearly” three digits. Logarithms formalize this intuition, and tell us that 12 is really only using about 1.08 digits, while 97 is using about 1.99.
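These two values are just the base-10 logs of 12 and 97 (a quick check in Python):

```python
import math

# How many digits is each two-digit number "really" using?
print(round(math.log10(12), 2))  # 1.08: barely more than one digit
print(round(math.log10(97), 2))  # 1.99: almost two full digits
```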

Where are these fractions coming from? Also, looking at the examples above, notice that \(\log_{10}(316) \approx 2.5.\) Why is it 316, rather than 500, that logarithms claim is “2.5 digits long”? What would it even mean for a number to be 2.5 digits long? It very clearly takes 3 digits to write down “316,” namely, ‘3’, ‘1’, and ‘6’. What would it mean for a number to use “half a digit”?

Well, here’s one way to approach the notion of a “partial digit.” Let’s say that you work in a warehouse recording data using digit wheels like they used to have on old desktop computers.

A digit wheel

Let’s say that one of your digit wheels is broken, and can’t hold numbers greater than 4: every notch from 5 to 9 has been stripped off, so if you try to set it to a number between 5 and 9, it just slips down to 4. Let’s call the resulting digit a 5-digit, because it can still be stably placed into 5 different states (0-4). We could easily call this 5-digit a “partial 10-digit.”

The question is, how much of a partial 10-digit is it? Is it half a 10-digit, because it can store 5 out of the 10 values that a “full 10-digit” can store? That would be a fine way to measure fractional digits, but it’s not the method used by logarithms. Why? Well, consider a scenario where you have to record lots and lots of numbers on these digits (such that you can tell someone how to read off the right data later), and let’s say that you have to pay me one dollar for every digit that you use. Now let’s say that I only charge you 50¢ per 5-digit. Then you should do all your work in 5-digits! Why? Because two 5-digits can be used to store 25 different values (00, 01, 02, 03, 04, 10, 11, …, 44) for $1, which is way more data-stored-per-dollar than you would have gotten by buying a 10-digit. (You may be wondering: are two 5-digits really worth more than one 10-digit? Sure, you can place them in 25 different configurations, but how do you encode “9” when none of the digits has a “9” symbol written on it? If so, see The symbols don’t matter.)

In other words, there’s a natural exchange rate between \(n\)-digits, and a 5-digit is worth more than half a 10-digit. (The actual price you’d be willing to pay is a bit short of 70¢ per 5-digit, for reasons that we’ll explore shortly.) A 4-digit is also worth a bit more than half a 10-digit (two 4-digits let you store 16 different numbers), and a 3-digit is worth a bit less than half a 10-digit (two 3-digits let you store only 9 different numbers).
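These exchange rates can be sketched numerically. The pricing hinted at here (and derived on later pages) is that an \(n\)-digit is worth \(\log_{10}(n)\) 10-digits; taking that as an assumption:

```python
import math

# Assumed exchange rate: an n-digit is worth log10(n) 10-digits.
for n in [2, 3, 4, 5, 8]:
    print(f"a {n}-digit is worth about {math.log10(n):.2f} 10-digits")

# A 5-digit is worth a bit short of 70 cents:
assert 0.69 < math.log10(5) < 0.70
# A 4-digit is worth more than half a 10-digit, a 3-digit less than half:
assert math.log10(3) < 0.5 < math.log10(4)
```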

We now begin to see what the fractional answer that comes out of a logarithm actually means (and why 316 is closer to 2.5 digits long than 500 is). The logarithm base 10 of \(x\) is not answering “how many 10-digits does it take to store \(x\)?” It’s answering “how many digits-of-various-kinds does it take to store \(x\), where as many digits as possible are 10-digits; and how big does the final digit have to be?” The fractional portion of the output describes how large the final digit has to be, using this natural exchange rate between digits of different sizes.

For example, the number 200 can be stored using only two 10-digits and one 2-digit. \(\log_{10}(200) \approx 2.301,\) and a 2-digit is worth about 0.301 10-digits. In fact, a 2-digit is worth exactly \((\log_{10}(200) - 2)\) 10-digits. As another example, \(\log_{10}(500) \approx 2.7\) means “to record 500, you need two 10-digits, and also a digit worth at least \(\approx\) 70¢,” i.e., two 10-digits and a 5-digit.
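This decomposition is easy to verify: two 10-digits and one 2-digit give \(2 \cdot 10 \cdot 10 = 200\) configurations, and their costs add up. A quick Python check:

```python
import math

# 200 values fit in one 2-digit plus two 10-digits.
assert 2 * 10 * 10 == 200

# Costs add: log10(200) = log10(2) + 2, so the final 2-digit
# accounts for the fractional part, about 0.301 of a 10-digit.
assert abs(math.log10(200) - (math.log10(2) + 2)) < 1e-12
print(round(math.log10(200) - 2, 3))  # 0.301

# Likewise 500 = 5 * 10 * 10: two 10-digits plus a ~70-cent 5-digit.
assert abs(math.log10(500) - (math.log10(5) + 2)) < 1e-12
```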

This raises a number of additional questions:

Question: Wait, there is no digit that’s worth 50¢. As you said, a 3-digit is worth less than half a 10-digit (because two 3-digits can only store 9 things), and a 4-digit is worth more than half a 10-digit (because two 4-digits store 16 things). If \(\log_{10}(316) \approx 2.5\) means “you need two 10-digits and a digit worth at least 50¢,” then why not just have the \(\log_{10}\) of everything between 301 and 400 be 2.60? They’re all going to need two 10-digits and a 4-digit, aren’t they?

Answer: The natural exchange rate between digits is actually way more interesting than it first appears. If you’re trying to store either “301” or “400,” and you start with two 10-digits, then you have to purchase a 4-digit in both cases. But if you start with a 10-digit and an 8-digit, then the digit you need to buy is different in the two cases. In the “301” case you can still make do with a 4-digit, because the 10-, 8-, and 4-digits together give you the ability to store any number up to \(10 \cdot 8 \cdot 4 = 320.\) But in the “400” case you now need to purchase a 5-digit instead, because the 10-, 8-, and 4-digits together aren’t enough. The logarithm of a number tells you about every combination of \(n\)-digits that would work to encode the number (and more!). This is an idea that we’ll explore over the next few pages, and it will lead us to a much better understanding of logarithms.
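The bookkeeping in this answer can be sketched in code: a collection of digit sizes can encode any number up to the product of the sizes, which in log terms means the sizes suffice exactly when their logs sum to at least the log of the number. (`capacity` is a hypothetical helper, not from the text.)

```python
import math

def capacity(sizes):
    """Number of distinct values a collection of n-digits can store."""
    total = 1
    for s in sizes:
        total *= s
    return total

assert capacity([10, 10, 4]) == 400  # enough for both 301 and 400
assert capacity([10, 8, 4]) == 320   # enough for 301, not for 400
assert capacity([10, 8, 5]) == 400   # enough for 400

# Equivalent test in log terms: sum of the digit logs vs. log of the number.
assert sum(map(math.log10, [10, 8, 4])) >= math.log10(301)
assert sum(map(math.log10, [10, 8, 4])) < math.log10(400)
```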

Question: Hold on, where did the 2.60 number come from above? How did you know that a 5-digit costs 70¢? How are you calculating these exchange rates, and what do they mean?

Answer: Good question. In Exchange rates between digits, we’ll explore what the natural exchange rate between digits is, and why.

Question: \(\log_{10}(100)=2,\) but clearly, 100 is 3 digits long. In fact, \(\log_b(b^k)=k\) for any integer \(b > 1\) and integer \(k \geq 0,\) but \(k+1\) digits are required to represent \(b^k\) in base \(b\) (as a one followed by \(k\) zeroes). Why is the logarithm making these off-by-one errors?

Answer: Secretly, the logarithm of \(x\) isn’t answering the question “how hard is it to write \(x\) down?” It’s answering something more like “how many digits does it take to record a whole number less than \(x\)?” In other words, the \(\log_{10}\) of 100 is the number of 10-digits you need to be able to name any one of a hundred numbers, and that’s two digits (which can hold anything from 00 to 99).
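A two-line check of this claim:

```python
import math

# Two 10-digits name any one of 100 numbers: 00 through 99.
assert 10 ** 2 == 100
assert math.log10(100) == 2.0

# But writing 100 itself takes three digits: the off-by-one.
assert len(str(100)) == 3
```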

Question: Wait, but what about when the input has a fractional portion? How long is the number 100.87? And also, \(\log_{10}(100.87249072)\) is just a hair higher than 2, but 100.87249072 is way harder to write down than 100. How can you say that their “lengths” are almost the same?

Answer: Great questions! The length interpretation on its own doesn’t shed any light on how logarithm functions handle fractional inputs. We’ll soon develop a second interpretation of logarithms which does explain the behavior on fractional inputs, but we aren’t there yet.

Meanwhile, note that the question “how hard is it to write down an integer between 0 and \(x\) using digits?” is very different from the question “how hard is it to write down \(x\)?” For example, 3 is easy to write down using digits, while \(\pi\) is very difficult to write down using digits. Nevertheless, the log of \(\pi\) is very close to the log of 3. The concept of “how hard is this number to write down?” goes by the name of complexity; see the Kolmogorov complexity tutorial to learn more about this topic.

Question: Speaking of fractional inputs, if \(0 < x < 1\) then the logarithm of \(x\) is negative. How does that square with the length interpretation? What would it even mean for the length of the number \(\frac{1}{10}\) to be \(-1\)?

Answer: Nice catch! The length interpretation crashes and burns when the inputs are less than one.

The “logarithms measure length” interpretation is imperfect, but the connection is still useful to understand, because you already have an intuition for how slowly the length of a number grows as the number gets larger. The “length” interpretation is one of the easiest ways to get a gut-level intuition for what logarithmic growth means. If someone says “the amount of time it takes to search my database is logarithmic in the number of entries,” you can get a sense for what this means by remembering that logarithmic growth is like the way the length of a number grows with the magnitude of that number.

The interpretation doesn’t explain what’s going on when the input is fractional, but it’s still one of the fastest ways to make logarithms start feeling like a natural property of numbers, rather than just some esoteric function that “inverts exponentials.” Length is the quick-and-dirty intuition behind logarithms.

For example, I don’t know what the logarithm base 10 of 2,310,426 is, but I know it’s between 6 and 7, because 2,310,426 is seven digits long.

$$\underbrace{\text{2,310,426}}_\text{7 digits}$$

In fact, I can also tell you that \(\log_{10}(\text{2,310,426})\) is between 6.30 and 6.48. How? Well, I know it takes six 10-digits to get up to 1,000,000, and then we need something more than a 2-digit and less than a 3-digit to get to a number between 2 and 3 million. The natural exchange rates for 2-digits and 3-digits (in terms of 10-digits) are about 30¢ and 48¢ respectively, so the cost of 2,310,426 in terms of 10-digits is between $6.30 and $6.48.
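This estimate can be reproduced directly (a Python sketch of the reasoning above):

```python
import math

n = 2_310_426

# Seven digits, so the log is between 6 and 7.
assert len(str(n)) == 7
assert 6 < math.log10(n) < 7

# Tighter: n lies between 2 million and 3 million, so its cost is six
# 10-digits plus something between a 2-digit (~$0.30) and a 3-digit (~$0.48).
low, high = 6 + math.log10(2), 6 + math.log10(3)
assert low < math.log10(n) < high
print(round(low, 2), round(math.log10(n), 2), round(high, 2))
```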

Next up, we’ll be exploring this idea of an exchange rate between different types of digits, and building an even better interpretation of logarithms which helps us understand what they’re doing on fractional inputs (and why).