Fractional bits: Expected cost interpretation

In the GalCom thought experiment, you regularly have to send large volumes of information through deep space to your automated stock trader in the Vega star system, and every bit of information is very expensive. Imagine that a very important earnings report is coming up for Company X, and you want to transmit instructions to your automated trader in the Vega system the instant that the information becomes available. You’ve decided that you’re capable of distinguishing between 7 different outcomes, which each appear equally likely to you. You label them −3, −2, −1, 0, +1, +2, and +3 (according to how the company performed compared to your median estimate). How many bits do you need to reserve on GalCom to ensure that you can send one of those 7 messages the moment that the earnings report comes out?

\(\log_2(7)\) is about 2.807, which implies that two bits won’t do: with two bits, you could only single out one message from a set of four. So you have to purchase at least 3 bits. (In general, if you want to send a message on GalCom that distinguishes between \(n\) outcomes, you have to reserve at least \(\lceil \log_2(n) \rceil\) bits.)
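A quick sanity check of that arithmetic, as a minimal Python sketch (the variable name is ours):

```python
import math

n = 7  # number of distinguishable outcomes
print(math.log2(n))             # ~2.807: bits of information in a 7-message
print(math.ceil(math.log2(n)))  # 3: bits you must reserve to send one in isolation
```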

Once you’ve purchased 3 bits, there are 8 different signals that you’ll be able to send via GalCom to the Vega system: 000, 001, 010, 011, 100, 101, 110, and 111. You can program your automated trader to interpret these signals as −3, −2, −1, 0, +1, +2, and +3 respectively, and then you have one code, 111, left over. Is there any way to make money off the fact that you have an extra code?

(You’re invited to pause and attempt to answer this question yourself. Is there any way to program the computer on Vega to use your 8 codes in such a way that you sometimes get to sell one bit back to GalCom?)

You can recoup some of the losses as follows: Program your machine on Vega to recognize both 110 and 111 as +3. Then, if the earnings report for Company X comes out as “very good,” you get to resell one bit to GalCom. Why? Because both 110 and 111 will be interpreted as +3, so you don’t care which one gets sent. Therefore, you can choose to send 110 if the next GalCom bit was going to be a 0, and 111 if the next GalCom bit was going to be a 1, and thereby shave one bit off the next person’s message (assuming your automated trader is programmed to forward that one bit where it needs to go, which is the sort of thing you can set up ahead of time).

3 bits are enough to single out a leaf in a binary tree of depth 3. You only have seven messages you want to encode, which means one of the codes gets doubled up; in this case, both 110 and 111 are assigned to +3. Thus, your computer on Vega already knows that the answer is +3 by the time GalCom has transmitted 11, so if the earnings report comes up +3 you can repurpose the third bit.

Because you set up your codes under the assumption that each outcome is equally likely, your expected cost of sending the message is \(3 - \frac{1}{7} \approx 2.857\) bits: 6/7ths of the time you pay for 3 bits, but 1/7th of the time (when the report comes out +3) you only pay for 2. If you’re sending one message from a set of 7, you have to set aside at least 3 bits at the beginning, but every so often you get to sell one of them back.
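Here’s a minimal sketch of the scheme in Python (the `decode` table and `encode` helper are our own illustrative names):

```python
from fractions import Fraction

# Decode table for the 3-bit scheme: 110 and 111 both decode to +3.
decode = {
    "000": -3, "001": -2, "010": -1, "011": 0,
    "100": +1, "101": +2, "110": +3, "111": +3,
}

# When the outcome is +3, either code works, so the third bit is free
# to carry the first bit of the next person's message.
def encode(outcome, next_galcom_bit="0"):
    if outcome == +3:
        return "11" + next_galcom_bit  # the third bit is resold to GalCom
    return next(code for code, o in decode.items() if o == outcome)

# Expected cost under the uniform distribution over the 7 outcomes:
# six outcomes cost the full 3 bits; +3 effectively costs only 2.
expected_cost = Fraction(6, 7) * 3 + Fraction(1, 7) * 2
print(expected_cost, float(expected_cost))  # 20/7, ~2.857
```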

If you send many, many 7-messages using the encoding above, then on average, each 7-message costs about 2.857 bits.

But \(\log_2(7) \neq 2.857;\) it’s actually about 2.807. What gives?

In short, the above encoding is not the most efficient way to pack 7-messages. Or, rather, it’s the most efficient way to pack one 7-message in isolation, but if we have lots of 7-messages, we can do better on average. Imagine two people teaming up to send two 7-messages. Two 7-messages are the same as one 49-message: relabel the outcomes 0 through 6 and encode the pair \((m, n)\) as \(7m + n,\) so that, e.g., 17 means “2” to the first person and “3” to the second. Sending a 49-message requires reserving \(\lceil \log_2(49) \rceil = 6\) bits on GalCom, thereby purchasing 64 different possible signals (000000 through 111111). That leaves \(64 - 49 = 15\) spare signals, so 15 of the 49 messages can be given two codes each, which means that \(\frac{15}{49}\) of the time, one bit can be resold to GalCom. Thus, the expected cost to send two messages from two sets of seven is \(6 - \frac{15}{49} \approx 5.694\) bits. When the two parties split the cost, they each pay only half that, or roughly 2.847 bits in expectation, slightly better than the 2.857 they would have paid sending one 7-message in isolation.
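A sketch of the pairing trick, with hypothetical `pack`/`unpack` helpers:

```python
from fractions import Fraction

# Pack two 7-messages m and n (each relabeled 0..6) into one 49-message.
def pack(m, n):
    return 7 * m + n

def unpack(code):
    return divmod(code, 7)  # e.g., unpack(17) == (2, 3)

# Reserving 6 bits buys 64 signals for 49 messages, so 15 messages
# get a doubled-up code and resell one bit whenever they occur.
expected_pair_cost = 6 - Fraction(15, 49)
print(float(expected_pair_cost))      # ~5.694 bits for the pair
print(float(expected_pair_cost / 2))  # ~2.847 bits per 7-message
```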

If you pack three 7-messages together, the expected cost comes out to \((9 - \frac{169}{343}) \approx 8.507\) bits, for an average cost of \(\approx 2.836\) bits per 7-message: slightly closer still to the \(2.807\) bound. In the limit, as you pack more and more 7-messages together, \(\log_2(7)\) is the smallest average cost you can get. For more on why this is the lower bound, see How many bits is a trit?, Exchange rates between digits, and Fractional digits.
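To watch the convergence numerically, here’s a small sketch generalizing the doubling scheme to \(k\) packed 7-messages (the function name is ours):

```python
import math

# Expected bits per 7-message when k of them are packed together:
# reserve b = ceil(log2(7**k)) bits; 2**b - 7**k of the 7**k messages
# get a doubled-up code, each reselling one bit when it occurs.
def bits_per_message(k, n=7):
    b = math.ceil(k * math.log2(n))
    doubled = 2**b - n**k
    return (b - doubled / n**k) / k

for k in (1, 2, 3, 10, 100):
    print(k, bits_per_message(k))
# ~2.857, ~2.847, ~2.836, ~2.810, ~2.808: approaching log2(7) ~ 2.807
```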

Thus, we can interpret “a 7-message carries about 2.807 bits of information” as “sending lots of 7-messages costs about 2.807 bits on average, if you pack your messages as cleverly as possible.”

More generally, the amount of information carried by a message, fractional part included, can be interpreted as the lower bound on the expected cost of sending that message. If we have to send a single \(n\)-message across GalCom in isolation, then we have to reserve at least \(\lceil \log_2(n) \rceil\) bits. But if we append our message to a longer message, and pack our message carefully, then we can push our expected cost lower and lower. What’s the lower bound on the expected cost? \(\log_2(n).\)

Why is \(\log_2(n)\) the lower bound? The proof has two parts. First, we have to show that you can in fact get closer and closer to an average cost of \(\log_2(n)\) as you send more and more messages; the packing scheme above does this. Second, we have to show that you can’t do better. Imagine you could send a \(b\)-message for \(x < \log_2(b)\) bits, on average. Then you could use \(b\)-messages to encode lots and lots of 2-messages, which (by the same packing method) would cost on average \(\log_b(2) \cdot x\) bits per \(2\)-message: \(\log_b(2)\) \(b\)-messages per \(2\)-message, times \(x\) bits per \(b\)-message. But \(\log_b(2) \cdot \log_2(b) = 1\) for all \(b,\) which means that, if you could send a \(b\)-message for less than \(\log_2(b)\) bits, then you could use bits to send more than one 2-message per bit!
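Spelled out, the absurdity in the second part is a one-line inequality:

\[
x < \log_2(b) \implies \log_b(2) \cdot x < \log_b(2) \cdot \log_2(b) = 1,
\]

so each 2-message would cost strictly less than one bit on average, which is impossible: a single bit can distinguish at most two possibilities.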