Dependent messages can be encoded cheaply

Say you want to transmit a 2-message, a 4-message, and a 256-message to somebody. For example, you might want to tell them which way a coin came up, a cardinal direction, and a letter (encoded as one byte of ASCII text). How many bits of information does it take to transmit those three messages?

At most, it takes 11 bits: one for the coin, two for the direction, and 8 for the ASCII byte. For example, heads, north, A might be encoded as 00001000001, where 0 means “heads”, 00 means “north”, and 01000001 means “A” in ASCII.
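To make the arithmetic concrete, here is a minimal Python sketch of the naive fixed-width scheme. The specific bit assignments (heads = 0, and the codes chosen for east, south, and west) are illustrative choices, not something fixed by the setup.

```python
# A sketch of the naive 11-bit encoding: 1 bit for the coin, 2 bits for the
# direction, 8 bits for the ASCII letter. The bit assignments are illustrative.
COIN = {"heads": "0", "tails": "1"}
DIRECTION = {"north": "00", "east": "01", "south": "10", "west": "11"}

def encode_naive(coin: str, direction: str, letter: str) -> str:
    """Concatenate fixed-width codes for the three messages (11 bits total)."""
    return COIN[coin] + DIRECTION[direction] + format(ord(letter), "08b")

print(encode_naive("heads", "north", "A"))  # -> '00001000001'
```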

But what if the messages depend on each other? What if the cardinal direction was picked by looking at the coin (such that you always say north if the coin lands heads, and south if the coin lands tails), and the letter was then picked by looking at the direction (such that you always say A for north, B for east, Z for south, and Y for west)? Then how many bits does it take to transmit the message heads, north, A? Only one! Why? Because there are now only two possible messages: heads, north, A and tails, south, Z. Given a sender and receiver who both know the links between the three messages, all you need to tell the receiver is how the coin came up, and they can figure out the entire message.
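Here is a sketch of that in code, assuming the receiver knows the two dependence rules just described (the dictionaries below simply spell those rules out):

```python
# Once the dependencies are fixed, only two triplets are possible, so a single
# bit distinguishes them; the receiver reconstructs the rest from the rules.
DIRECTION_FROM_COIN = {"heads": "north", "tails": "south"}
LETTER_FROM_DIRECTION = {"north": "A", "east": "B", "south": "Z", "west": "Y"}

def encode_dependent(coin: str) -> str:
    return "0" if coin == "heads" else "1"  # one bit total

def decode_dependent(bit: str) -> tuple:
    coin = "heads" if bit == "0" else "tails"
    direction = DIRECTION_FROM_COIN[coin]
    return (coin, direction, LETTER_FROM_DIRECTION[direction])

print(decode_dependent(encode_dependent("heads")))  # ('heads', 'north', 'A')
```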

Formally, if you want to send multiple messages, and those messages share mutual information, then the amount of information it takes to encode all of the messages together is less than the amount of information it takes to encode each one separately. (In fact, the amount of information you can save by encoding them together is at most the amount of mutual information between them.)
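To see the bound in action on the coin-and-direction pair from the example (writing \(H\) for entropy and assuming a fair coin), the entropy gap between separate and joint encoding is \(H(X) + H(Y) - H(X, Y)\), which is exactly the mutual information between the two messages. The quick numerical check below is a sketch, not part of the original argument:

```python
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Joint distribution of (coin, direction) under the rules above, fair coin assumed.
joint = {("heads", "north"): 0.5, ("tails", "south"): 0.5}
coin = {"heads": 0.5, "tails": 0.5}
direction = {"north": 0.5, "south": 0.5}

# Bits saved by encoding jointly = H(coin) + H(direction) - H(coin, direction),
# which is the mutual information between the two messages.
savings = entropy(coin) + entropy(direction) - entropy(joint)
print(savings)  # 1.0
```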

Looking at the collection of messages as a single message, this is an immediate consequence of the fact that when some messages are more likely than others, you can develop more efficient encodings for them.
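For instance, treating the triplet as a single message drawn from \(2 \times 4 \times 256 = 2048\) possibilities, only two of those possibilities ever occur under the rules above, which is what lets the encoding shrink from 11 bits to 1. A small sketch (fair coin assumed):

```python
from math import log2

# 2 * 4 * 256 = 2048 possible triplets, but only two have nonzero probability
# under the dependence rules (fair coin assumed).
uniform_cost = log2(2 * 4 * 256)  # 11.0 bits if every triplet were equally likely
skewed = {("heads", "north", "A"): 0.5, ("tails", "south", "Z"): 0.5}
skewed_entropy = -sum(p * log2(p) for p in skewed.values())  # 1.0 bit

print(uniform_cost, skewed_entropy)  # 11.0 1.0
```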

Alternatively, this fact can be seen as a corollary of the fact that intradependent encodings can be compressed: Given three messages \(m_1, m_2, m_3\) and an encoding scheme \(E\), the encoding \(E(m_1)E(m_2)E(m_3)\) made by simply putting all three encodings together can be interpreted as a single intradependent encoding of the triplet \((m_1, m_2, m_3)\).
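As a sketch of that view, using the same illustrative bit assignments as in the earlier snippets: under the dependence rules, the concatenated 11-bit string only ever takes two values, so the concatenation itself can be recompressed down to a single bit.

```python
# The concatenation E(m1)E(m2)E(m3) of the naive encodings only ever takes two
# values under the dependence rules, so it can itself be compressed to one bit.
# (Bit assignments as in the earlier sketches.)
ONLY_POSSIBLE_CONCATENATIONS = {
    "00001000001": "0",  # E(heads) + E(north) + E('A')
    "11001011010": "1",  # E(tails) + E(south) + E('Z')
}

def compress(concatenated: str) -> str:
    return ONLY_POSSIBLE_CONCATENATIONS[concatenated]

print(compress("00001000001"))  # '0'
```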