Dependent messages can be encoded cheaply

Say you want to transmit a 2-message, a 4-message, and a 256-message to somebody. For example, you might want to tell them which way a coin came up, a cardinal direction, and a letter (encoded as one byte of ASCII text). How many bits of information does it take to transmit those three messages?

At most, it takes 11 bits: One for the coin, two for the direction, and 8 for the ASCII byte. For example, north, A might be encoded as 0001000001, where 0 means “heads”, 00 means “north”, and 10000001 means “A” in ASCII.

But what if the messages depend on each other? What if the way that the cardinal direction was picked was by looking at the coin (such that you always say north if the coin lands heads, and south if the coin lands tails), and then the letter is picked by looking at the direction (such that you always say A for north, B for east, Z for south, and Y for west). Then how many bits does it take to transmit the message north, A? Only one! Why? Because there are now only two possible messages: north, A and south, Z. Given two people who know the links between the three messages, all you need to tell them is how the coin came up, and they can figure out the entire message.

Formally, if you want to send multiple messages, and those messages share mutual information, then the amount of information it takes to encode all three messages together is less than the amount of information it takes to encode each one separately. (In fact, the amount of information you can save by encoding them both together is at most the amount of mutual information between them).

Looking at the collection of multiple messages as a single message, this fact is an immediate consequence of the fact that some messages being more likely than others means you can develop more efficient encodings for them.

Alternatively, this fact can be seen as a corollary of the fact that intradependent encodings can be compressed: Given three messages \(m_1, m_2, m_3\) and an encoding scheme \(E\), the encoding \(E(m_1)E(m_2)E(m_3)\) made by simply putting all three encodings together can be interpreted as a single intradependent encoding of the triplet \((m_1, m_2, m_3)\).