Most complex things are not very compressible

Although the halting problem means we can’t prove it doesn’t happen, it would nonetheless be extremely surprising if some 100-state Turing machine turned out to print the exact text of Shakespeare’s Romeo and Juliet. Unless something was specifically generated by a simple algorithm, the Vast supermajority of data structures that look like they have high algorithmic complexity actually do have high algorithmic complexity. Since there are at most \(2^{101}\) programs that can be specified with at most 100 bits (in any particular language), we can’t fit all \(2^{1000}\) of the 1000-bit data structures into the outputs of 100-bit programs. While Romeo and Juliet is certainly highly compressible relative to most random bitstrings of the same length, it would be shocking for it to compress all the way down to a 100-state Turing machine. There just aren’t enough 100-state Turing machines for one of their outputs to be Romeo and Juliet. Similarly, if you start with a 10-kilobyte text file and 7-Zip compresses it down to 2 kilobytes, no amount of time spent trying to further compress the file with other compression programs will ever get it down to 1 byte. For any given compressor there are at most 256 starting files that can ever be compressed down to 1 byte, and your 10-kilobyte text file almost certainly isn’t one of them.
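
To make the counting argument explicit with the numbers above: the number of programs that are at most 100 bits long is

\[
\sum_{k=0}^{100} 2^k \;=\; 2^{101} - 1 \;<\; 2^{101},
\]

while there are \(2^{1000}\) distinct 1000-bit data structures. Each program prints at most one output, so at most a fraction

\[
\frac{2^{101}}{2^{1000}} \;=\; 2^{-899}
\]

of those data structures can compress to 100 bits or fewer; all the rest remain incompressible below that bound no matter which compressor we try.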

This takes on defensive importance with respect to refuting the probability-theoretic fallacy, “Oh, sure, Occam’s Razor seems to say that this proposition is complicated. But how can you be sure that this apparently complex proposition wouldn’t turn out to be generated by some very simple mechanism?” If we consider a partition into 10,000 mutually exclusive propositions, whose probabilities must therefore average 0.01%, then all the arguments in the world for why various propositions might have unexpectedly high probability must still leave the average probability at 0.01%. It can’t be the case that after considering that proposition 1 might have secretly high probability, and considering that proposition 2 might have secretly high probability, and so on, we end up assigning 5% probability to each proposition, because that would be a total probability of 500, vastly exceeding the maximum of 1. If we assign prior probabilities using an algorithmic-complexity Occam prior as in Solomonoff induction, then the observation that “most apparently complex things can’t be further compressed into an amazingly simple Turing machine” is the same observation as “most apparently Occam-penalized propositions can’t turn out to be simpler than they look” or “most apparently subjectively improbable things can’t turn out to have unseen clever arguments that would validly make them more subjectively probable.”
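
Written out as arithmetic, writing \(H_1, \dots, H_{10000}\) for the propositions in the partition: a coherent assignment must satisfy

\[
\sum_{i=1}^{10000} P(H_i) \;=\; 1,
\]

so the average probability is exactly \(1/10000 = 0.01\%\). Assigning 5% to every proposition instead would total \(10000 \times 0.05 = 500\), five hundred times more probability mass than exists; whatever clever arguments raise some propositions above 0.01% must be balanced by other propositions falling below it.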
