Most complex things are not very compressible

Eliezer Yudkowsky27 Jun 2016 2:06 UTC

Although the halting problem means we can’t prove it doesn’t happen, it would nonetheless be extremely surprising if some 100-state Turing machine turned out to print the exact text of Shakespeare’s Romeo and Juliet. Unless something was specifically generated by a simple algorithm, the Vast supermajority of data structures that look like they have high algorithmic complexity actually do have high algorithmic complexity. Since there are at most \(2^{101}\) programs that can be specified with at most 100 bits (in any particular language), we can’t fit all the 1000-bit data structures into all the 100-bit programs. While Romeo and Juliet is certainly highly compressible, relative to most random bitstrings of the same length, it would be shocking for it to compress all the way down to a 100-state Turing machine. There just aren’t enough 100-state Turing machines for one of their outputs to be Romeo and Juliet. Similarly, if you start with a 10 kilobyte text file, and 7zip compresses it down to 2 kilobytes, no amount of time spent trying to further compress the file using other compression programs will ever get that file down to 1 byte. For any given compressor there’s at most 256 starting files that can ever be compressed down to 1 byte, and your 10-kilobyte text file almost certainly isn’t one of them.

This takes on defensive importance with respect to refuting the probability-theoretic fallacy, “Oh, sure, Occam’s Razor seems to say that this proposition is complicated. But how can you be sure that this apparently complex proposition wouldn’t turn out to be generated by some very simple mechanism?” If we consider a partition of 10,000 possible propositions, collectively having a 0.01% probability on average, then all the arguments in the world for why various propositions might have unexpectedly high probability, must still add up to an average probability of 0.01%. It can’t be the case that after considering that proposition 1 might have secretly high probability, and considering that proposition 2 might have secretly high probability, and so on, we end up assigning 5% probability to each proposition, because that would be a total probability of 500. If we assign prior probabilities using an algorithmic-complexity Occam prior as in Solomonoff induction, then the observation that “most apparently complex things can’t be further compressed into an amazingly simple Turing machine”, is the same observation as that “most apparently Occam-penalized propositions can’t turn out to be simpler than they look” or “most apparently subjectively improbable things can’t turn out to have unseen clever arguments that would validly make them more subjectively probable”.

Eliezer Yudkowsky27 Jun 2016 2:06 UTC

Parents:

Algorithmic complexity
When you compress the information, what you are left with determines the complexity.

Eric Rogstad 27 Jun 2016 5:20 UTC
Not 2^100?
- Eliezer Yudkowsky 27 Jun 2016 17:37 UTC
  2^100 + 2^99 + 2^98… + 1 = 2^101 − 1.
- Eric Rogstad 1 Jul 2016 23:32 UTC
  Is there a difference between any particular 99-bit program and a 100-bit program with a 0 as the first bit?
- Eliezer Yudkowsky 14 Jul 2016 18:46 UTC
  Yes. That includes both the case where the length is specified outside the program, and the case where we use a prefix-free encoding. Actually I’m not sure what you’re asking here—why wouldn’t prepending 0 to a program change its behavior in whatever Universal Turing Machine you used? If the first bit always has to be 1, it might as well be omitted.
- Eric Rogstad 14 Jul 2016 21:15 UTC
  Aren’t programs for Turing machines specified as marks on an infinite tape?
  
  I was interpreting 100-bit program as one where up to 100 cells on the tape have marks in them (and there’s only one kind of mark, so a cell can only have a mark or not). Maybe I’ve got the wrong picture though. I haven’t studied Turing machines in much depth.
Kaya Fallenstein 14 Jul 2016 9:28 UTC
“Similarly, if you start with a 10 kilobyte text file, and 7zip compresses it down to 2 kilobytes, no amount of time spent trying to further compress the file using other compression programs will ever get that file down to 1 byte.” (Emphasis added.)

This claim seems too strong. Do you mean that an arbitrary compression algorithm is unlikely to compress the file to one byte?

Taking what’s written at face value, if I assume that every compression I apply monotonically decreases the size of the file, then I need only find a large enough sequence of compressions and compose them to compress my file to one byte. This is of course silly, since the composition is just a huge highly specialized compression program optimized for this instance, but I can spend enough time to make it happen.
- Eliezer Yudkowsky 14 Jul 2016 18:48 UTC
  No individual compressor can monotonically decrease file sizes.
  
  And we count the string of .rar.7z.zip.rar… in your filename into the total size of your file.
- Kaya Fallenstein 15 Jul 2016 8:41 UTC
  I only assumed the sequence was monotonic.
  
  The second comment is fair. I think when I first read this, I ignored the bit about 7zip and understood the sentence as claiming “no Turing machine takes an input smaller than one byte and prints out a 10kb file”.