Most complex things are not very compressible

Although the halting problem means we can't prove it doesn't happen, it would nonetheless be extremely surprising if some 100-state Turing machine turned out to print the exact text of Shakespeare's Romeo and Juliet. Unless something was specifically generated by a simple algorithm, the Vast supermajority of data structures that look like they have high algorithmic complexity actually do have high algorithmic complexity.

Since there are at most \(2^{101}\) programs that can be specified with at most 100 bits (in any particular language), we can't fit all the 1000-bit data structures into all the 100-bit programs. While Romeo and Juliet is certainly highly compressible, relative to most random bitstrings of the same length, it would be shocking for it to compress all the way down to a 100-state Turing machine. There just aren't enough 100-state Turing machines for one of their outputs to be Romeo and Juliet.

Similarly, if you start with a 10-kilobyte text file, and 7zip compresses it down to 2 kilobytes, no amount of time spent trying to further compress the file using other compression programs will ever get that file down to 1 byte. For any given compressor there are at most 256 starting files that can ever be compressed down to 1 byte, and your 10-kilobyte text file almost certainly isn't one of them.
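The counting argument can be checked with exact integer arithmetic, and the compression point can be illustrated with Python's standard `zlib` module (standing in here for 7zip; the Shakespeare line is just sample repetitive text):

```python
import os
import zlib

# Counting argument: the number of distinct programs of 0..100 bits is
# 2^0 + 2^1 + ... + 2^100 = 2^101 - 1, far fewer than the 2^1000
# bitstrings of length 1000.
programs = 2 ** 101 - 1
strings = 2 ** 1000
# Fewer than 1 in 2^899 of the 1000-bit strings can have any
# description that fits in 100 bits.
assert programs * 2 ** 899 < strings

# Repetitive English-like text compresses well...
text = b"But soft, what light through yonder window breaks? " * 200
once = zlib.compress(text, 9)
assert len(once) < len(text)

# ...but no chain of compressors can drive it down to 1 byte: at most
# 256 distinct inputs can ever map to a 1-byte output.
twice = zlib.compress(once, 9)
assert len(twice) > 1

# And data with no algorithmic regularity doesn't compress at all:
# zlib's output on random bytes is no shorter than its input.
data = os.urandom(4096)
packed = zlib.compress(data, 9)
assert len(packed) >= len(data)
```

The same pigeonhole bound holds for any compressor or description language; only the constant overheads change.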

This takes on defensive importance with respect to refuting the probability-theoretic fallacy: "Oh, sure, Occam's Razor seems to say that this proposition is complicated. But how can you be sure that this apparently complex proposition wouldn't turn out to be generated by some very simple mechanism?"

If we consider a partition of 10,000 possible propositions, collectively having a 0.01% probability on average, then all the arguments in the world for why various propositions might have unexpectedly high probability must still add up to an average probability of 0.01%. It can't be the case that, after considering that proposition 1 might have secretly high probability, and considering that proposition 2 might have secretly high probability, and so on, we end up assigning 5% probability to each proposition, because that would be a total probability of 500, when the probabilities in a partition can sum to at most 1.

If we assign prior probabilities using an algorithmic-complexity Occam prior as in Solomonoff induction, then the observation that "most apparently complex things can't be further compressed into an amazingly simple Turing machine" is the same observation as that "most apparently Occam-penalized propositions can't turn out to be simpler than they look" or "most apparently subjectively improbable things can't turn out to have unseen clever arguments that would validly make them more subjectively probable".
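The probability budget here is simple arithmetic, sketched below with exact rationals (the 10,000-proposition partition and the 0.01% average are the figures from the text):

```python
from fractions import Fraction

# A partition of 10,000 mutually exclusive, exhaustive propositions
# has priors summing to exactly 1, so the average prior is
# 1/10,000 = 0.01%.
n_propositions = 10_000
average_prior = Fraction(1, n_propositions)
assert n_propositions * average_prior == 1

# Assigning 5% to every proposition, after hearing a clever argument
# for each, would claim a total probability mass of 500, which no
# coherent probability distribution can supply.
total_claimed = n_propositions * Fraction(5, 100)
assert total_claimed == 500
assert total_claimed > 1  # exceeds the entire probability budget
```

However persuasive each individual argument sounds, the arguments collectively cannot move the average above the budget of 1 spread over the whole partition.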


  • Algorithmic complexity

    The length of the shortest program that outputs a given data structure: when you compress the information as far as possible, what you are left with determines the complexity.