Complexity of value


“Complexity of value” is the idea that if you tried to write an AI that would do right things (or maximally right things, or adequately right things) without further looking at humans (so it can’t take in a flood of additional data from human advice; the AI has to be complete as it stands once you’re finished creating it), the AI’s preferences or utility function would need to contain a large amount of data (algorithmic complexity). Conversely, if you try to write an AI that directly wants simple things, or try to specify the AI’s preferences using only a small amount of data or code, it won’t do acceptably right things in our universe.

Complexity of value says, “There’s no simple and non-meta solution to AI preferences,” or “The things we want AIs to want are complicated in the Kolmogorov-complexity sense,” or “Any simple goal you try to describe that is All We Need To Program Into AIs is almost certainly wrong.”

Complexity of value is a further idea above and beyond the orthogonality thesis, which states that AIs don’t automatically do the right thing and that we can have, e.g., paperclip maximizers. Even if we accept that paperclip maximizers are possible, simple, and nonforced, this wouldn’t yet imply that it’s very difficult to make AIs that do the right thing. If the right thing is very simple to encode (if there are value optimizers scarcely more complex than diamond maximizers), then it might not be especially hard to build a nice AI even if not all AIs are nice. Complexity of Value is the further proposition that says, no, this is foreseeably quite hard: not because AIs have ‘natural’ anti-nice desires, but because niceness requires a lot of work to specify.

Frankena’s list

As an intuition pump for the complexity of value thesis, consider William Frankena’s list of things which many cultures and people seem to value (for their own sake rather than their external consequences):

“Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one’s own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc.”

When we try to list out properties of a human or galactic future that seem like they’d be very nice, we at least seem to value a fair number of things that aren’t reducible to each other. What initially look like plausible-sounding “But you do A to get B” arguments usually fall apart when we look for third alternatives to doing A to get B. Marginally adding some freedom can marginally increase the happiness of a human, so a happiness optimizer that can only exert a small push toward freedom might choose to do so. That doesn’t mean that a pure, powerful happiness maximizer would instrumentally optimize freedom. If an agent cares about happiness but not freedom, the outcome that maximizes their preferences is a large number of brains set to maximum happiness. When we don’t just seize on one possible case where a B-optimizer might use A as a strategy, but instead look for further C-strategies that might maximize B even better than A, the attempt to reduce A to an instrumental B-maximization strategy often falls apart. It’s in this sense that the items on Frankena’s list don’t seem to reduce to each other as a matter of pure preference, even though humans in everyday life often seem to pursue several of these goals at the same time.

Complexity of value says that, in this case, the way things seem is the way they are: Frankena’s list is not encodable in one page of Python code. This proposition can’t be established definitively without settling on a sufficiently well-specified metaethics, such as reflective equilibrium, to make it clear that there is indeed no a priori reason for normativity to be algorithmically simple. But the basic intuition for Complexity of Value is provided just by the fact that Frankena’s list is more than one item long, and that many individual terms don’t seem likely to have algorithmically simple definitions that distinguish their valuable from non-valuable forms.

Lack of a central core

We can understand the idea of complexity of value by contrasting it to the situation with respect to epistemic reasoning, a.k.a. truth-finding or answering simple factual questions about the world. In an ideal sense, we can try to compress and reduce the idea of mapping the world well down to algorithmically simple notions like “Occam’s Razor” and “Bayesian updating”. In a practical sense, natural selection, in the course of optimizing humans to solve factual questions like “Where can I find a tree with fruit?” or “Are brightly colored snakes usually poisonous?” or “Who’s plotting against me?”, ended up with enough of the central core of epistemology that humans were later able to answer questions like “How are the planets moving?” or “What happens if I fire this rocket?”, even though humans hadn’t been explicitly selected on to answer those exact questions.

Because epistemology does have a central core of simplicity and Bayesian updating, selecting for an organism that got some pretty complicated epistemic questions right enough to reproduce also caused that organism to start understanding things like General Relativity. When it comes to truth-finding, we’d expect by default the same thing to be true about an Artificial Intelligence: if you build it to get epistemically correct answers on lots of widely different problems, it will contain a core of truthfinding and start getting epistemically correct answers on lots of other problems, even problems completely different from your training set, in the way that understanding General Relativity wasn’t like any hunter-gatherer problem.

The complexity of value thesis is that there isn’t a simple core to normativity, which means that if you hone your AI to do normatively good things on A, B, and C and then confront the AI with a very different problem D, the AI may do the wrong thing on D. There are a large number of independent ideal “gears” inside the complex machinery of value, compared to epistemology, which in principle might only contain “prefer simpler hypotheses” and “prefer hypotheses that match the evidence”.

The Orthogonality Thesis says that, contra the intuition that maximizing paperclips feels “stupid”, you can have arbitrarily cognitively powerful entities that maximize paperclips, or arbitrarily complicated other goals. So while intuitively you might think it would be simple to avoid paperclip maximizers, requiring no work at all for a sufficiently advanced AI, the Orthogonality Thesis says that things will be more difficult than that: you have to put in some work to have the AI do the right thing.

The Complexity of Value thesis is the next step after Orthogonality; it says that, contra the feeling that “rightness ought to be simple, darn it”, normativity turns out not to have an algorithmically simple core, not the way that correctly answering questions of fact has a central tendency that generalizes well. So even though an AI that you train to do well on problems like steering cars, or figuring out General Relativity from scratch, may hit on a core capability that leads it to do well on arbitrarily more complicated problems of galactic scale, we can’t rely on getting an equally generous bonanza of generalization from an AI that seems to do well on a small but varied set of moral and ethical problems; it may still fail the next problem that isn’t like anything in the training set. To the extent that we have very strong reasons for prior confidence in Complexity of Value, in fact, we ought to be suspicious and worried about an AI that seems to be pulling correct moral answers from nowhere: it is much more likely to have hit upon the convergent instrumental strategy “say what makes the programmers trust you” than to have hit upon a simple core of all normativity.

Key sub-propositions

Complexity of Value requires Orthogonality, and would be implied by three further subpropositions:

The first proposition, intrinsic complexity of value, is that the properties we want AIs to achieve (whatever stands in for the metasyntactic variable ‘value’) have a large amount of intrinsic information, in the sense of comprising a large number of independent facts that aren’t generated by a single computationally simple rule.

A very bad example that may nonetheless provide an important intuition is to imagine trying to pinpoint to an AI what constitutes ‘worthwhile happiness’. The AI suggests a universe tiled with tiny Q-learning algorithms receiving high rewards. After some explanation and several labeled datasets, the AI suggests a human brain with a wire stuck into its pleasure center. After further explanation, the AI suggests a human in a holodeck. You begin talking about the importance of believing truly, and about how your values call for apparent human relationships to be real relationships rather than hallucinated ones. The AI asks you what constitutes a good human relationship to be happy about. The series of questions occurs because (arguendo) the AI keeps running into questions whose answers are not AI-obvious from the answers already given, because they involve new things you want whose desirability wasn’t obvious from the answers you’d already given. The upshot is that the specification of ‘worthwhile happiness’ involves a long series of facts that aren’t reducible to the previous facts, and some of your preferences may involve many fine details of surprising importance. In other words, specifying ‘worthwhile happiness’ by hand would be at least as hard as hand-coding a formal rule that could recognize which pictures contain cats. (I.e., impossible.)

The second proposition is incompressibility of value, which says that attempts to reduce these complex values to some incredibly simple and elegant principle fail (much like early attempts by, e.g., Bentham to reduce all human value to pleasure), and that no simple instruction given to an AI will happen to target high-value outcomes either. The core reason to expect a priori that all such attempts will fail is that most 1000-byte strings aren’t compressible down to some incredibly simple pattern no matter how many clever tricks you throw at them; fewer than 1 in 1024 such strings can be compressed even to 990 bytes, never mind 10 bytes. Because of the tremendous number of different proposals for why some simple instruction to an AI should end up achieving high-value outcomes, or for why all human value can be reduced to some simple principle, there is no central demonstration that all these proposals must fail; but there is a sense in which, a priori, we should strongly expect all such clever attempts to fail. Many disagreeable attempts at reducing value A to value B, such as Juergen Schmidhuber’s attempt to reduce all human value to increasing the compression of sensory information, stand as further cautionary lessons.
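The counting argument here can be made concrete with a short Python sketch (the function name and the exact numbers are ours, chosen for illustration): there are strictly fewer short descriptions than long strings, so only a vanishing fraction of 1000-byte (8000-bit) strings can have any description as short as 990 bytes.

```python
def max_fraction_compressible(n_bits, saved_bits):
    """Upper bound on the fraction of n-bit strings that have any
    description (program) shorter than n_bits - saved_bits bits."""
    # There are at most sum_{i=0}^{n-saved-1} 2^i = 2^(n-saved) - 1
    # distinct shorter descriptions, and each describes at most one string.
    short_descriptions = 2 ** (n_bits - saved_bits) - 1
    return short_descriptions / 2 ** n_bits

# Compressing a 1000-byte string down to 990 bytes saves 80 bits:
bound = max_fraction_compressible(8000, 80)
print(bound)            # on the order of 2**-80: astronomically rare
assert bound < 2 ** -79
```

The same bound explains why clever tricks don’t help: the argument counts descriptions, so it applies to every compression scheme at once.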

The third proposition is fragility of value, which says that if you have a 1000-byte exact specification of worthwhile happiness and you begin to mutate it, the value created by the corresponding AI with the mutated definition falls off rapidly. E.g., an AI with only 950 bytes of the full definition may end up creating 0% of the value rather than 95% of the value. (E.g., the AI understood all aspects of what makes for a life well-lived… except the part about requiring a conscious observer to experience it.)
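A toy model, under assumptions of our own (value is conjunctive, the optimizer exactly maximizes whatever objective it is handed, and all names and numbers are illustrative), shows how a 95%-complete specification can capture 0% of the value:

```python
N_CRITERIA = 20  # stand-in for the "1000 bytes" of an exact specification

def true_value(outcome):
    # The full definition: value requires EVERY criterion to hold
    # (e.g. including "a conscious observer experiences it").
    return 1.0 if all(outcome) else 0.0

def proxy_value(outcome, dropped):
    # The mutated definition: identical, except one criterion was lost.
    return 1.0 if all(c for i, c in enumerate(outcome) if i != dropped) else 0.0

def powerful_optimizer(dropped):
    # A strong optimizer drives its objective to an exact maximum; the
    # criterion missing from its spec is left unmet, since meeting it
    # would cost resources and earn no proxy reward.
    return tuple(i != dropped for i in range(N_CRITERIA))

outcome = powerful_optimizer(dropped=7)
print(proxy_value(outcome, dropped=7))  # 1.0: looks perfect to the AI
print(true_value(outcome))              # 0.0: none of the actual value
```

The conjunctive structure is the load-bearing assumption here: fragility is the claim that real value behaves more like this conjunction than like a smooth sum of independent goods.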

Together, these propositions would imply that to achieve an adequate amount of value (e.g., 90% of potential value, or even 20% of potential value), there may be no simple hand-coded object-level goal for the AI that results in that value’s realization. E.g., you can’t just tell it to ‘maximize happiness’, with some hand-coded rule for identifying happiness.


Complexity of Value is a central proposition in value alignment theory. Many foreseen difficulties revolve around it:

  • Complex values can’t be hand-coded into an AI, and require value learning or Do What I Mean preference frameworks.

  • Complex/fragile values may be hard to learn even by induction, because the labeled data may not include distinctions that give all of the 1000 bytes a chance to cast an unambiguous causal shadow into the data; and it’s very bad if 50 bytes are left ambiguous.

  • Complex/fragile values require error-recovery mechanisms, because of the worry about getting some single subtle part wrong and this being catastrophic. (And since we’re working inside of highly intelligent agents, the recovery mechanism has to be a corrigible preference, so that the agent accepts our attempts at modifying it.)

More generally:

  • Complex values tend to be implicated in patch-resistant problems that wouldn’t be resistant if there were some obvious 5-line specification of exactly what to do, or not do.

  • Complex values tend to be implicated in context-change problems that wouldn’t exist if we had a 5-line specification that solved those problems once and for all, and that we’d likely run across during the development phase.
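The induction worry above (learned values casting an ambiguous causal shadow) can be sketched in a few lines; the scenario and names are invented for illustration. Two candidate value functions fit every labeled example, because the curated data never varied one relevant dimension, so they diverge on a novel case:

```python
# Labeled training data: (happiness, is_conscious) -> "valuable?"
# Every training subject happens to be conscious, so that distinction
# never casts a causal shadow into the labels.
train = [((0.9, True), True), ((0.1, True), False), ((0.8, True), True)]

def hypothesis_a(x):
    happiness, conscious = x
    return happiness > 0.5 and conscious   # intended: happy AND conscious

def hypothesis_b(x):
    happiness, conscious = x
    return happiness > 0.5                 # shortcut: consciousness ignored

# Both hypotheses fit the training labels perfectly...
assert all(h(x) == y for x, y in train for h in (hypothesis_a, hypothesis_b))

# ...but disagree on an off-distribution case the labels never pinned down:
novel = (0.9, False)   # a non-conscious process "set to maximum happiness"
print(hypothesis_a(novel), hypothesis_b(novel))  # False True
```

Nothing in the training data penalizes the shortcut hypothesis; the missing bytes of the specification were simply never exercised by the labels.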


Many policy questions strongly depend on Complexity of Value, mostly having to do with the overall difficulty of developing value-aligned AI, e.g.:

  • Should we try to develop Sovereigns, or restrict ourselves to Genies?

  • How likely is a moderately safety-aware project to succeed?

  • Should we be more worried about malicious actors creating AI, or about well-intentioned errors?

  • How difficult is the total problem, and how much should we be panicking?

  • How attractive would any genuinely credible game-changing alternative to AI be?

It has been argued that there are psychological biases and popular mistakes leading to beliefs that directly or by implication deny Complexity of Value. To the extent one credits that Complexity of Value is probably true, one should arguably be concerned about the number of early assessments of the value alignment problem that seem to rely on its being false (like just needing to hardcode a particular goal into the AI, or in general treating the value alignment problem as not panic-worthily difficult).

Truth condition

The Complexity of Value proposition is true if, relative to viable and acceptable real-world methodologies for AI development, there isn’t any reliably knowable way to specify the AI’s object-level preferences as a structure of low algorithmic complexity, such that the result of running that AI achieves enough of the possible value, for reasonable definitions of value.


Viable and acceptable computation

Suppose there turns out to exist, in principle, a relatively simple Turing machine (e.g., 100 states) that picks out ‘value’ by re-running entire evolutionary histories, creating and discarding a hundred billion sapient races in order to pick out one that ended up relatively similar to humanity. This would use an unrealistically large amount of computing power, and would also commit an unacceptable amount of mindcrime.



  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.