Value achievement dilemma

The value achievement dilemma is a way of framing the AI alignment problem in a larger context. This framing emphasizes that there might be possible solutions besides AI, and that such solutions must meet a high bar of potency or efficacy in order to resolve our basic dilemmas, the way that a sufficiently value-aligned and cognitively powerful AI could resolve them. Or, at least, a solution must change the nature of the gameboard, the way that a Task AGI could take actions to prevent destruction by later AGI projects, even if it is only narrowly value-aligned and cannot solve the whole problem.

The point of considering posthuman scenarios in the long run, and not just an immediate Task AGI as a band-aid, can be seen in the suggestion by Eliezer Yudkowsky [find a citation—CFAI? PtS?] and Nick Bostrom [cite Superintelligence] that we can see Earth-originating intelligent life as having two possible stable states, superintelligence and extinction. If intelligent life goes extinct, especially if it drastically damages or destroys the ecosphere in the process, new intelligent life seems unlikely to arise on Earth. If Earth-originating intelligent life becomes superintelligent, it will presumably expand through the universe and stay superintelligent for as long as physically possible. Eventually, our civilization is bound to wander into one of these attractors or the other.

Furthermore, by the generic preference stability argument, any sufficiently advanced cognitive agent is very likely to be stable in its motivations or meta-preference framework. So if and when life wanders into the superintelligence attractor, it will either end up in a stable state that achieves lots of value (e.g., a fun-loving state, or the reflective equilibrium of its creators’ civilization), or a misaligned AI will go on maximizing paperclips forever.

Among the dilemmas we face in getting into the high-value-achieving attractor, rather than the extinction attractor or the equivalence class of paperclip maximizers, are:

  • The possibility of careless (or insufficiently cautious, or, much less likely, malicious) actors creating a non-value-aligned AI that undergoes an intelligence explosion.

  • The possibility of engineered superviruses destroying enough of civilization that the remaining humans go extinct without ever reaching sufficiently advanced technology.

  • Conflict between multipolar powers with nanotechnology resulting in a super-nuclear-exchange disaster that extinguishes all life.

Other positive events seem like they could potentially prompt entry into the high-value-achieving superintelligence attractor:

  • Direct creation of a fully normatively aligned Autonomous AGI agent.

  • Creation of a Task AGI powerful enough to avert the creation of other UnFriendly AI.

  • Intelligence-augmented humans (or humans linked into 64-node clusters by brain-computer interfaces for brain-to-brain information exchange, etcetera) who are able and motivated to solve the AI alignment problem.

On the other hand, consider someone who proposes that “Rather than building AI, we should build Oracle AIs that just answer questions,” and who then, after further exposure to the concept of the AI-Box Experiment and cognitive uncontainability, further narrows their specification to say that an Oracle running in three layers of sandboxed simulation must output only formal proofs of given theorems in Zermelo-Fraenkel set theory, and a heavily sandboxed and provably correct verifier will look over the output proof and signal 1 if it proves the target theorem and 0 otherwise, at some fixed time to avoid timing attacks.

This doesn’t resolve the larger value achievement dilemma, because there’s no obvious thing we can do with a ZF provability oracle that solves our larger problem. There’s no plan such that it would save the world if only we could take some suspected theorems of ZF and know that some of them had formal proofs.
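The fixed-time answering discipline in the hypothetical above can be sketched in miniature. This is not a real proof checker; `toy_checker` is a stand-in invented for illustration, and the only point demonstrated is that the 0/1 answer is released at a fixed elapsed time regardless of how long verification took, so the timing of the answer leaks nothing.

```python
import time

def verify_at_fixed_time(proof_checker, proof, deadline_seconds):
    """Check a proof, but release the 0/1 answer only after a fixed
    elapsed time, so the answer's timing reveals nothing about how
    long verification actually took."""
    start = time.monotonic()
    result = 1 if proof_checker(proof) else 0
    # Pad out to the fixed deadline regardless of how fast checking was.
    remaining = deadline_seconds - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
    return result

# Toy stand-in for a ZF proof verifier (illustrative only): a "proof"
# is accepted iff it is a non-empty list of steps ending in the target.
def toy_checker(proof):
    return bool(proof) and proof[-1] == "target_theorem"

print(verify_at_fixed_time(toy_checker, ["lemma", "target_theorem"], 0.05))  # prints 1
print(verify_at_fixed_time(toy_checker, ["lemma"], 0.05))  # prints 0
```

Even granting that this channel is safe, the sketch makes the narrowness vivid: the only bit that ever leaves the sandbox is a 1 or a 0 about a ZF theorem.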

The thrust of considering a larger ‘value achievement dilemma’ is that while imaginable alternatives to aligned AIs exist, they must pass a double test to be our best alternative:

  • They must be genuinely easier or safer than the easiest (pivotal) form of the AI alignment problem.

  • They must be game-changers for the overall situation in which we find ourselves, opening up a clear path to victory from the newly achieved scenario.

Any strategy that does not putatively open a clear path to victory if it succeeds doesn’t seem like a plausible policy alternative to trying to solve the AI alignment problem, or to doing something else whose success leaves us a clear path to victory. Trying to solve the AI alignment problem is intended to leave us a clear path to achieving almost all of the achievable value for the future and its astronomical stakes. Anything that doesn’t open a clear path to getting there is not an alternative solution for getting there.

For more on this point, see the page on pivotal events.

Subproblems of the larger value achievement dilemma

We can see the place of AI alignment in the larger scheme by considering its parent problem, its sibling problems, and examples of its child problems.

  • The value achievement dilemma: How does Earth-originating intelligent life achieve an acceptable proportion of its potential value?

  • The AI alignment problem: How do we create AIs such that running them produces (global) outcomes of acceptably high value?

    • The value alignment problem: How do we create AIs that want or prefer to cause events that are of high value? If we accept that the way to solve this problem is to create AIs that prefer or want in particular ways, how do we do that?

    • Other properties of aligned AIs, such as corrigibility: How can we create AIs such that, when we make an error in identifying value or specifying the decision system, the AI does not resist our attempts to correct what we regard as an error?

    • Oppositional features, such as boxing, that are intended to mitigate harm if the AI’s behavior has gone outside expected bounds.

  • The intelligence amplification problem: How can we create smarter humans, preferably without driving them insane or otherwise ending up with evil ones?

  • The value selection problem: How can we figure out what to substitute in for the metasyntactic variable ‘value’?


  • Moral hazards in AGI development

    “Moral hazard” occurs when owners of an advanced AGI give in to the temptation to do things with it that the rest of us would regard as ‘bad’, like, say, declaring themselves God-Emperor.

  • Coordinative AI development hypothetical

    What would safe AI development look like if we didn’t have to worry about anything else?

  • Pivotal event

    Which types of AIs, if they work, can do things that drastically change the nature of the further game?

  • Cosmic endowment

    The ‘cosmic endowment’ consists of all the stars that could be reached by probes originating on Earth; the sum of all matter and energy potentially available to be transformed into life and fun.

  • Aligning an AGI adds significant development time

    Aligning an advanced AI foreseeably involves extra code, extra testing, and not being able to do everything the fastest way, so it takes longer.


  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.