Autonomous AGI

An autonomous or self-directed advanced agent is a machine intelligence which acts in the real world in pursuit of its preferences without further user intervention or steering. In Bostrom's typology of advanced agents, this is a "Sovereign", as distinguished from a "Genie" or an "Oracle". ("Sovereign" in this sense means self-sovereign, and is not to be confused with the concept of a Bostromian singleton or any particular kind of social governance.)

Usually, when we say "Sovereign" or "self-directed", we'll be talking about a supposedly aligned AI that acts autonomously by design. Failure to solve the alignment problem probably means the resulting AI is self-directed-by-default.

Trying to construct an autonomous Friendly AI suggests that we trust the AI more than the programmers in any conflict between them, and we're okay with removing all constraints and off-switches except those the agent voluntarily takes upon itself.

A successfully aligned autonomous AGI would carry the least moral hazard of any scenario, since it hands off steering to some fixed preference framework or objective that the programmers can no longer modify. Nonetheless, being really, really, really that sure (not just getting it right, but knowing we've gotten it right) seems like a large enough problem that perhaps we shouldn't try to build this class of AI on our first try, and should first target a Task AGI instead, or something else involving ongoing user steering.

An autonomous superintelligence would be the most difficult possible class of AGI to align, requiring total alignment. Coherent extrapolated volition is a proposed alignment target for an autonomous superintelligence, but again, probably not something we should attempt to do on our first try.
