Relevant powerful agent

A cognitively powerful agent is relevant if it is cognitively powerful enough to be a game-changer in the larger dilemma faced by Earth-originating intelligent life. Conversely, an agent is irrelevant if it is too weak to make much of a difference, or if the cognitive problems it can solve or tasks it is authorized to perform don't significantly change the overall situation we face.


Intuitively speaking, a value-aligned AI is 'relevant' to the extent it has a 'yes' answer to the box 'Can we use this AI to produce a benefit that solves the larger dilemma?' in this flowchart, or is part of a larger plan that gets us to the lower green circle without any "Then a miracle occurs" steps. A cognitively powerful agent is 'relevant' at all only if its existence can bring about good or bad outcomes on that scale; e.g., a big neural net is not 'relevant' because it doesn't end the world one way or another.

(A better word than 'relevant' might be helpful here.)


Positive examples

  • A hypothetical agent that can bootstrap to nanotechnology by solving the inverse protein folding problem, and then shut down other AI projects, in a way that its programmers can reasonably know is safe enough to authorize, would be relevant.

Negative examples

  • An agent authorized to prove or disprove the Riemann Hypothesis, but not to do anything else, is not relevant (unless knowing whether the Riemann Hypothesis is true somehow changes everything for the basic dilemma of AI).

  • An oracle that can only output verified HOL proofs is not yet 'relevant' until someone can describe theorems to prove such that firm knowledge of their truth would be a game-changer for the AI situation. (Hypothesizing that someone else will come up with a theorem like that, if you just build the oracle, is a Hail Mary step in the plan.)


Many proposals for AI safety, especially advanced safety, so severely restrict the applicability of the AI that the AI is no longer allowed to do anything that seems like it could solve the larger dilemma. (E.g., an oracle that is only allowed to give us binary answers about whether it thinks certain mathematical facts are true, when nobody has yet said how to use this ability to save the world.)

Conversely, proposals to use AIs to do things impactful enough to solve the larger dilemma generally run smack into all the usual advanced safety problems, especially if the AI must operate in the rich domain of the real world to carry out its task (which tends to require full trust).

Open problem

It is an open problem to propose a relevant, limited AI that would be significantly easier to handle than the general safety problem, while also being useful enough to resolve the larger dilemma.


  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.