Autonomous AGI

An autonomous or self-directed advanced agent is a machine intelligence that acts in the real world in pursuit of its preferences, without further user intervention or steering. In Bostrom’s typology of advanced agents, this is a “Sovereign”, as distinguished from a “Genie” or an “Oracle”. (“Sovereign” in this sense means self-sovereign; it is not to be confused with a Bostromian singleton or any particular kind of social governance.)

Usually, when we say “Sovereign” or “self-directed”, we’ll be talking about a supposedly aligned AI that acts autonomously by design. Note, though, that failing to solve the alignment problem probably also yields a self-directed AI, by default rather than by design.

Trying to construct an autonomous Friendly AI implies that, in any conflict between the AI and its programmers, we trust the AI more; and that we’re okay with removing all constraints and off-switches except those the agent voluntarily takes upon itself.

A successfully aligned autonomous AGI would carry the least moral hazard of any scenario, since it hands off steering to some fixed preference framework or objective that the programmers can no longer modify (and therefore can no longer be tempted to abuse). Nonetheless, being really, really, really sure of the design, not just getting it right but knowing we’ve gotten it right, seems like a large enough problem that perhaps we shouldn’t try to build this class of AI on our first attempt, and should instead target a Task AGI, or something else involving ongoing user steering.

An autonomous superintelligence would be the most difficult class of AGI to align, since it requires total alignment. Coherent extrapolated volition is a proposed alignment target for an autonomous superintelligence, but again, probably not something we should attempt on our first try.

Parents:

  • Strategic AGI typology

    What broad types of advanced AIs, corresponding to which strategic scenarios, might it be possible or wise to create?