Task-directed AGI

A task-based AGI is an AGI intended to follow a series of human-originated orders, with these orders each being of limited scope—"satisficing" in the sense that they can be accomplished using bounded amounts of effort and resources (as opposed to goals that become more and more fulfilled as more and more effort is expended).

In Bostrom's typology, this is termed a "Genie". It contrasts with a "Sovereign" AGI that acts autonomously in the pursuit of long-term real-world goals.

Building a safe Task AGI might be easier than building a safe Sovereign for the following reasons:

  • A Task AGI can be "online": the AGI can potentially query the user before and during Task performance. (Assuming an ambiguous situation arises and is successfully identified as ambiguous.)

  • A Task AGI can potentially be limited in various ways, since a Task AGI doesn't need to be as powerful as possible in order to accomplish its limited-scope Tasks. A Sovereign would presumably engage in all-out self-improvement. (This isn't to say Task AGIs would automatically not self-improve, only that it's possible in principle to limit the power of a Task AGI to the level required to do the targeted Tasks, if the associated safety problems can be solved.)

  • Tasks, by assumption, are limited in scope—they can be accomplished and done, inside some limited region of space and time, using some limited amount of effort which is then complete. (To gain this advantage, a state of Task accomplishment should not go higher and higher in preference as more and more effort is expended on it open-endedly.)

  • Assuming that users can figure out intended goals for the AGI that are valuable and pivotal, the identification problem of describing what constitutes a safe performance of that Task might be simpler than giving the AGI a complete description of normativity in general. That is, communicating to an AGI an adequate description of "cure cancer" (without killing patients or causing other side effects), while still difficult, might be simpler than communicating an adequate description of all normative value. Task AGIs fall on the narrow side of ambitious vs. narrow value learning.
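The "limited scope" property of Tasks can be illustrated with a toy sketch (the functions and numbers here are invented for illustration, not drawn from any actual proposal): a satisficing Task's score tops out once the bounded goal is met, whereas an open-ended goal keeps rewarding additional effort.

```python
def open_ended_utility(cars_painted: int) -> float:
    # A maximizer's score grows without bound as more effort is spent.
    return float(cars_painted)

def satisficing_utility(cars_painted: int, target: int = 1) -> float:
    # A Task's score tops out once the bounded goal is met;
    # further effort adds nothing.
    return float(min(cars_painted, target))

# Painting a million cars is no better than painting the one requested.
assert satisficing_utility(10**6) == satisficing_utility(1) == 1.0
assert open_ended_utility(10**6) > open_ended_utility(1)
```

The point of the capped score is that a planner maximizing it has no incentive to expend open-ended effort once the Task is done.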

Relative to the problem of building a Sovereign, trying to build a Task AGI instead might step down the problem from "impossibly difficult" to "insanely difficult", while still maintaining enough power in the AI to perform pivotal acts.

The obvious disadvantage of a Task AGI is moral hazard—it may tempt the users in ways that a Sovereign would not. A Sovereign has moral hazard chiefly during the development phase, when the programmers and users are perhaps not yet in a position of special relative power. A Task AGI has ongoing moral hazard as it is used.

Eliezer Yudkowsky has suggested that people only confront many important problems in value alignment when they are thinking about Sovereigns, but that at the same time, Sovereigns may be impossibly hard in practice. Yudkowsky advocates that people think about Sovereigns first and list out all the associated issues before stepping down their thinking to Task AGIs: thinking about Task AGIs first may result in premature pruning, while thinking about Sovereigns is more likely to generate a complete list of problems, which can then be checked against particular Task AGI approaches to see whether those problems have become any easier.

Three distinguished subtypes of Task AGI are:

  • Oracles, AIs intended only to answer questions, possibly from some restricted question set.

  • Known-algorithm AIs, which are not self-modifying or only very weakly self-modifying, such that their algorithms and representations are mostly known and mostly stable.

  • Behaviorist Genies, which are meant not to model human minds, or to model them in only very limited ways, while having great material understanding (e.g., potentially the ability to invent and deploy nanotechnology).


The problem of making a safe genie invokes numerous subtopics, such as low impact, mild optimization, and conservatism, as well as numerous standard AGI safety problems like reflective stability and safe identification of intended goals.

(See here for a separate page on open problems in Task AGI safety that might be ready for current research.)

Some further problems beyond those appearing in the page above are:


  • Behaviorist genie

    An advanced agent that's forbidden to model minds in too much detail.

  • Epistemic exclusion

    How would you build an AI that, no matter what else it learned about the world, never knew or wanted to know what was inside your basement?

  • Open subproblems in aligning a Task-based AGI

    Open research problems, especially ones we can model today, in building an AGI that can "paint all cars pink" without turning its future light cone into pink-painted cars.

  • Low impact

    The open problem of having an AI carry out tasks in ways that cause minimum side effects and change as little of the rest of the universe as possible.
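    As a toy sketch of one possible shape such a criterion could take (the world-state features and penalty weight below are invented for illustration, not an actual low-impact proposal), an agent might score candidate plans by task success minus a penalty on deviations from a "do nothing" baseline:

    ```python
    def impact_penalty(baseline: dict, outcome: dict) -> int:
        # Count world-state variables that differ from the baseline.
        return sum(1 for k in baseline if outcome.get(k) != baseline[k])

    def plan_score(task_done: bool, baseline: dict, outcome: dict,
                   penalty_weight: float = 0.5) -> float:
        # Task success, minus a cost for every side effect.
        return float(task_done) - penalty_weight * impact_penalty(baseline, outcome)

    baseline = {"car_color": "red", "factory": "intact", "sky": "blue"}
    tidy = {"car_color": "pink", "factory": "intact", "sky": "blue"}
    messy = {"car_color": "pink", "factory": "converted_to_paint", "sky": "pink"}

    # Both plans paint the car, but the low-side-effect plan scores higher.
    assert plan_score(True, baseline, tidy) > plan_score(True, baseline, messy)
    ```

    Real proposals are much subtler (e.g., defining the baseline and measuring "change" are themselves open problems), but the sketch shows the intended incentive gradient.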

  • Conservative concept boundary

    Given N example burritos, draw a boundary around what is a 'burrito' that is relatively simple and allows as few positive instances as possible. Helps make sure the next thing generated is a burrito.
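    A minimal sketch of this idea, assuming burritos are represented as simple feature vectors (the features and numbers are made up for illustration): fit the tightest axis-aligned box around the examples and classify only points inside it as burritos, rather than learning a broad, permissive region.

    ```python
    def fit_conservative_box(examples):
        # Tightest axis-aligned bounding box around the training examples.
        dims = range(len(examples[0]))
        return [(min(e[d] for e in examples), max(e[d] for e in examples))
                for d in dims]

    def inside(box, point):
        return all(lo <= x <= hi for (lo, hi), x in zip(box, point))

    # Hypothetical features: (length_cm, mass_g)
    burritos = [(15.0, 300.0), (18.0, 350.0), (16.0, 320.0)]
    box = fit_conservative_box(burritos)

    assert inside(box, (16.5, 330.0))        # similar to seen examples
    assert not inside(box, (200.0, 5000.0))  # a "giant burrito" is excluded
    ```

    The design choice is that errors fall on the side of rejecting odd instances, at the cost of some false negatives near the boundary.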

  • Querying the AGI user

    Postulating that an advanced agent will check something with its user probably comes with some standard issues and gotchas (e.g., prioritizing what to query, not manipulating the user, etc.).

  • Mild optimization

    An AGI which, if you ask it to paint one car pink, just paints one car pink and doesn't tile the universe with pink-painted cars, because it's not trying that hard to max out its car-painting score.
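    One toy way to contrast the two search policies (illustrative only; actual mild-optimization proposals are subtler than a fixed score threshold): a maximizer returns the highest-scoring plan it can find, while a mild optimizer stops at the first plan that clears a modest bar.

    ```python
    def maximize(plans, score):
        # Strong optimization: take the best-scoring plan, however extreme.
        return max(plans, key=score)

    def mildly_optimize(plans, score, good_enough: float):
        # Mild optimization: settle for the first plan that clears the bar.
        for plan in plans:
            if score(plan) >= good_enough:
                return plan
        return None

    plans = ["paint one car", "paint all cars in town",
             "tile universe with pink cars"]
    score = {"paint one car": 1.0,
             "paint all cars in town": 5.0,
             "tile universe with pink cars": 1000.0}.get

    assert mildly_optimize(plans, score, good_enough=1.0) == "paint one car"
    assert maximize(plans, score) == "tile universe with pink cars"
    ```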

  • Task identification problem

    If you have a task-based AGI (Genie), then how do you pinpoint exactly what you want it to do (and not do)?

  • Safe plan identification and verification

    On a particular task or problem, the issue of how to communicate to the AGI what you want it to do and all the things you don't want it to do.

  • Faithful simulation

    How would you identify, to a Task AGI (aka Genie), the problem of scanning a human brain and then running a sufficiently accurate simulation of it for the simulation not to be crazy or psychotic?

  • Task (AI goal)

    When building the first AGIs, it may be wiser to assign them only goals that are bounded in space and time, and can be satisfied by bounded efforts.

  • Limited AGI

    Task-based AGIs don't need unlimited cognitive and material powers to carry out their Tasks, which means their powers can potentially be limited.

  • Oracle

    System designed to safely answer questions.

  • Boxed AI

    Idea: what if we limit how the AI can interact with the world? That'll make it safe, right?


  • Strategic AGI typology

    What broad types of advanced AIs, corresponding to which strategic scenarios, might it be possible or wise to create?

  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.