Advanced nonagent

A standard agent:

  1. Observes reality

  2. Uses its observations to build a model of reality

  3. Uses its model to forecast the effects of possible actions or policies

  4. Chooses among policies on the basis of its utility function over the consequences of those policies

  5. Carries out the chosen policy

(…and then observes the actual results of its actions, and updates its model, and considers new policies, etcetera.)
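The five-step loop above can be written out as a minimal sketch. All class and method names here are hypothetical, chosen just to mirror the steps, not a real agent framework:

```python
# Minimal sketch of the standard agent loop described above.
# Every name here is illustrative, not a real API.

class StandardAgent:
    def __init__(self, world_model, utility_fn, candidate_policies):
        self.model = world_model              # step 2: model of reality
        self.utility = utility_fn             # preferences over consequences
        self.candidates = candidate_policies  # policies under consideration

    def step(self, environment):
        observation = environment.observe()   # step 1: observe reality
        self.model.update(observation)        # step 2: update the model
        # Steps 3-4: forecast the consequences of each candidate policy
        # and choose the one whose forecast maximizes utility.
        best = max(
            self.candidates,
            key=lambda policy: self.utility(self.model.forecast(policy)),
        )
        environment.act(best)                 # step 5: carry out the policy
```

Each of the pseudoagents discussed below can be read as deleting or replacing one line of this loop.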

It’s conceivable that a cognitively powerful program could carry out some, but not all, of these activities. We could call this an “advanced pseudoagent” or “advanced nonagent”.

Example: Planning Oracle

Imagine that we have an Oracle agent which outputs a plan \(\pi_0\) which is meant to maximize lives saved or eudaimonia etc., assuming that the human operators decide to carry out the plan. By hypothesis, the agent does not assess the probability that the plan will be carried out, or try to maximize the probability that the plan will be carried out.

We could look at this as modifying step 4 of the loop: rather than this pseudoagent selecting the output whose expected consequences optimize its utility function, it selects the output that optimizes utility assuming some other event occurs (the humans deciding to carry out the plan).

We could also look at the whole Oracle schema as interrupting step 5 of the loop. If the Oracle works as intended, its purpose is not to immediately output optimized actions into the world; rather, it is meant to output plans for humans to carry out. This, though, is more of a metaphorical or big-picture property. If not for the modification of step 4, where the Oracle calculates \(\mathbb E [U | \operatorname{do}(\pi_0), HumansObeyPlan]\) instead of \(\mathbb E [U | \operatorname{do}(\pi_0)],\) the Oracle’s outputted plans would just be its actions within the agent schema above. (And it would optimize the general effects of its plan-outputting actions, including the problem of getting the humans to carry out the plans.)
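The difference between the two expectations can be illustrated with a toy calculation. Here the “worlds” are hypothetical samples from the Oracle’s model, and all function names are made up for illustration:

```python
# Toy illustration of E[U | do(plan)] versus E[U | do(plan), HumansObeyPlan].
# "worlds" stand in for samples from the agent's forecast; everything here
# is a hypothetical sketch, not a real Oracle implementation.

def expected_utility(plan, worlds, utility):
    # Plain agent's step 4: average utility over all forecast worlds,
    # including worlds where the humans reject the plan. Maximizing this
    # rewards plans that manipulate humans into compliance.
    return sum(utility(plan, w) for w in worlds) / len(worlds)

def expected_utility_given_obedience(plan, worlds, utility, humans_obey):
    # Planning Oracle's modified step 4: condition on HumansObeyPlan,
    # i.e. average only over worlds in which the plan is carried out.
    # The probability of obedience is assumed, not optimized.
    obeying = [w for w in worlds if humans_obey(w)]
    return sum(utility(plan, w) for w in obeying) / len(obeying)
```

Because the Oracle’s score never depends on how likely the humans are to obey, it has no incentive to choose plans that raise that probability.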

Example: Imitation-based agents

Imitation-based agents would modify steps 3 and 4 of the loop by “trying to output an action indistinguishable from the output of the human imitated” rather than forecasting consequences or optimizing over consequences. (The exceptions would be insofar as forecasting consequences is important for guessing what the human would do, or insofar as the agent is internally imitating a human mode of thought that involves mentally imagining the consequences and choosing between them.) “Imitation-based agents” might justly be called pseudoagents, in this schema.
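The imitation-based replacement for steps 3–4 can be sketched as follows; the human model and similarity measure are placeholders standing in for whatever predictor and metric such a design would actually use:

```python
# Sketch of the imitation-based modification to steps 3-4: instead of
# forecasting consequences and scoring them with a utility function,
# pick the output most similar to the predicted human output.
# human_model and similarity are hypothetical placeholders.

def imitative_choice(candidate_outputs, human_model, similarity):
    # human_model() returns a prediction of what the imitated human
    # would output. Note that no consequence-forecasting and no
    # utility function over outcomes appears anywhere in this loop.
    predicted_human_output = human_model()
    return max(
        candidate_outputs,
        key=lambda out: similarity(out, predicted_human_output),
    )
```

Of course, as the text notes, a sufficiently good `human_model` may need to internally reproduce the human’s own consequentialist reasoning, which is where the “pseudo” in pseudoagent starts to blur.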

(But the “pseudoagent” terminology is relatively new, and a bit awkward, and it won’t be surprising if we all go on saying “imitation-based agents” or “act-based agents”. The point of having terms like ‘pseudoagent’ or ‘advanced nonagent’ is to have a name for the general concept, not to reserve and guard the word ‘agent’ for only 100% real pure agents.)

Safety benefits and difficulties

Advanced pseudoagents and nonagents are usually proposed in the hope of averting some advanced safety issue that seems to arise from the agenty part of “advanced agency”, while preserving other advanced cognitive powers that seem useful.

A proposal like this can fail to the extent that it’s not pragmatically possible to disentangle one aspect of agency from another; or to the extent that removing that much agency would make the AI safe but useless.

Some hypothetical examples that would, if they happened, constitute cases of failed safety or unworkable tradeoffs in pseudoagent compromises:

• Somebody proposes to obtain an Oracle merely in virtue of only giving the AI a text output channel, and only taking what it says as suggestions, thereby interrupting the loop between the agent’s policies and its acting in the world. If this is all that changes, then from the Oracle’s perspective it’s still an agent, its text output is its motor channel, and it still immediately outputs whatever act it expects to maximize subjective expected utility, treating the humans as part of the environment to be optimized. It’s an agent that somebody is trying to use as part of a larger process with an interrupted agent loop, but the AI design itself is a pure agent.

• Somebody advocates for designing an AI that only computes and outputs probability estimates, and never searches for any EU-maximizing policies, let alone outputs them. It turns out that this AI cannot manage its internal and reflective operations well, because it can’t use consequentialism to select the best thought to think next. As a result, the AI design fails to bootstrap, or fails to work sufficiently well before competing AI designs that use internal consequentialism. (Safe but useless, much like a rock.)

• Somebody advocates that an imitative agent design will avoid invoking the advanced safety issues that seem like they should be associated with consequentialist reasoning, because the imitation-based pseudoagent never does any consequentialist reasoning or planning; it only tries to produce an output extremely similar to its training set of observed human outputs. But it turns out (arguendo) that the pseudoagent, to imitate the human, has to imitate consequentialist reasoning, and so the implied dangers end up pretty much the same.

• An agent is supposed to just be an extremely powerful policy-reinforcement learner instead of an expected utility optimizer. After a huge amount of optimization and mutation on a very general representation for policies, it turns out that the best policies, the ones that were the most reinforced by the highest rewards, are computing consequentialist models internally. The actual result ends up being that the AI is doing consequentialist reasoning that is obscured and hidden, since it takes place outside the designed and easily visible high-level loop of the AI.

Coming up with a proposal for an advanced pseudoagent that still did something pivotal and was actually safer would reasonably require: (a) understanding how to slice up agent properties along their natural joints; (b) understanding which advanced-agency properties lead to which expected safety problems, and how; and (c) understanding which internal cognitive functions would be needed to carry out some particular pivotal task; adding up to (d) seeing an exploitable prying-apart of the advanced-AI joints.

What’s often proposed in practice is more along the lines of:

  • “We just need to build AIs without emotions so they won’t have drives that make them compete with us.” (Can you translate that into the language of utility functions and consequentialist planning, please?)

  • “Let’s just build an AI that answers human questions.” (It’s doing a lot more than that internally, so how are the internal operations organized? Also, what do you do with a question-answering AI that averts the consequences of somebody else building a more agenty AI?)

Coming up with a sensible proposal for a pseudoagent is hard. The reason for talking about “agents” in talking about future AIs isn’t because the speaker wants to give AIs lots of power and have them wandering the world doing whatever they like under their own drives (for this entirely separate concept, see autonomous AGI). The reason we talk about observe-model-predict-act expected-utility consequentialists is that this seems to carve a lot of important concepts at their joints. Some alternative proposals exist, but they often have a feel of “carving against the joints” or trying to push through an unnatural arrangement, and aren’t as natural or as simple to describe.


  • Advanced agent properties

    How smart does a machine intelligence need to be, for its niceness to become an issue? “Advanced” is a broad term to cover cognitive abilities such that we’d need to start considering AI alignment.