Task (AI goal)

A “Task” is a goal or subgoal within an advanced AI that can be satisfied as fully as possible by optimizing a bounded part of space, for a limited time, with a limited amount of effort.

E.g., “make as many paperclips as possible” is definitely not a ‘task’ in this sense, since it spans every paperclip anywhere in space and future time. Creating more and more paperclips, using more and more effort, would be more and more preferable up to the maximum exertable effort.

For a more subtle example of non-taskishness, consider Disney’s “Sorcerer’s Apprentice” scenario: Mickey Mouse commands a broomstick to fill a cauldron. The broomstick then adds more and more water to the cauldron until the workshop is flooded. (Mickey then tries to destroy the broomstick. But since the broomstick has no designed-in reflectively stable shutdown button, the broomstick repairs itself and begins constructing subagents that go on pouring more water into the cauldron.)

Since the Disney cartoon is a musical, we don’t know if the broomstick was given a time bound on its job. Let us suppose that Mickey tells the broomstick to do its job sometime before 1pm.

Then we might imagine that the broomstick is a subjective expected utility maximizer with a utility function \(U_{cauldron}\) over outcomes \(o\):

$$U_{cauldron}(o) = \begin{cases} 1 & \text{if in $o$ the cauldron is $\geq 90\%$ full of water at 1pm} \\ 0 & \text{otherwise} \end{cases}$$

This looks at first glance like it ought to be taskish:

  • The cauldron is bounded in space.

  • The goal only concerns events that happen before a certain time.

  • The highest utility that can be achieved is \(1,\) which is reached as soon as the cauldron is \(\geq 90\%\) full of water, which seems achievable using a limited amount of effort.

The last property in particular makes \(U_{cauldron}\) a “satisficing utility function”, one where an outcome is either satisfactory or not-satisfactory, and it is not possible to do any better than “satisfactory”.
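As a concrete illustration, here is a minimal Python sketch of such a satisficing utility function; the `Outcome` record is a hypothetical stand-in for whatever world-model representation the agent actually uses, and the 90% cutoff is the one from \(U_{cauldron}\) above.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical stand-in for a full description of the world."""
    cauldron_fill_fraction_at_1pm: float  # 0.0 = empty, 1.0 = full


def u_cauldron(o: Outcome) -> float:
    """Satisficing utility: 1 if the cauldron is at least 90% full at 1pm, else 0.

    No outcome can score above 1, so no outcome is 'better than satisfactory'.
    """
    return 1.0 if o.cauldron_fill_fraction_at_1pm >= 0.90 else 0.0
```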

But by previous assumption, the broomstick is still optimizing expected utility. Assume the broomstick reasons with reasonable generality via some universal prior. Then the subjective probability of the cauldron being full, when it looks full to the broomstick-agent, will not be exactly \(1.\) Perhaps (the broomstick-agent reasons) the broomstick’s cameras are malfunctioning, or its RAM has malfunctioned, producing an inaccurate memory.
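For a toy illustration with invented numbers: suppose the broomstick’s prior probability that the cauldron is full is \(0.9\), and its camera reports “full” with \(99\%\) reliability whether or not the cauldron actually is. Then even after seeing a full-looking cauldron,

$$P(\text{full} \mid \text{looks full}) = \frac{0.9 \times 0.99}{0.9 \times 0.99 + 0.1 \times 0.01} \approx 0.9989 < 1.$$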

Then the broomstick-agent reasons that it can further increase the probability of the cauldron being full (however slight the increase in probability) by going ahead and dumping in another bucket of water.

That is: Cromwell’s Rule implies that the subjective probability of the cauldron being full never reaches exactly \(1\). Then there can be an infinite series of increasingly preferred, increasingly more effortful policies \(\pi_1, \pi_2, \pi_3 \ldots\) with

$$\begin{aligned} \mathbb E [ U_{cauldron} | \pi_1] &= 0.99 \\ \mathbb E [ U_{cauldron} | \pi_2] &= 0.999 \\ \mathbb E [ U_{cauldron} | \pi_3] &= 0.999002 \\ &\;\;\vdots \end{aligned}$$

In that case the broomstick can always do better in expected utility (however slightly) by exerting even more effort, up to the maximum effort it can exert. Hence the flooded workshop.

If on the other hand the broomstick is an expected utility satisficer, i.e., a policy is “acceptable” if it has \(\mathbb E [ U_{cauldron} | \pi ] \geq 0.95,\) then this is now finally a taskish process (we think). The broomstick can find some policy that’s reasonably sure of filling up the cauldron, execute that policy, and then do no more.
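To make the contrast concrete, here is a minimal toy sketch in Python, with an invented `Policy` type and an invented probability model (not anything from the original scenario): the maximizer always prefers the most effortful policy on offer, while the satisficer can stop at the first policy whose expected utility clears \(0.95.\)

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical policy: pouring more buckets of water = more effort."""
    buckets_poured: int


def expected_utility(pi: Policy) -> float:
    """Toy model of E[U_cauldron | pi] = P(cauldron >= 90% full at 1pm | pi).

    Each extra bucket removes a little more residual doubt, but the
    probability never reaches exactly 1 (Cromwell's Rule).
    """
    return 1.0 - 0.05 * (0.5 ** pi.buckets_poured)


# The policies available to the broomstick, ordered by increasing effort.
policies = [Policy(buckets_poured=n) for n in range(1, 21)]

# Expected-utility maximizer: ranks policies by E[U] and takes the argmax,
# which here is the most effortful policy available (the flooded workshop).
maximizer_choice = max(policies, key=expected_utility)

# Expected-utility satisficer: any policy with E[U] >= 0.95 is acceptable,
# so it can take the first such policy it finds and then do no more.
satisficer_choice = next(pi for pi in policies if expected_utility(pi) >= 0.95)

print(maximizer_choice)   # Policy(buckets_poured=20)
print(satisficer_choice)  # Policy(buckets_poured=1)
```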

As described, this broomstick doesn’t yet have any impact penalty, or features for mild optimization. So the broomstick could also get \(\geq 0.95\) expected utility by flooding the whole workshop; we haven’t yet forbidden excess efforts. Similarly, the broomstick could also go on to destroy the world after 1pm; we haven’t yet forbidden excess impacts.

But the underlying rule of “Execute a policy that fills the cauldron at least 90% full with at least 95% probability” does appear taskish, so far as we know. It seems possible for an otherwise well-designed agent to execute this goal to the greatest achievable degree, by acting in bounded space, over a bounded time, with a limited amount of effort. There does not appear to be a sequence of policies, using successively more and more effort, that the agent would evaluate as better fulfilling its decision criterion.

The “taskness” of this goal, even assuming it was correctly identified, wouldn’t by itself make the broomstick a fully taskish AGI. We also have to consider whether every subprocess of the AI is similarly tasky; whether there is any subprocess anywhere in the AI that tries to improve memory efficiency ‘as far as possible’. But it would be a start, and make further safety features more feasible/useful.

See also Mild optimization as an open problem in AGI alignment.

Parents:

  • Task-directed AGI

    An advanced AI that’s meant to pursue a series of limited-scope goals given it by the user. In Bostrom’s terminology, a Genie.