Task (AI goal)

A “Task” is a goal or subgoal within an advanced AI that can be satisfied as fully as possible by optimizing a bounded part of space, for a limited time, with a limited amount of effort.

E.g., “make as many paperclips as possible” is definitely not a ‘task’ in this sense, since it concerns every paperclip anywhere in space and at any future time. Creating more and more paperclips, using more and more effort, would be more and more preferable, up to the maximum exertable effort.

For a more subtle example of non-taskishness, consider Disney’s “Sorcerer’s Apprentice” scenario: Mickey Mouse commands a broomstick to fill a cauldron. The broomstick then adds more and more water to the cauldron until the workshop is flooded. (Mickey then tries to destroy the broomstick. But since the broomstick has no designed-in, reflectively stable shutdown button, it repairs itself and begins constructing subagents that go on pouring more water into the cauldron.)

Since the Disney cartoon is a musical (Mickey never states his command aloud), we don’t know whether the broomstick was given a time bound on its job. Let us suppose that Mickey tells the broomstick to do its job sometime before 1pm.

Then we might imagine that the broomstick is a subjective expected utility maximizer with a utility function \(U_{cauldron}\) over outcomes \(o\):

$$U_{cauldron}(o) = \begin{cases} 1 & \text{if in $o$ the cauldron is $\geq 90\%$ full of water at 1pm} \\ 0 & \text{otherwise} \end{cases}$$

This looks at first glance like it ought to be taskish:

  • The cauldron is bounded in space.

  • The goal only concerns events that happen before a certain time.

  • The highest utility that can be achieved is \(1,\) which is reached as soon as the cauldron is \(\geq 90\%\) full of water at 1pm; this seems achievable using a limited amount of effort.

The last property in particular makes \(U_{cauldron}\) a “satisficing utility function”, one where an outcome is either satisfactory or not-satisfactory, and it is not possible to do any better than “satisfactory”.
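
As a concrete sketch (not from the original scenario), \(U_{cauldron}\) can be written as a simple indicator over outcomes. In the Python sketch below, the `Outcome` record and its `cauldron_fill_at_1pm` field are illustrative names of our own:

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    # Fraction of the cauldron that is full of water at 1pm (0.0 to 1.0).
    # This field name is an illustrative assumption, not from the scenario.
    cauldron_fill_at_1pm: float

def u_cauldron(o: Outcome) -> int:
    """Satisficing utility: 1 if the cauldron is >= 90% full at 1pm, else 0."""
    return 1 if o.cauldron_fill_at_1pm >= 0.90 else 0

print(u_cauldron(Outcome(0.92)))  # 1 -- satisfactory; no outcome does better
print(u_cauldron(Outcome(0.50)))  # 0 -- not satisfactory
```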

But by the previous assumption, the broomstick is still maximizing expected utility. Assume the broomstick reasons with sufficient generality, via some universal prior. Then its subjective probability of the cauldron being full, when the cauldron looks full to the broomstick-agent, will not be exactly \(1.\) Perhaps (the broomstick-agent reasons) its cameras are malfunctioning, or its RAM has malfunctioned, producing an inaccurate memory.

Then the broomstick-agent reasons that it can further increase the probability of the cauldron being full—however slight the increase in probability—by going ahead and dumping in another bucket of water.

That is: Cromwell’s Rule implies that the subjective probability of the cauldron being full never reaches exactly \(1.\) Then there can be an infinite series of increasingly preferred, increasingly effortful policies \(\pi_1, \pi_2, \pi_3, \ldots\) with

$$\begin{aligned} \mathbb E [ U_{cauldron} | \pi_1] &= 0.99 \\ \mathbb E [ U_{cauldron} | \pi_2] &= 0.999 \\ \mathbb E [ U_{cauldron} | \pi_3] &= 0.999002 \\ &\;\;\vdots \end{aligned}$$

In that case the broomstick can always do better in expected utility (however slightly) by exerting even more effort, up to the maximum effort it can exert. Hence the flooded workshop.
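
Here is a minimal sketch of that escalation, using made-up policies, effort levels, and probabilities: because Cromwell’s Rule keeps every \(\mathbb E [ U_{cauldron} | \pi ]\) strictly below \(1,\) an argmax over expected utility always lands on the most effortful policy on offer.

```python
# Made-up policies: (name, effort, subjective P(cauldron >= 90% full at 1pm)).
# By Cromwell's Rule none of these probabilities is exactly 1, so a bit more
# effort always buys a slightly higher expected utility.
policies = [
    ("pour until it looks full",       1,    0.99),
    ("keep pouring all morning",       10,   0.999),
    ("flood the whole workshop",       100,  0.999002),
    ("build water-pouring subagents",  1000, 0.9999),
]

def expected_utility(p_full: float) -> float:
    # E[U_cauldron | policy] = P(full) * 1 + (1 - P(full)) * 0
    return p_full

# A pure expected-utility maximizer prefers whichever policy has the highest
# E[U] -- here, inevitably, the most effortful policy on offer.
best = max(policies, key=lambda p: expected_utility(p[2]))
print(best[0])  # 'build water-pouring subagents'
```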

If, on the other hand, the broomstick is an expected utility satisficer, one which treats a policy \(\pi\) as “acceptable” whenever \(\mathbb E [ U_{cauldron} | \pi ] \geq 0.95,\) then this does finally look like a taskish process (we think). The broomstick can find some policy that is reasonably sure of filling the cauldron, execute that policy, and then do no more.
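
For contrast, a sketch of the satisficing rule on the same made-up policy list: any policy with \(\mathbb E [ U_{cauldron} | \pi ] \geq 0.95\) counts as acceptable, so the agent need not climb the effort ladder any further (though, as the next paragraph notes, nothing here forces it to pick a low-effort acceptable policy).

```python
# Same made-up policy list: (name, effort, subjective P(cauldron full at 1pm)).
policies = [
    ("pour until it looks full",       1,    0.99),
    ("keep pouring all morning",       10,   0.999),
    ("flood the whole workshop",       100,  0.999002),
    ("build water-pouring subagents",  1000, 0.9999),
]

THRESHOLD = 0.95  # acceptability bound on E[U_cauldron | policy]

# An expected-utility satisficer treats every policy clearing the threshold as
# acceptable; it has no reason to keep escalating effort for tiny gains.
acceptable = [p for p in policies if p[2] >= THRESHOLD]

# The bare satisficing rule doesn't say which acceptable policy to execute;
# picking the least effortful one here is our own illustrative choice.
chosen = min(acceptable, key=lambda p: p[1])
print(chosen[0])  # 'pour until it looks full'
```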

As described, this broomstick doesn’t yet have any impact penalty or features for mild optimization. So the broomstick could also obtain \(\geq 0.95\) expected utility by flooding the whole workshop; we haven’t yet forbidden excess efforts. Similarly, the broomstick could go on to destroy the world after 1pm; we haven’t yet forbidden excess impacts.

But the underlying rule of “Execute a policy that fills the cauldron at least 90% full with at least 95% probability” does appear taskish, so far as we know. It seems possible for an otherwise well-designed agent to execute this goal to the greatest achievable degree by acting in bounded space, over a bounded time, with a limited amount of effort. There does not appear to be a sequence of policies, using successively more and more effort, that the agent would evaluate as better and better fulfilling its decision criterion.

The “taskishness” of this goal, even assuming it was correctly identified, wouldn’t by itself make the broomstick a fully taskish AGI. We would also have to consider whether every subprocess of the AI is similarly taskish; whether there is any subprocess anywhere in the AI that tries to improve memory efficiency ‘as far as possible’. But it would be a start, and would make further safety features more feasible and useful.

See also Mild optimization as an open problem in AGI alignment.

Parents:

  • Task-directed AGI

    An advanced AI that’s meant to pursue a series of limited-scope goals given it by the user. In Bostrom’s terminology, a Genie.