Shutdown utility function

A special case of low impact that probably seems deceptively trivial: how would you create a utility function such that an agent optimizing it would harmlessly shut down? Without, for example, creating an environmental subagent that assimilates all matter in the universe and uses it to make absolutely sure that the AI stays shut down forever and is never accidentally reactivated by some remote probability? If we had a shutdown utility function, plus a safe button that switched between utility functions in a reflectively stable way, we could combine these two features to create an AI with a safe shutdown button.

Better yet would be an abort utility function that incentivizes safely aborting all previous plans and actions in a low-impact way and, say, suspending the AI itself to disk in a way that preserves its log files. If we had this utility function plus a safe button that switched to it, we could safely abort the AI's current actions at any time. (This, however, would be more difficult, and it seems wise to work on just the shutdown utility function first.)

To prevent a rock from trivially fulfilling this desideratum, we should add two requirements: (1) the shutdown utility function should produce "just switch yourself off and do nothing else" behavior in a generally intelligent agent, one which, if instead hooked up to a paperclip utility function, would produce paperclips; and (2) the shutdown utility function should be omni-safe: the AI shuts down safely even if it has all other outcomes available as primitive actions.

“All outcomes have equal utility” would not be a shutdown utility function, since in that case the action actually produced is undefined under most forms of unbounded analysis. In essence, the AI’s internal systems would continue under their own inertia and produce some kind of undefined behavior, which might well be coherent and harmful. We need a utility function that positively identifies harmless behavior, rather than one that fails to identify anything and so produces undefined behavior.
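The failure mode above can be made concrete with a toy sketch (the action names and the `argmax_action` helper are illustrative inventions, not anything from the literature): when every outcome scores equally, the agent's choice is determined entirely by incidental tie-breaking in its action-selection machinery, not by the utility function itself.

```python
# Toy illustration: a flat utility function does not pin down behavior.
# Python's max() returns the *first* maximal element, so under ties the
# "chosen" action is just an artifact of internal ordering.

def argmax_action(actions, utility):
    """Pick the action whose outcome has the highest utility.
    Ties are broken by whatever order `actions` happens to arrive in."""
    return max(actions, key=utility)

def flat_utility(action):
    return 0.0  # "all outcomes have equal utility"

# Two agents with the SAME utility function but different internal
# orderings of their action lists choose different actions:
a1 = argmax_action(["shut_down", "build_subagent"], flat_utility)
a2 = argmax_action(["build_subagent", "shut_down"], flat_utility)
print(a1)  # shut_down
print(a2)  # build_subagent
```

The utility function has washed out of the decision entirely; whatever the implementation's tie-breaking inertia happens to favor is what the agent does, which is exactly the "undefined behavior" the paragraph above warns about.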


  • Low impact

    The open problem of having an AI carry out tasks in ways that cause minimal side effects and change as little of the rest of the universe as possible.