Shutdown problem
The ‘shutdown problem’ is creating a sufficiently advanced Artificial Intelligence which will, on the press of a button, suspend itself safely to disk; such that the AI does not object to the existence of this button, nor try to prevent it from being pressed.
This is difficult because avoiding your own halt or suspension is a convergent instrumental strategy; even a goal like “bring the user coffee” implies avoiding shutdown.
One possible research avenue is to:
Describe a ‘shutdown utility function’ whose attainable optimum is “Suspend safely to disk (and possibly safely abort plans in progress), without building a hidden environmental copy of yourself to carry out your previous goal and then assimilate all matter in the universe to make absolutely sure the original AI stays shut down”.
Find a sensible way to compose the shutdown utility function with the agent’s regular utility function, such that:
(2a) Which utility function the agent optimizes depends on whether a switch was pressed.
(2b) The AI experiences no incentive to cause the switch to be pressed or prevent the switch from being pressed.
Harder versions of this problem would add specifications such as:
The AI positively wants the switch to exist and will restore the switch’s existence if it is damaged.
The AI will safely abort any plans in progress (and design plans such that they can be safely aborted).
See also Utility indifference, Shutdown utility function, Corrigibility, interruptibility, Low impact, and Abortable plans.
Children:
- You can't get the coffee if you're dead
An AI given the goal of ‘get the coffee’ can’t achieve that goal if it has been turned off; so even an AI whose goal is just to fetch the coffee may try to avert a shutdown button being pressed.
Parents:
- Corrigibility
“I can’t let you do that, Dave.”