The ‘shutdown problem’ is the problem of creating a sufficiently advanced Artificial Intelligence that will, at the press of a button, suspend itself safely to disk, such that the AI neither objects to the existence of this button nor tries to prevent it from being pressed.
This is difficult because avoiding your own halt or suspension is a convergent instrumental strategy; even a goal like “bring the user coffee” implies avoiding shutdown, since a suspended agent cannot bring the coffee.
One possible research avenue is to:
(1) Describe a ‘shutdown utility function’ whose attainable optimum is “Suspend safely to disk (and possibly safely abort plans in progress), without building a hidden environmental copy of yourself to carry out your previous goal and then assimilate all matter in the universe to make absolutely sure the original AI stays shut down”.
(2) Find a sensible way to compose the shutdown utility function with the agent’s regular utility function, such that:
(2a) Which utility function the agent optimizes depends on whether a switch was pressed.
(2b) The AI experiences no incentive to cause the switch to be pressed or prevent the switch from being pressed.
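To make (2a) and (2b) concrete, here is a minimal toy sketch in Python. Everything in it is an illustrative assumption rather than part of the problem statement: the three actions, their press probabilities, and the payoff numbers are hypothetical. It first composes the two utility functions naively (switching on whether the switch was pressed), which satisfies (2a) but gives the agent an incentive to disable the switch, violating (2b). It then adds a compensating term to the shutdown branch, in the spirit of what is sometimes called ‘utility indifference’, so that the probability of a press no longer affects an action's expected value. This is only a sketch of the incentive structure, not a proposed solution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    p_press: float     # hypothetical probability the switch ends up pressed
    u_normal: float    # hypothetical regular utility if the switch is NOT pressed
    u_shutdown: float  # hypothetical shutdown utility if the switch IS pressed


# Toy numbers (assumptions): tampering with the switch costs 1 unit either way.
ACTIONS = [
    Action("work", p_press=0.5, u_normal=10.0, u_shutdown=2.0),
    Action("disable_switch", p_press=0.0, u_normal=9.0, u_shutdown=1.0),
    Action("press_switch", p_press=1.0, u_normal=9.0, u_shutdown=1.0),
]


def naive_value(a: Action) -> float:
    """(2a) only: optimize the regular utility if unpressed, else the shutdown utility."""
    return (1 - a.p_press) * a.u_normal + a.p_press * a.u_shutdown


def compensated_value(a: Action) -> float:
    """Add a bonus to the pressed branch so both branches are worth the same.

    With the bonus, the expected value equals a.u_normal regardless of
    a.p_press, so changing the press probability gains the agent nothing.
    """
    bonus = a.u_normal - a.u_shutdown
    return (1 - a.p_press) * a.u_normal + a.p_press * (a.u_shutdown + bonus)


def best(value_fn) -> str:
    """Name of the action that maximizes the given expected-value function."""
    return max(ACTIONS, key=value_fn).name


naive_choice = best(naive_value)
compensated_choice = best(compensated_value)

for a in ACTIONS:
    print(f"{a.name:15s} naive={naive_value(a):5.1f} compensated={compensated_value(a):5.1f}")
print("naive agent picks:", naive_choice)
print("compensated agent picks:", compensated_choice)
```

In this toy setup the naive agent prefers to disable the switch, while the compensated agent just works, and disabling or pressing the switch come out equally unattractive to it. The sketch says nothing about the harder specifications, such as preserving the switch or aborting plans safely.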
Harder versions of this problem would add specifications such as:
The AI wants the switch to exist and will restore the switch’s existence if it is damaged.
The AI will safely abort any plans in progress (and design plans such that they can be safely aborted).