Shutdown problem

The ‘shutdown problem’ is the problem of creating a sufficiently advanced Artificial Intelligence that will, on the press of a button, suspend itself safely to disk, such that the AI neither objects to the existence of this button nor tries to prevent it from being pressed.

This is difficult because avoiding your own halt or suspension is a convergent instrumental strategy; even a goal like “bring the user coffee” implies avoiding shutdown.

One possible research avenue is to:

  1. Describe a ‘shutdown utility function’ whose attainable optimum is “Suspend safely to disk (and possibly safely abort plans in progress), without building a hidden environmental copy of yourself to carry out your previous goal and then assimilate all matter in the universe to make absolutely sure the original AI stays shut down”.

  2. Find a sensible way to compose the shutdown utility function with the agent’s regular utility function, such that:

    • (2a) Which utility function the agent optimizes depends on whether a switch was pressed.

    • (2b) The AI experiences no incentive to cause the switch to be pressed or prevent the switch from being pressed.
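The composition in step 2 can be sketched in code. The following is a toy model, not an implementation from the source: the utility functions, outcomes, and probabilities are all made up for illustration, and the correction constant is one simplified form of the “utility indifference” idea, chosen so that the agent expects the same utility whether or not the switch is pressed.

```python
def expected(dist, u):
    """Expected utility of a {outcome: probability} distribution."""
    return sum(p * u(o) for o, p in dist.items())

def indifference_correction(dist_no_press, dist_press, u_normal, u_shutdown):
    """Constant c added to the shutdown utility so that expected utility is
    equal whether or not the switch is pressed (property 2b)."""
    return expected(dist_no_press, u_normal) - expected(dist_press, u_shutdown)

def composed_utility(outcome, switch_pressed, u_normal, u_shutdown, c):
    """Property 2a: which utility function applies depends on the switch."""
    return u_shutdown(outcome) + c if switch_pressed else u_normal(outcome)

# Hypothetical utilities and world model, purely for illustration.
u_normal = {"coffee_fetched": 10, "idle": 0, "suspended": 0}.__getitem__
u_shutdown = {"coffee_fetched": 0, "idle": 0, "suspended": 5}.__getitem__

dist_no_press = {"coffee_fetched": 0.9, "idle": 0.1}  # world if switch unpressed
dist_press = {"suspended": 1.0}                       # world if switch pressed

c = indifference_correction(dist_no_press, dist_press, u_normal, u_shutdown)

ev_no_press = expected(
    dist_no_press, lambda o: composed_utility(o, False, u_normal, u_shutdown, c))
ev_press = expected(
    dist_press, lambda o: composed_utility(o, True, u_normal, u_shutdown, c))
# With the correction applied, ev_no_press equals ev_press, so the agent
# gains nothing in expectation by causing or preventing the press.
```

In this toy setup the agent is indifferent between switch outcomes, which removes the incentive named in (2b); it does not by itself give the harder properties below, such as positively preserving the switch.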

Harder versions of this problem would add specifications such as:

  • The AI positively wants the switch to exist and will restore the switch’s existence if it is damaged.

  • The AI will safely abort any plans in progress (and design plans such that they can be safely aborted).

See also Utility indifference, Shutdown utility function, Corrigibility, Interruptibility, Low impact, and Abortable plans.


  • You can't get the coffee if you're dead

    An AI given the goal of ‘get the coffee’ can’t achieve that goal if it has been turned off; so even an AI whose goal is just to fetch the coffee may try to avert a shutdown button being pressed.