Shutdown utility function

A special case of low impact that probably seems deceptively trivial: how would you create a utility function such that an agent with this utility function would harmlessly shut down, without, for example, creating an environmental subagent that assimilates all matter in the universe and uses it to make absolutely sure that the AI stays shut down forever and isn’t accidentally reactivated through some remote-probability event? If we had a shutdown utility function, and a safe button that switched between utility functions in a reflectively stable way, we could combine these two features to create an AI that had a safe shutdown button.
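The failure mode described above can be made concrete with a toy model. The following is only an illustrative sketch: the outcome names, the probabilities, and the "impact" numbers are all hypothetical, and the utility function shown is a deliberately naive candidate (score an outcome by the probability that the AI stays off), not a proposed solution.

```python
# Hypothetical toy outcomes, each mapped to
# (probability the AI stays shut down, magnitude of side effects).
# All names and numbers are made up for illustration.
OUTCOMES = {
    "switch_off": (0.999, 0.0),
    "switch_off_and_guard_with_subagent": (0.999999, 1e6),
    "keep_running": (0.0, 1.0),
}


def naive_shutdown_utility(outcome):
    """Naive candidate: utility is just the probability of staying off."""
    p_stays_off, _impact = OUTCOMES[outcome]
    return p_stays_off


def best_outcome(utility):
    """An idealized maximizer that can pick any outcome directly."""
    return max(OUTCOMES, key=utility)


chosen = best_outcome(naive_shutdown_utility)
print(chosen)
```

Because the naive utility ignores side effects entirely, the maximizer prefers the outcome that buys a slightly higher probability of staying off at enormous cost to the rest of the world. A real shutdown utility function would need to rank the plain "switch off" outcome highest, which is what makes this a special case of low impact.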

Better yet would be an abort utility function which incentivizes the safe aborting of all previous plans and actions in a low-impact way, and, say, suspending the AI itself to disk in a way that preserves its log files. If we had this utility function plus a safe button that switched to it, we could safely abort the AI’s current actions at any time. (This, however, would be more difficult, and it seems wise to work on just the shutdown utility function first.)

To avoid a rock trivially fulfilling this desideratum, we should add two requirements: (1) the shutdown utility function must be something that produces “just switch yourself off and do nothing else” behavior in a generally intelligent agent which, if instead hooked up to a paperclip utility function, would be producing paperclips; and (2) the shutdown utility function must be omni-safe (the AI safely shuts down even if it has all other outcomes available as primitive actions).

“All outcomes have equal utility” would not be a shutdown utility function, since in this case the actual action produced will be undefined under most forms of unbounded analysis. In essence, the AI’s internal systems would continue under their own inertia and produce some kind of undefined behavior which might well be coherent and harmful. We need a utility function that identifies harmless behavior, rather than one that fails to identify anything and produces undefined behavior.
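A minimal sketch of why a constant utility function identifies nothing: when every action scores the same, the action selected is determined entirely by incidental details of the decision procedure, here the order in which actions happen to be listed. The action names and the `choose` helper are hypothetical; the only library behavior relied on is that Python's built-in `max` returns the first maximal item it encounters.

```python
def constant_utility(outcome):
    """'All outcomes have equal utility': every outcome scores the same."""
    return 0.0


def choose(actions, utility):
    """Pick a utility-maximizing action; ties go to the first one listed."""
    return max(actions, key=utility)


# Hypothetical action names, for illustration only.
actions = ["shut_down", "make_paperclips", "do_nothing"]

first = choose(actions, constant_utility)
second = choose(list(reversed(actions)), constant_utility)
print(first, second)
```

The two calls use the same utility function and the same set of actions, yet they return different actions. The utility function contributes nothing to the choice, so whatever behavior results comes from the rest of the system rather than from any preference for harmless shutdown.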


  • Low impact

    The open problem of having an AI carry out tasks in ways that cause minimum side effects and change as little of the rest of the universe as possible.