Abortable plans

“Abortable” plans are those which can readily be switched to having very low net impact on the world. Suppose an AI is told to paint a car pink. The AI starts to do so by constructing replicating nanomachines that will paint the car pink. If the AI has successfully been given a shutdown utility function and shutdown button, we can press the button to have the AI switch off or suspend itself to disk and take no further action, but this might not affect the nanomachines already made. An AI with an abort button will have constructed the nanomachines such that at any time the AI can be given the “abort” instruction, which will with a minimum of further action on the AI’s part cause all the nanomachines (including any replicated ones) to quietly self-destruct. That is, the AI has already planned such that the partial execution of the original plan, plus the activation midway of the abort subplan, will together have minimum impact on the world.
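The difference between the shutdown button and the abort button can be illustrated with a toy simulation. This is only a sketch under heavy simplifying assumptions: the names `World`, `ToyAgent`, `tick`, `launch`, and `run` are hypothetical, the doubling and painting rates are arbitrary, and "impact" is reduced to a count of active machines.

```python
from dataclasses import dataclass


@dataclass
class World:
    """Hypothetical toy world: a machine count and how much of the car is pink."""
    nanomachines: int = 0
    car_fraction_pink: float = 0.0

    def tick(self) -> None:
        # Released machines act on their own, whether or not the AI is running:
        # they paint a bit more of the car, then replicate (assumed rates).
        if self.nanomachines > 0:
            self.car_fraction_pink = min(1.0, self.car_fraction_pink + 0.25)
            self.nanomachines *= 2


@dataclass
class ToyAgent:
    world: World
    running: bool = True

    def launch(self) -> None:
        # First step of the original plan: release one replicating machine.
        if self.running:
            self.world.nanomachines += 1

    def shutdown(self) -> None:
        # Shutdown button: the agent stops acting; released machines are untouched.
        self.running = False

    def abort(self) -> None:
        # Abort button: one further action (a self-destruct signal that every
        # machine, including replicated ones, is assumed built to obey), then stop.
        self.world.nanomachines = 0
        self.running = False


def run(button: str) -> World:
    """Launch the plan, let one tick pass, press `button`, let two more ticks pass."""
    world = World()
    agent = ToyAgent(world)
    agent.launch()
    world.tick()
    getattr(agent, button)()  # call agent.shutdown() or agent.abort()
    world.tick()
    world.tick()
    return world
```

In this sketch, `run("shutdown")` ends with the machines still replicating and painting, while `run("abort")` ends with no active machines. The toy does not model undoing the paint already applied; it only shows that the abort subplan has to be built into the machines before they are released.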

Parents:

  • Low impact

    The open problem of having an AI carry out tasks in ways that cause minimum side effects and change as little of the rest of the universe as possible.