Intended goal

Definition. An “intended goal” refers to the intuitive intention in the mind of a human programmer when they executed some formal directive or goal within the AI. For example, if the programmer wants to create worthwhile happiness and the AI ends up tiling the universe with tiny molecular smiley-faces, we would say that worthwhile happiness (in some intuitive, possibly pre-verbal sense existing in the programmer’s mind) was the “intended goal”, as distinct from the result of the formal utility function actually encoded in the AI (which proved to have a maximum at tiny molecular smiley-faces).
