‘Beneficial’ is a reserved term in AI alignment theory, a speaker-dependent variable denoting whatever the speaker means by, e.g., “normative” or “really actually good” or “the outcomes I want to see result”. If the speaker uses extrapolated volition as their metaethical theory, then ‘beneficial’ would mean ‘good according to the speaker’s extrapolated volition’. Someone who doesn’t yet have a developed metaethical theory would be using ‘beneficial’ to mean “Whatever it is I ideally ought to want, whatever the heck ‘ideally ought to want’ means.” To suppose that an event, agent, or policy was ‘beneficial’ is to suppose that it was really actually good, with no mistakes in evaluating goodness. AI alignment theory sometimes needs a word for this, because when we talk about the difficulty of making an AI beneficial, we may want to talk about the difficulty of making it actually good, not just fooling us into thinking that it’s good. See ‘value’ for more details.


  • Value

    The word ‘value’ in the phrase ‘value alignment’ is a metasyntactic variable that indicates the speaker’s future goals for intelligent life.