‘Beneficial’ is a reserved term in AI alignment theory, a speaker-dependent variable to denote whatever the speaker means by, e.g., “normative” or “really actually good” or “the outcomes I want to see resulting”. If the speaker uses extrapolated volition as their metaethical theory, then ‘beneficial’ would mean ‘good according to the speaker’s extrapolated volition’. Someone who doesn’t yet have a developed metaethical theory would be using ‘beneficial’ to mean “Whatever it is I ideally ought to want, whatever the heck ‘ideally ought to want’ means.” To suppose that an event, agent, or policy was ‘beneficial’ is to suppose that it was really actually good with no mistakes in evaluating goodness. AI alignment theory sometimes needs a word for this, because when we talk about the difficulty of making an AI beneficial, we may want to talk about the difficulty of making it actually good and not just fooling us into thinking that it’s good. See ‘value’ for more details.


  • Value

    The word ‘value’ in the phrase ‘value alignment’ is a metasyntactic variable that indicates the speaker’s future goals for intelligent life.