Value alignment problem

Disambiguation: For the research subject that includes the entire edifice of how and why to produce good AIs, see AI alignment.

The ‘value alignment problem’ is the problem of producing sufficiently advanced machine intelligences that want to do beneficial things and do not want to do harmful things. The largest-looming subproblem is ‘value identification’ or ‘value learning’ (sometimes treated as synonymous with value alignment), but the problem also includes subproblems like Corrigibility, that is, giving an AI values such that it doesn’t want to interfere with you correcting what you see as an error in its code.


  • Total alignment

    We say that an advanced AI is “totally aligned” when it knows exactly which outcomes and plans are beneficial, with no further user input.

  • Preference framework

    What’s the thing an agent uses to compare its preferences?


  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.