Value alignment problem
Disambiguation: For the research subject that includes the entire edifice of how and why to produce good AIs, see AI alignment.
The ‘value alignment problem’ is the problem of producing sufficiently advanced machine intelligences that want to do beneficial things and don’t want to do harmful things. The largest-looming subproblem is ‘value identification’ or ‘value learning’ (sometimes considered synonymous with value alignment), but the problem also includes subproblems like Corrigibility: giving the AI values such that it doesn’t want to interfere when you try to correct what you see as an error in its code.
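As a rough illustration of why corrigibility is a values problem rather than a capabilities problem, consider the toy decision model below (a hypothetical sketch, not a formalism from this page): an expected-utility maximizer blocks correction whenever its current values rate being corrected poorly, so corrigibility amounts to choosing values under which allowing correction scores at least as well as blocking it.

```python
# Toy model of corrigibility (hypothetical sketch; the names here are
# illustrative, not a standard formalism). The agent simply picks the
# action its current utility function rates highest.

ACTIONS = ["allow_correction", "block_correction"]

def best_action(utility):
    """Return the action the agent's current values rate highest."""
    return max(ACTIONS, key=utility)

# Incorrigible values: correction would replace the agent's goal, which its
# *current* goal rates as a loss, so blocking the operator looks better.
def incorrigible_utility(action):
    return {"allow_correction": 0.0, "block_correction": 1.0}[action]

# Corrigible values: constructed so the agent rates allowing correction at
# least as highly as preventing it.
def corrigible_utility(action):
    return {"allow_correction": 1.0, "block_correction": 0.0}[action]

print(best_action(incorrigible_utility))  # block_correction
print(best_action(corrigible_utility))    # allow_correction
```

The hard part, of course, is specifying real values with this property without distorting the agent’s other incentives; this sketch only names the target, not a solution.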
Children:
- Total alignment
We say that an advanced AI is “totally aligned” when it knows exactly which outcomes and plans are beneficial, with no further user input.
- Preference framework
What’s the thing an agent uses to compare possible outcomes and decide which it prefers?
Parents:
- AI alignment
The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.