Value alignment problem

Disambiguation: For the research subject that includes the entire edifice of how and why to produce good AIs, see AI alignment.

The ‘value alignment problem’ is to produce sufficiently advanced machine intelligences that want to do beneficial things and not do harmful things. The largest-looming subproblem is ‘value identification’ or ‘value learning’ (sometimes considered synonymous with value alignment), but the problem also includes subproblems like Corrigibility — that is, AI values such that the AI doesn’t want to interfere with you correcting what you see as an error in its code.


  • Total alignment

    We say that an advanced AI is “totally aligned” when it knows exactly which outcomes and plans are beneficial, with no further user input.

  • Preference framework

    What does an agent use to compare and weigh its preferences?


  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.