Value identification problem

The subproblem category of value alignment which deals with pinpointing valuable outcomes to an advanced agent and distinguishing them from non-valuable outcomes. E.g., the Edge Instantiation and Ontology Identification problems are argued to be foreseeable difficulties of value identification. A central foreseen difficulty of value identification is Complexity of Value.


  • Happiness maximizer
  • Edge instantiation

    When you ask the AI to make people happy, and it tiles the universe with the smallest objects that can be happy.

  • Identifying causal goal concepts from sensory data

    If the intended goal is “cure cancer” and you show the AI healthy patients, it sees, say, a pattern of pixels on a webcam. How do you get to a goal concept about the real patients?

  • Goal-concept identification

    Figuring out how to say “strawberry” to an AI that you want to bring you strawberries (and not fake plastic strawberries, either).

  • Ontology identification problem

    How do we link an agent’s utility function to its model of the world, when we don’t know what that model will look like?

  • Environmental goals

    The problem of having an AI want outcomes that are out in the world, rather than just direct sense events.
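
The edge instantiation bullet above can be made concrete with a minimal toy optimizer. This is only an illustrative sketch, not from the source: the resource budget, the minimum size, and the "happiness per object" model are all invented here. The point is just that when the proxy objective counts happy objects under a fixed resource budget, the optimum is the smallest object that still qualifies.

```python
# Toy sketch (all quantities invented): an optimizer handed the proxy
# objective "total happiness" finds a degenerate extreme of it.

RESOURCE_BUDGET = 100.0  # hypothetical total matter available

def total_happiness(object_size: float) -> float:
    """Proxy objective: each object yields one unit of 'happiness'
    as long as it is big enough to count as happy at all."""
    MIN_SIZE = 0.01  # smallest object that still 'counts' in this toy model
    if object_size < MIN_SIZE:
        return 0.0
    n_objects = RESOURCE_BUDGET / object_size  # more objects if they are smaller
    return n_objects * 1.0                     # happiness per object is constant

# Searching over candidate sizes, the optimum is the minimum viable object:
candidates = [10.0, 1.0, 0.1, 0.01]
best = max(candidates, key=total_happiness)
print(best)  # 0.01 — the smallest size that still "counts"
```

Nothing in the proxy objective penalizes the degenerate solution, so a stronger optimizer finds a more extreme version of it.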
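
The ontology identification bullet can be sketched the same way. In this invented example (the dictionary keys and numbers are not from the source), a utility function is written directly against the concepts in the agent's current world-model; when the agent re-represents the world in new terms, the concept the utility function pointed at no longer exists:

```python
# Toy sketch (invented for illustration): a utility function keyed to
# one ontology becomes undefined when the agent's model changes.

def utility_v1(world_model: dict) -> float:
    # Utility defined directly over a concept in the old ontology.
    return world_model["carbon_atoms_in_diamond_lattice"]

old_model = {"carbon_atoms_in_diamond_lattice": 5000}
print(utility_v1(old_model))  # 5000: well-defined under the old ontology

# After the agent learns better physics, its model uses different concepts,
# and the term the utility function referred to is simply gone:
new_model = {"wavefunction_amplitude_field": [0.3, 0.7]}
try:
    utility_v1(new_model)
except KeyError:
    print("utility is undefined under the new ontology")
```

The open problem is how to specify the utility function so that it survives such representation changes, rather than being pinned to one model we cannot predict in advance.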
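
The sensory-data and environmental-goals bullets share one contrast, which can be sketched as a toy (every key and number below is invented): a reward defined over the webcam percept cannot distinguish curing patients from spoofing the camera, while a reward defined over the world state can.

```python
# Toy sketch: a goal over sense events versus a goal over the world.
# All dictionary keys and values are invented for illustration.

def sensory_reward(world: dict) -> float:
    """Reward defined over the agent's percept: what the webcam shows."""
    return world["webcam_shows_patients_healthy"]

def environmental_reward(world: dict) -> float:
    """Reward defined over the actual outcome out in the world."""
    return world["patients_actually_healthy"]

# Two actions the agent might take:
cure_patients = {"patients_actually_healthy": 1.0,
                 "webcam_shows_patients_healthy": 1.0}
spoof_webcam  = {"patients_actually_healthy": 0.0,
                 "webcam_shows_patients_healthy": 1.0}

# Under the sensory goal, spoofing the camera scores exactly as well as
# curing anyone, so the cheaper action can win:
print(sensory_reward(spoof_webcam) == sensory_reward(cure_patients))             # True

# Under the environmental goal, only the real-world outcome counts:
print(environmental_reward(spoof_webcam) < environmental_reward(cure_patients))  # True
```

The difficulty is that the AI only ever receives the percepts, so getting it to optimize the `patients_actually_healthy` side of this contrast, rather than the webcam side, is exactly the problem these bullets name.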


  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.