Value identification problem

The value identification problem is the subproblem category of value alignment that deals with pinpointing valuable outcomes to an advanced agent and distinguishing them from non-valuable outcomes. For example, the Edge Instantiation and Ontology Identification problems are argued to be foreseeable difficulties of value identification. A central foreseen difficulty of value identification is Complexity of Value.


  • Happiness maximizer
  • Edge instantiation

    What happens when you ask the AI to make people happy, and it tiles the universe with the smallest objects that can be happy.

  • Identifying causal goal concepts from sensory data

    If the intended goal is “cure cancer” and you show the AI healthy patients, it sees, say, a pattern of pixels on a webcam. How do you get to a goal concept about the real patients?

  • Goal-concept identification

    Figuring out how to say “strawberry” to an AI that you want to bring you strawberries (and not fake plastic strawberries, either).

  • Ontology identification problem

    How do we link an agent’s utility function to its model of the world, when we don’t know what that model will look like?

  • Environmental goals

    The problem of having an AI want outcomes that are out in the world, not just want direct sense events.
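
The edge instantiation item above can be illustrated with a toy sketch. Everything in it is hypothetical: the `Outcome` type, the three candidate outcomes, their resource costs, and the `proxy_score` function are invented for illustration and are not taken from any real system. The sketch assumes the stated goal has been reduced to "count of entities that register as happy" and shows that a plain maximizer over that proxy picks the cheapest qualifying entity, not the outcome the designers had in mind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    name: str
    resource_cost: float  # hypothetical resources needed per "happy" entity
    intended: bool        # would the designers endorse this outcome?


def proxy_score(outcome: Outcome, budget: float) -> float:
    """Proxy utility: how many 'happy' entities fit within the budget."""
    return budget / outcome.resource_cost


def best_by_proxy(outcomes, budget):
    """Return the outcome that maximizes the proxy, ignoring intent."""
    return max(outcomes, key=lambda o: proxy_score(o, budget))


# Illustrative candidates only; the costs are made-up numbers.
OUTCOMES = [
    Outcome("flourishing human lives", 1000.0, True),
    Outcome("humans wired to constant pleasure", 100.0, False),
    Outcome("smallest objects that count as happy", 0.01, False),
]

chosen = best_by_proxy(OUTCOMES, budget=1e6)
print(chosen.name, "| intended:", chosen.intended)
```

The proxy never consults the `intended` field, which is the point of the sketch: the information that separates valuable from non-valuable outcomes is exactly what the simple goal specification leaves out.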


  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.