Underestimating complexity of value because goodness feels like a simple property

One reason people might systematically underestimate the complexity of value is that the “goodness” of a policy or goal-instantiation feels like a simple, direct property. Our brains compute a goodness level and make it available to us as a relatively simple quantity, so it feels like a simple fact that tiling the universe with tiny agents experiencing maximal levels of simply-represented ‘pleasure’ is a bad version of happiness. It then seems like it ought to be simple to yell at an AI, “Just give me high-value happiness, not this weird low-value happiness!”, or to have the AI learn from a few examples that it’s meant to produce high-value X and not low-value X, especially if the AI is smart enough to learn other simple boundaries, like the difference between red objects and blue objects. In fact, the boundary between “good X” and “bad X” is value-laden, far more wiggly, and would require far more examples to delineate. What our brain computes as a seemingly simple, perceptually available, one-dimensional quantity does not always correspond to a simple, easy-to-learn gradient in the space of policies or outcomes. This is especially true of the seemingly readily available property of beneficialness.
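As a rough illustration of why a wiggly boundary needs more examples than a simple one, here is a toy sketch. Everything in it is an assumption made for illustration and is not drawn from the argument above: outcomes are squashed onto a single number between 0 and 1, the "simple" boundary is one threshold (standing in for red versus blue), the "wiggly" boundary alternates across 20 hypothetical bands, and the learner is a plain nearest-neighbor rule. The real boundary between “good X” and “bad X” lives in a far larger space, so this toy probably understates the gap.

```python
import math


def simple_label(x):
    # Hypothetical "simple" concept: a single threshold, like red vs. blue.
    return x < 0.5


def wiggly_label(x, bands=20):
    # Hypothetical "wiggly" concept: the label flips in every band of width 1/bands.
    return math.floor(x * bands) % 2 == 0


def nearest_neighbor_accuracy(label_fn, n_train, n_test=10_000):
    """Accuracy of a 1-nearest-neighbor learner given n_train evenly spaced examples."""
    # Training examples sit at the centers of n_train equal-width cells in [0, 1).
    train = [(i + 0.5) / n_train for i in range(n_train)]
    train_labels = [label_fn(x) for x in train]

    correct = 0
    for j in range(n_test):
        x = (j + 0.5) / n_test
        # Predict the label of the closest training example.
        nearest = min(range(n_train), key=lambda i: abs(train[i] - x))
        correct += train_labels[nearest] == label_fn(x)
    return correct / n_test


if __name__ == "__main__":
    for n in (4, 20):
        print(
            f"{n:>2} examples: "
            f"simple={nearest_neighbor_accuracy(simple_label, n):.2f}, "
            f"wiggly={nearest_neighbor_accuracy(wiggly_label, n):.2f}"
        )
```

In this toy setup, four examples are enough to pin down the single threshold, while the same four examples leave the learner not far above chance on the banded concept. It only recovers that concept once it has roughly one example per band. The learner is equally "smart" in both cases; what differs is how much information the boundary itself contains.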


  • Complexity of value

    There’s no simple way to describe the goals we want Artificial Intelligences to want.