Underestimating complexity of value because goodness feels like a simple property

One potential reason why people might systematically underestimate the complexity of value is that the “goodness” of a policy or goal-instantiation feels like a simple, direct property. That is, our brains compute the goodness level and make it available to us as a relatively simple quantity, so it feels like a simple fact that tiling the universe with tiny agents experiencing maximum simply-represented ‘pleasure’ levels is a bad version of happiness. We feel like it ought to be simple to yell at an AI, “Just give me high-value happiness, not this weird low-value happiness!”, or to have the AI learn from a few examples that it’s meant to produce high-value X and not low-value X, especially if the AI is smart enough to learn other simple boundaries, like the difference between red objects and blue objects. In fact, the boundary between “good X” and “bad X” is value-laden, far more wiggly, and would require far more examples to delineate. What our brain computes as a seemingly simple, perceptually available one-dimensional quantity does not always correspond to a simple, easy-to-learn gradient in the space of policies or outcomes. This is especially true of the seemingly readily available property of beneficialness.
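The point about wiggly boundaries needing more examples can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the one-dimensional input space, the two hypothetical concepts `simple_concept` and `wiggly_concept`, the nearest-neighbor learner, and the 95% accuracy target. It captures only the "more wiggles need more examples" part of the argument, not the value-ladenness of real human judgments.

```python
# Toy illustration: how many evenly spaced labeled examples does a
# 1-nearest-neighbor learner need before it reproduces a concept?
# All names and thresholds here are hypothetical.

def simple_concept(x: float) -> bool:
    """A single threshold, analogous to 'red versus blue'."""
    return x > 0.5

def wiggly_concept(x: float) -> bool:
    """Twenty alternating bands on [0, 1): a stand-in for a boundary
    that looks one-dimensional to us but has many turns."""
    return int(x * 20) % 2 == 0

def nearest_neighbor_accuracy(concept, n_examples: int, n_test: int = 1000) -> float:
    """Label n_examples evenly spaced points with the concept, then
    measure how often the nearest example's label matches the concept
    on a dense grid of test points."""
    train = [(i + 0.5) / n_examples for i in range(n_examples)]
    labels = [concept(x) for x in train]
    correct = 0
    for j in range(n_test):
        x = (j + 0.5) / n_test
        nearest = min(range(n_examples), key=lambda i: abs(train[i] - x))
        correct += labels[nearest] == concept(x)
    return correct / n_test

def examples_needed(concept, target: float = 0.95, max_examples: int = 200):
    """Smallest number of examples that reaches the target accuracy,
    or None if no count up to max_examples does."""
    for n in range(1, max_examples + 1):
        if nearest_neighbor_accuracy(concept, n) >= target:
            return n
    return None

needed_simple = examples_needed(simple_concept)
needed_wiggly = examples_needed(wiggly_concept)
print("examples needed for the simple boundary:", needed_simple)
print("examples needed for the wiggly boundary:", needed_wiggly)
```

Both concepts output a single yes-or-no answer, so both "feel" equally simple from the inside; the difference only shows up in how many labeled examples the learner needs to recover them.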


  • Complexity of value

    There’s no simple way to describe the goals we want Artificial Intelligences to want.