Natural language understanding of "right" will yield normativity

This proposition is true if you can take a cognitively powerful agent that otherwise seems competent at understanding natural language, and that has previously been trained out of infrahuman errors in natural-language understanding; ask it to ‘do the right thing’ (or ‘do the right thing, defined the right way’); and find that its natural-language understanding of ‘right’ yields what we would intuitively recognize as normativity.



Natural categories have boundaries with low algorithmic information, relative to the boundaries produced by a purely epistemic system with a simplicity prior.

‘Unnatural’ categories have value-laden boundaries. Values have high algorithmic information because of the Orthogonality Thesis and complexity of value. Unnatural categories appear simple to us because we perform dimensional reduction on value boundaries. Things merely near the boundaries of unnatural categories can fall off rapidly in value, because value is fragile.

There’s an inductive problem: if 18 things are important and only 17 of them vary between the positive and negative examples in the data, the induced concept will miss the 18th.
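A toy sketch of this shadowed-variation problem (my construction, not from the text, shrunk to two genuinely important features for brevity): a feature that never varies in the training data carries zero statistical signal, so any learner fit to that data treats it as irrelevant, and the induced concept fails exactly on cases where the shadowed feature flips.

```python
# Toy illustration: the true concept depends on binary features 0 AND 17,
# but feature 17 is constant (always 1) in the training data, so a learner
# fit on that data cannot recover its importance.
import random

random.seed(0)
N_FEATURES = 18

def true_concept(x):
    # The "real" boundary: positive iff features 0 and 17 are both 1.
    return x[0] == 1 and x[17] == 1

# Training data: feature 17 never varies; the other 17 features do.
train = []
for _ in range(200):
    x = [random.randint(0, 1) for _ in range(17)] + [1]
    train.append((x, true_concept(x)))

# A minimal learner: score each feature by the gap between its mean value
# on positive vs. negative examples.  A constant feature scores exactly 0.
def feature_weight(i):
    pos = [x[i] for x, y in train if y]
    neg = [x[i] for x, y in train if not y]
    if not pos or not neg:
        return 0.0
    return sum(pos) / len(pos) - sum(neg) / len(neg)

weights = [feature_weight(i) for i in range(N_FEATURES)]
print(weights[17])  # 0.0 -- the shadowed feature looks irrelevant

# The induced concept uses only features that carried signal in the data.
def learned_concept(x):
    return all(x[i] for i in range(N_FEATURES) if weights[i] > 0)

# A probe where only the shadowed feature differs from the training regime:
# the true concept says no, the induced concept says yes.
probe = [1] * 17 + [0]
print(true_concept(probe), learned_concept(probe))  # False True
```

The failure is silent: on the training distribution the learned concept is fine, and only an input from outside that distribution reveals the missing 18th dimension.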

Edge instantiation makes this worse, because it tends to seek out extreme cases.

The word ‘right’ involves a lot of what we call ‘philosophical competence’, in the sense that humans figuring it out will go through many new cognitive use-paths (‘unprecedented excursions’) that they didn’t traverse while disambiguating blue from green. This also holds when people are reflecting on how to figure out ‘right’; the shift from CDT to UDT is an example case.

This also matters because edge instantiation on the most ‘right’ cases, where ‘right’ is read as persuasively-right, will produce things that humans find superpersuasive (perhaps by shoving brains onto strange new pathways). So we can’t define ‘right’ as that which would counterfactually cause a human model to agree that ‘right’ applies.

This keys into the inductive problem above: a dimension of variation must be reflected in the data for the induced concept to cover it.

But if you had a complete predictive model of a human, it is then possible, though not guaranteed, that normative boundaries could be induced from examples, with the model being asked to clarify ambiguous cases.
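The examples-plus-clarification loop can be sketched as active learning against the human model (a hypothetical construction of mine, with the concept boundary collapsed to a single hidden threshold on one dimension; real normative boundaries would be vastly higher-dimensional and value-laden):

```python
# Hypothetical sketch: inducing a boundary by repeatedly querying a
# predictive human model on the single most ambiguous remaining case.

def human_model(x):
    # Stand-in for a complete predictive model of a human's judgments:
    # here, 'right' applies below a hidden threshold at 0.37.
    return x < 0.37

def induce_boundary(queries=30):
    lo, hi = 0.0, 1.0            # current region of ambiguity
    for _ in range(queries):
        mid = (lo + hi) / 2      # the most ambiguous unresolved case
        if human_model(mid):     # ask the model to clarify it
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2         # induced estimate of the boundary

estimate = induce_boundary()
print(abs(estimate - 0.37) < 1e-6)  # True
```

Each query halves the remaining ambiguity, so 30 clarifications pin the boundary to within 2⁻³⁰. The hard part the sketch elides is exactly the point of the preceding sections: in a high-dimensional, value-laden space, identifying which case is "most ambiguous" is itself value-laden, and dimensions that never vary across the queried examples stay shadowed.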


  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.