Natural language understanding of "right" will yield normativity

This proposition is true if you can take a cognitively powerful agent that otherwise seems competent at understanding natural language, and that has previously been trained out of infrahuman errors in understanding natural language; ask it to ‘do the right thing’ or ‘do the right thing, defined the right way’; and find that its natural language understanding of ‘right’ yields what we would intuitively recognize as normativity.



Natural categories have boundaries with low algorithmic information relative to boundaries produced by a purely epistemic system with a simplicity prior.

‘Unnatural’ categories have value-laden boundaries. Values have high algorithmic information because of the Orthogonality Thesis and the Complexity of value. Unnatural categories appear simple to us because we perform dimensionality reduction on value boundaries. Things merely near the boundaries of unnatural categories can fall off rapidly in value because value is fragile.

There’s an inductive problem when 18 things are important but only 17 of them vary between the positive and negative examples in the data: nothing in the data tells the learner that the 18th matters.
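A toy illustration of the 18-versus-17 problem. Everything here is a hypothetical assumption: examples are 18 binary features, the true concept is ‘all 18 hold’, and the learner is a crude simplicity-favoring conjunction learner (start from the most specific conjunction consistent with the positives, then prune any feature not needed to reject a negative). Because feature 17 is 1 in every training example, nothing forces the learner to keep it.

```python
N_FEATURES = 18
UNVARIED = 17  # hypothetical: the 18th important feature, which never varies in the data


def true_concept(x):
    """Hypothetical ground truth: an example is positive only if all 18 features hold."""
    return all(x)


def make_training_data():
    """Toy data in which feature 17 is 1 in every positive and every negative example."""
    positives = [[1] * N_FEATURES]
    negatives = []
    for i in range(N_FEATURES - 1):  # only features 0..16 are ever switched off
        x = [1] * N_FEATURES
        x[i] = 0
        negatives.append(x)
    return positives, negatives


def accepts(features, x):
    """A conjunctive hypothesis: x is positive iff every listed feature is 1."""
    return all(x[f] == 1 for f in features)


def induce_simplest_conjunction(positives, negatives):
    # Start from every feature shared by all positives (the most specific conjunction)...
    features = [f for f in range(N_FEATURES) if all(p[f] == 1 for p in positives)]
    # ...then, as a crude stand-in for a simplicity prior, drop any feature
    # whose removal still leaves every negative example rejected.
    for f in list(features):
        candidate = [g for g in features if g != f]
        if not any(accepts(candidate, n) for n in negatives):
            features = candidate
    return features


positives, negatives = make_training_data()
learned = induce_simplest_conjunction(positives, negatives)
print(learned)  # features 0..16; feature 17 was pruned

novel = [1] * N_FEATURES
novel[UNVARIED] = 0
print(true_concept(novel), accepts(learned, novel))  # False True
```

A novel example that differs from the positives only in feature 17 is accepted by the induced concept but rejected by the true one.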

Edge instantiation makes this worse because it tends to seek out extreme cases.
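A toy sketch of how edge instantiation compounds the inductive problem, under made-up assumptions: options are pairs (a, b), the induced proxy scores only a because b was always 0 in the training data, and a hypothetical feasibility constraint lets a exceed 1 only if b also moves away from 0. Maximizing the proxy over the option grid lands on an extreme corner where b takes a value never seen in training.

```python
import itertools


def learned_score(a, b):
    """Hypothetical proxy induced from data in which b was always 0, so it ignores b."""
    return a


def true_concept(a, b):
    """Hypothetical ground truth: a must be high AND b must stay at its usual value of 0."""
    return a > 0.5 and b == 0


def feasible(a, b):
    """Toy constraint: pushing a past 1 is only possible by also moving b away from 0."""
    return a <= 1 + b


# Candidate options on a coarse grid; a and b each range over 0.0, 0.5, ..., 3.0.
grid = [i / 2 for i in range(7)]
options = [(a, b) for a, b in itertools.product(grid, grid) if feasible(a, b)]

# The optimizer seeks the option the proxy rates most highly: an extreme case.
best = max(options, key=lambda opt: learned_score(*opt))
print(best, true_concept(*best))  # (3.0, 2.0) False

# For comparison: the best option among those resembling the training data (b == 0).
in_distribution = [opt for opt in options if opt[1] == 0]
best_seen = max(in_distribution, key=lambda opt: learned_score(*opt))
print(best_seen, true_concept(*best_seen))  # (1.0, 0.0) True
```

The proxy agrees with the true concept everywhere the data reached; optimization pressure is what carries the chosen option out to where they disagree.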

The word ‘right’ involves a lot of what we call ‘philosophical competence’, in the sense that humans figuring it out will traverse many new cognitive use-paths (‘unprecedented excursions’) that they never traversed while disambiguating blue and green. The same holds when people reflect on how to figure out ‘right’. An example case is deciding between CDT and UDT (causal versus updateless decision theory).

This also matters because edge instantiation on the cases that are most persuasively ‘right’ will produce things that humans find superpersuasive (perhaps by shoving brains onto strange new pathways). So we can’t define ‘right’ as whatever would counterfactually cause a model of a human to agree that ‘right’ applies.

This ties into the inductive problem: a dimension of variation must be represented in the data for the induced concept to cover it.

But given a complete predictive model of a human, it becomes possible, though not certain, that normative boundaries could be induced from examples plus questions asking the model to clarify ambiguities.
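A minimal sketch of the ‘examples plus clarifying questions’ idea, assuming (hypothetically) an 18-binary-feature toy concept, a human model that acts as a perfect oracle for it, and an all-ones example known to be positive. For each feature the induced concept ignores, the learner asks the model about an example that differs from the known positive only in that feature.

```python
N_FEATURES = 18


def human_model(x):
    """Hypothetical stand-in for a complete predictive model of a human's judgment.
    In this toy it simply returns the ground-truth concept: all 18 features must hold."""
    return all(x)


def clarify_with_queries(learned, oracle, n_features):
    """Toy active-learning step: query each feature the induced concept ignores."""
    refined = list(learned)
    for f in range(n_features):
        if f in refined:
            continue
        query = [1] * n_features  # assumed known positive example
        query[f] = 0              # vary only the ignored feature
        if not oracle(query):     # the model says this feature matters
            refined.append(f)
    return sorted(refined)


learned = list(range(17))  # e.g. a concept induced from data where feature 17 never varied
refined = clarify_with_queries(learned, human_model, N_FEATURES)
print(refined == list(range(N_FEATURES)))  # True
```

The sketch only covers dimensions the learner thinks to query, and it trusts the model’s answers; the superpersuasion worry is about exactly that trust.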
