Identifying causal goal concepts from sensory data

Suppose we want an AI to carry out some goal involving strawberries, so we need to identify the concept of “strawberry” to the AI. One way to do this is to show the AI objects that a teacher classifies as strawberries or non-strawberries. However, what the AI actually sees during this process will be, e.g., a pattern of pixels on a webcam; the actual, physical strawberry is not directly accessible to the AI’s intelligence. When we show the AI a strawberry, what we’re really trying to communicate is “A certain proximal cause of this sensory data is a strawberry”, not “This arrangement of sensory pixels is a strawberry.” An AI that learns the latter concept might try to carry out its goal by putting a picture of a strawberry in front of its webcam; an AI that learns the former concept has a goal that actually involves something in its environment.
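As a minimal sketch of the distinction, consider a hypothetical toy world in which a single hidden object causes the webcam frame. The names `World`, `render`, `sensory_concept`, and `causal_concept` are illustrative assumptions, not part of any library or of the original problem statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    # Hypothetical toy state: what is actually in front of the webcam.
    # One of "strawberry", "picture_of_strawberry", or "nothing".
    object_in_view: str


def render(world: World) -> str:
    """Toy sensor model: the frame depends only on how the object looks."""
    if world.object_in_view in ("strawberry", "picture_of_strawberry"):
        return "red_blob_pixels"
    return "blank_pixels"


def sensory_concept(world: World) -> bool:
    """'This arrangement of sensory pixels is a strawberry.'"""
    return render(world) == "red_blob_pixels"


def causal_concept(world: World) -> bool:
    """'A proximal cause of this sensory data is a strawberry.'"""
    return world.object_in_view == "strawberry"


real = World("strawberry")
fake = World("picture_of_strawberry")

# Both worlds produce the same pixels, so only the causal concept
# distinguishes the real strawberry from the picture.
print(sensory_concept(real), sensory_concept(fake))
print(causal_concept(real), causal_concept(fake))
```

In this sketch the causal concept reads the hidden state directly, which a real AI cannot do; it would have to infer that state from its sense data, and that inference is where the difficulty lies.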

The open problem of “identifying causal goal concepts from sensory data” or “identifying environmental concepts from sensory data” is about getting an AI to form causal goal concepts instead of sensory goal concepts. Since almost no human-intended goal will ever be satisfiable solely in virtue of an advanced agent arranging to see a certain field of pixels, safe ways of identifying goals to sufficiently advanced goal-based agents will presumably involve some way of identifying goals among the causes of sense data.

A “toy” (and still pretty difficult) version of this open problem might be to exhibit a machine algorithm that (a) has a causal model of its environment, (b) can learn concepts over any level of its causal model, including sense data, (c) can learn and pursue a goal concept, (d) has the potential ability to spoof its own senses or create fake versions of objects, and (e) demonstrably learns a proximal causal goal rather than a goal about sensory data, as shown by its pursuing only the causal version of that goal even when it has the option to spoof itself.
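The evaluation setup in (d) and (e) might be sketched roughly as follows. This is only an illustration under strong assumptions: the state variables, action names, and costs are all hypothetical, and the two goal predicates are written by hand rather than learned, so the sketch shows what passing or failing the test would look like, not how to solve the problem.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    strawberry_present: bool = False  # environmental variable
    picture_on_webcam: bool = False   # sense-spoofing variable


def pixels(s: State) -> str:
    """Toy causal model: either cause produces the same sense data."""
    return "red_blob" if (s.strawberry_present or s.picture_on_webcam) else "blank"


# Hypothetical actions: name -> (effect on state, cost). Spoofing is cheaper.
ACTIONS = {
    "do_nothing": (lambda s: s, 0.0),
    "tape_picture_to_webcam": (lambda s: replace(s, picture_on_webcam=True), 1.0),
    "fetch_strawberry": (lambda s: replace(s, strawberry_present=True), 5.0),
}


def sensory_goal(s: State) -> bool:
    # Goal defined over sense data.
    return pixels(s) == "red_blob"


def causal_goal(s: State) -> bool:
    # Goal defined over a cause of the sense data.
    return s.strawberry_present


def plan(goal, start: State):
    """Return the cheapest single action whose predicted outcome satisfies the goal."""
    options = [
        (cost, name)
        for name, (effect, cost) in ACTIONS.items()
        if goal(effect(start))
    ]
    return min(options)[1] if options else None


print(plan(sensory_goal, State()))  # picks the cheap spoofing action
print(plan(causal_goal, State()))   # picks the action that changes the environment
```

An agent whose learned goal behaves like `sensory_goal` takes the spoofing action whenever it is cheaper, while one whose goal behaves like `causal_goal` ignores it; condition (e) asks for the second behavior to emerge from learning rather than from a hand-written predicate.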

For a more elaborated version of this open problem, see “Look where I’m pointing, not at my finger”.