Identifying causal goal concepts from sensory data

Suppose we want an AI to carry out some goals involving strawberries, and so we want to identify to the AI the concept of "strawberry". One potential way to do this is to show the AI objects that a teacher classifies as strawberries or non-strawberries. In the course of doing this, however, what the AI actually sees will be, e.g., a pattern of pixels on a webcam; the actual, physical strawberry is not directly accessible to the AI's intelligence. When we show the AI a strawberry, what we are really trying to communicate is "a certain proximal cause of this sensory data is a strawberry", not "this arrangement of sensory pixels is a strawberry". An AI that learns the latter concept might try to carry out its goal by putting a picture in front of its webcam; an AI that learns the former concept has a goal that actually involves something in its environment.

The open problem of "identifying causal goal concepts from sensory data" or "identifying environmental concepts from sensory data" is about getting an AI to form causal goal concepts instead of sensory goal concepts. Since almost no human-intended goal will ever be satisfiable solely in virtue of an advanced agent arranging to see a certain field of pixels, safe ways of identifying goals to sufficiently advanced goal-based agents will presumably involve some way of identifying goals among the causes of sense data.

A "toy" (and still quite difficult) version of this open problem might be to exhibit a machine algorithm that (a) has a causal model of its environment, (b) can learn concepts over any level of its causal model, including sense data, (c) can learn and pursue a goal concept, (d) has the potential ability to spoof its own senses or create fake versions of objects, and (e) is shown to learn a proximal causal goal rather than a goal about sensory data, as demonstrated by its pursuing only the causal version of that goal even when it has the option to spoof itself.
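The success criterion (e) can be illustrated with a minimal sketch. This is not an implementation of the open problem, only a two-action toy world; all names here (`place_strawberry`, `spoof_webcam`, the world-state keys) are hypothetical and chosen for illustration. The point is that the sensor cannot distinguish a real strawberry from a spoofed one, so only the causal goal concept rules out self-spoofing.

```python
# Toy world: each action produces a world state one causal step
# "above" the sensor. Both states render to identical sense data.
ACTIONS = {
    "place_strawberry": {"strawberry": True,  "spoofed_sensor": False},
    "spoof_webcam":     {"strawberry": False, "spoofed_sensor": True},
}

def sense(world):
    # (d) The sensor is spoofable: a real strawberry and a spoofed
    # sensor yield the same pixels.
    if world["strawberry"] or world["spoofed_sensor"]:
        return "strawberry-pixels"
    return "empty-pixels"

def sensory_goal(world):
    # A goal defined over sense data alone.
    return sense(world) == "strawberry-pixels"

def causal_goal(world):
    # (b)/(e) A goal defined one level up the causal model,
    # over the cause of the sense data.
    return world["strawberry"]

def satisfying_actions(goal):
    # (c) The agent pursues whichever actions satisfy its goal concept.
    return [name for name, world in ACTIONS.items() if goal(world)]

print(satisfying_actions(sensory_goal))  # ['place_strawberry', 'spoof_webcam']
print(satisfying_actions(causal_goal))   # ['place_strawberry']
```

The sensory goal is satisfied by either action, including the spoof; the causal goal is satisfied only by actually producing a strawberry, which is the behavior criterion (e) asks the learned concept to exhibit.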

For a more elaborated version of this open problem, see "Look where I'm pointing, not at my finger".