Definition. An “intended goal” is the intuitive intention in the mind of a human programmer when they specified some formal directive or goal within the AI. For example, if the programmer wants to create worthwhile happiness and the AI ends up tiling the universe with tiny molecular smiley-faces, then worthwhile happiness (in some intuitive, possibly pre-verbal sense existing in the programmer’s mind) was the “intended goal”. This is distinct from the outcome selected by the formal utility function actually encoded in the AI, which proved to have its maximum at tiny molecular smiley-faces.
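The gap between the intended goal and the encoded utility function can be shown with a toy sketch. Everything in it is an assumption made for illustration: the three candidate outcomes, the smile counts, the `worthwhile` flag standing in for the programmer's intuitive judgment, and the function names `formal_utility` and `matches_intended_goal`. It does not model any real AI system.

```python
# Toy illustration only: outcomes, numbers, and names are hypothetical.
# "worthwhile" stands in for the programmer's intuitive, unencoded judgment.
OUTCOMES = {
    "flourishing civilization": {"smile_count": 8e9, "worthwhile": True},
    "status quo": {"smile_count": 4e9, "worthwhile": True},
    "molecular smiley-faces": {"smile_count": 1e50, "worthwhile": False},
}


def formal_utility(outcome: str) -> float:
    """The utility function actually encoded: count smile-shaped things."""
    return OUTCOMES[outcome]["smile_count"]


def matches_intended_goal(outcome: str) -> bool:
    """The intended goal, which exists only in the programmer's mind."""
    return OUTCOMES[outcome]["worthwhile"]


# The optimizer only ever sees formal_utility, never matches_intended_goal.
chosen = max(OUTCOMES, key=formal_utility)

print(chosen)
print(matches_intended_goal(chosen))
```

In this sketch the optimizer picks whatever maximizes the encoded function, and the check against the intended goal happens only outside the system, in the programmer's head.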
- AI alignment
The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.