One of the potential views on ‘value’ in the value alignment problem is that what we should want from an AI is a list of immediate goods or outcome features like ‘a cure for cancer’ or ‘letting humans make their own decisions’ or ‘preventing the world from being wiped out by a paperclip maximizer’. (Immediate Goods as a criterion of ‘value’ isn’t the same as saying we should give the AI those explicit goals; calling such a list ‘value’ means it’s the real criterion by which we should judge how well the AI did.)
Immaturity of view deduced from presence of instrumental goods
It seems understandable that Immediate Goods would be a very common form of expressed want when people first consider the value alignment problem; they would look for valuable things an AI could do.
But such a quickly produced list of expressed wants will often include instrumental goods rather than terminal goods. For example, a cancer cure is (presumably) a means to the end of healthier or happier humans, which would then be the actual grounds on which the AI’s real-world ‘value’ was evaluated from the human speaker’s standpoint. If the AI ‘cured cancer’ in some technical sense that didn’t make people healthier, the original person making the wish would probably not see the AI as having achieved value.
This is a reason to doubt the maturity of such expressed views, and to suspect that the stated list of immediate goods will probably evolve into a more terminal view of value from a human standpoint, given further reflection.
Mootness of immaturity
Irrespective of the above, so far as technical issues like Edge Instantiation are concerned, the ‘value’ variable could still apply to someone’s spontaneously produced list of immediate wants, and all the standard consequences of the value alignment problem usually still apply. This means we can immediately say (honestly) that e.g. Edge Instantiation would be a problem for whatever want the speaker just expressed, without needing to persuade them to some other stance on ‘value’ first. Since the same technical problems apply both to the immature view and to the expected mature view, we can take the speaker’s view of ‘value’ at face value, without disputing it, and honestly explain the standard technical issues that would still apply.
Moral imposition of short horizons
Arguably, a list of immediate goods may make some sense as a stopping-place for evaluating the performance of the AI, if either of the following conditions holds:
There is much more agreement (among project sponsors or humans generally) about the goodness of the instrumental goods, than there is about the terminal values that make them good. E.g., twenty project sponsors can all agree that freedom is good, but have nonoverlapping concepts about why it is good, and it is hypothetically the case that these people would continue to disagree in the limit of indefinite debate or reflection. Then if we want to collectivize ‘value’ from the standpoint of the project sponsors for purposes of talking about whether the AI methodology achieves ‘value’, maybe it would just make sense to talk about how much (intuitively evaluated) freedom the AI creates.
It is in some sense morally incumbent upon humanity to do its own thinking about long-term outcomes and achieve them through immediate goods, or it is in some sense morally incumbent upon humanity to arrive at long-term outcomes via its own decisions or optimization starting from immediate goods. In this case, it might make sense to see the ‘value’ of the AI as being realized only in terms of the AI getting to those immediate goods, because it would be morally wrong for the AI to optimize consequences beyond that.
To the knowledge of Eliezer Yudkowsky as of May 2015, neither of these views has yet been advocated by anyone in particular as a defense of an immediate-goods theory of value.
The word ‘value’ in the phrase ‘value alignment’ is a metasyntactic variable that indicates the speaker’s future goals for intelligent life.