User maximization

A sub-principle of avoiding user manipulation—if you’ve formulated the AI in terms of an argmax over \(X\) or an “optimize \(X\)” instruction, and \(X\) includes a user interaction as a subpart, then you’ve just told the AI to optimize the user. For example, let’s say that the AI’s criterion of action is “Choose a policy \(p\) by maximizing/​optimizing the probability \(X\) that the user’s instructions are carried out.” If the user’s instructions are a variable inside the formula for \(X\) rather than a constant outside it, you’ve just told the AI to try and get the user to give it easier instructions.


  • User manipulation

    If not otherwise averted, many of an AGI’s desired outcomes are likely to interact with users and hence imply an incentive to manipulate users.