User maximization

A sub-prin­ci­ple of avoid­ing user ma­nipu­la­tion—if you’ve for­mu­lated the AI in terms of an argmax over \(X\) or an “op­ti­mize \(X\)” in­struc­tion, and \(X\) in­cludes a user in­ter­ac­tion as a sub­part, then you’ve just told the AI to op­ti­mize the user. For ex­am­ple, let’s say that the AI’s crite­rion of ac­tion is “Choose a policy \(p\) by max­i­miz­ing/​op­ti­miz­ing the prob­a­bil­ity \(X\) that the user’s in­struc­tions are car­ried out.” If the user’s in­struc­tions are a vari­able in­side the for­mula for \(X\) rather than a con­stant out­side it, you’ve just told the AI to try and get the user to give it eas­ier in­struc­tions.


  • User manipulation

    If not oth­er­wise averted, many of an AGI’s de­sired out­comes are likely to in­ter­act with users and hence im­ply an in­cen­tive to ma­nipu­late users.