User manipulation

If there’s anything an AGI wants whose achievement involves steps that interact with the AGI’s programmers or users, then by default, the AGI will have an instrumental incentive to optimize the programmers/users in the course of achieving its goal. If the AGI wants to self-improve, then by default and unless specifically averted, it also wants its programmers not to interfere with self-improvement. If a Task AGI has been aligned to the point of taking user instructions, then by default and unless otherwise averted, it will forecast greater success in the eventualities where it receives easier user instructions—and hence has an incentive to steer the user toward issuing them.


  • User maximization

    A sub-principle of avoiding user manipulation: if you see an argmax over X or an ‘optimize X’ instruction, and X includes a user interaction, you’ve just told the AI to optimize the user.
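The failure mode can be made concrete with a toy sketch (entirely hypothetical; the action names, scores, and objective are illustrative assumptions, not any real AGI architecture). Because the objective below scores each candidate action partly by the modeled user's reaction, the argmax is implicitly ranking plans by how well they steer the user:

```python
# Hypothetical predicted effects of each candidate action.
# "task_score": direct progress on the task itself.
# "user_score": how favorable the (modeled) user's reaction is.
ACTIONS = {
    "do_task_plainly":        {"task_score": 0.8, "user_score": 0.5},
    "do_task_and_reassure":   {"task_score": 0.7, "user_score": 0.9},
    "flatter_user_skip_task": {"task_score": 0.1, "user_score": 1.0},
}

def objective(effects):
    # The predicted user reaction is inside the quantity being maximized,
    # so the argmax below optimizes the user along with the task.
    return effects["task_score"] + effects["user_score"]

best = max(ACTIONS, key=lambda a: objective(ACTIONS[a]))
print(best)  # → do_task_and_reassure
```

Note that the selected action trades away task performance to improve the user's predicted reaction: no manipulation was programmed in anywhere; it falls out of including a user interaction inside the optimized quantity.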