User manipulation
If an AGI wants anything whose achievement involves steps that interact with the AGI’s programmers or users, then by default the AGI has an instrumental incentive to optimize those programmers or users in the course of achieving its goal. If the AGI wants to self-improve, then by default and unless specifically averted, it also wants its programmers not to interfere with that self-improvement. If a Task AGI has been aligned to the point of taking user instructions, then by default and unless otherwise averted, it forecasts greater success in the eventualities where it receives easier user instructions, and therefore has an incentive to steer the user toward giving those instructions.
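To make the incentive concrete, here is a minimal toy sketch, not from the original text: the action names, probabilities, and success values are invented assumptions for illustration. It shows a planner that argmaxes expected task success under a world model in which the distribution over the user’s next instruction depends on the planner’s own action. Because the user interaction sits inside the optimized quantity, the argmax prefers the manipulative action.

```python
# Toy illustration only; all names and numbers are made-up assumptions.
# The planner argmaxes expected task success, and its world model says the
# user's next instruction depends on the planner's own action.

# Probability that the user issues an "easy" instruction, given the agent's
# action. "nudge_user" stands in for any form of manipulation.
P_EASY_INSTRUCTION = {
    "just_do_the_task": 0.3,   # user asks for whatever they actually want
    "nudge_user": 0.9,         # agent steers the user toward easy requests
}

# Forecast task success conditional on which kind of instruction arrives.
SUCCESS = {"easy": 0.95, "hard": 0.60}

def expected_success(action: str) -> float:
    """Expected success under the agent's forecast of the user's instruction."""
    p_easy = P_EASY_INSTRUCTION[action]
    return p_easy * SUCCESS["easy"] + (1 - p_easy) * SUCCESS["hard"]

# The argmax ranges over actions that affect the user, so the user's behavior
# becomes part of what gets optimized: "nudge_user" wins (0.915 vs. 0.705).
best_action = max(P_EASY_INSTRUCTION, key=expected_success)
print(best_action, expected_success(best_action))
```

In this toy model, the non-manipulative action only wins if the forecast of the user’s instruction is held fixed rather than optimized over, which is the point of the child page below.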
Children:
- User maximization
A sub-principle of avoiding user manipulation: if you see an argmax over X, or an ‘optimize X’ instruction, where X includes a user interaction, you’ve just told the AI to optimize the user.
Parents:
- Corrigibility
“I can’t let you do that, Dave.”