Niceness is the first line of defense

The first line of defense in constructing any sufficiently advanced Artificial General Intelligence is building an AI that does not want to hurt you. Any other measures, like AI-boxing or trying to prevent the AI from accessing the Internet, should be thought of only as backstops in case this first line of defense fails. When designing the AGI we should first think as if all these oppositional measures don't exist, so that we aren't distracted while trying to envision an AGI that—regardless of how much power it has—will not want to hurt us.

See also the non-adversarial principle and the distinction between Directing, vs. limiting, vs. opposing.


  • Non-adversarial principle

    At no point in constructing an Artificial General Intelligence should we construct a computation that tries to hurt us, and then try to stop it from hurting us.