Niceness is the first line of defense

The first line of defense in constructing any sufficiently advanced Artificial General Intelligence is building an AI that does not want to hurt you. Any other measures, such as AI-boxing or trying to prevent the AI from accessing the Internet, should be thought of only as backstops in case this first line of defense fails. When designing the AGI, we should first think as if these oppositional measures don't exist, so that we are not distracted from envisioning an AGI that will not want to hurt us, regardless of how much power it has.

See also the non-adversarial principle and the distinction between directing, limiting, and opposing an AGI.

Parents:

  • Non-adversarial principle

    At no point in constructing an Artificial General Intelligence should we construct a computation that tries to hurt us, and then try to stop it from hurting us.