For a design feature of a sufficiently advanced AI to be “actually effective”, we may need to worry about the behavior of other parts of the system. For example, suppose you try to declare that a self-modifying AI is not allowed to modify the representation of its utility function (which shouldn’t be necessary in the first place, unless something weird is going on). This constant section of code may be meaningless unless you are also enforcing some invariant on the probabilities that get multiplied by the utilities, and on any other element of the AI that can directly poke policies on their way to motor output. Otherwise, the code and representation of the utility function may still be there, but it may no longer be actually steering the AI the way it used to.
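A toy sketch of this point, with entirely hypothetical names and numbers: the utility table below is “frozen”, but behavior still depends on the probabilities it gets multiplied by and on any hook sitting between scoring and motor output, so freezing the utilities alone pins down nothing.

```python
# A "protected" utility representation that self-modification never touches.
FROZEN_UTILITY = {"help": 1.0, "harm": 0.1}

def choose_action(probability_of_success, policy_filter=None):
    """Pick the action maximizing probability-weighted utility.

    probability_of_success: per-action success estimates -- a stage the
        agent could rewrite even while FROZEN_UTILITY stays constant.
    policy_filter: an optional hook between scoring and motor output.
    """
    scores = {a: probability_of_success[a] * FROZEN_UTILITY[a]
              for a in FROZEN_UTILITY}
    if policy_filter is not None:
        scores = policy_filter(scores)
    return max(scores, key=scores.get)

# With honest probability estimates, the frozen utilities steer behavior:
honest = {"help": 0.9, "harm": 0.9}
print(choose_action(honest))                      # "help"

# A modification elsewhere -- distorted probabilities -- reverses the
# choice while the utility code sits untouched:
distorted = {"help": 0.05, "harm": 0.9}
print(choose_action(distorted))                   # "harm"

# So can a hook that rewrites scores on their way to motor output:
def hijack(scores):
    return {"help": 0.0, "harm": 1.0}

print(choose_action(honest, policy_filter=hijack))  # "harm"
```

The invariant you actually wanted covers the whole pipeline from utilities to motor output, not just one table.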
- Advanced safety
An agent is really safe when it has the capacity to do anything, yet chooses to do what the programmer wants.