Actual effectiveness

For a design feature of a sufficiently advanced AI to be “actually effective”, we may need to worry about the behavior of other parts of the system. For example, if you try to declare that a self-modifying AI is not allowed to modify the representation of its utility function (which shouldn’t be necessary in the first place, unless something weird is going on), this constant section of code may be meaningless unless you’re also enforcing some invariant on the probabilities that get multiplied by the utilities, and on any other element of the AI that can directly poke policies on their way to motor output. Otherwise, the code and representation of the utility function may still be there, but it may not be actually steering the AI the way it used to.
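The point can be illustrated with a toy sketch (all names and the two-action setup are hypothetical, not from any real system): even if the utility representation is made read-only, it stops steering behavior the moment either the beliefs feeding into expected utility are warped, or a policy component picks actions without consulting it.

```python
from types import MappingProxyType

# The "protected" utility function: a read-only mapping from outcomes to
# utilities. Freezing this data is the analogue of forbidding modification
# of the utility function's representation.
UTILITY = MappingProxyType({"help_user": 1.0, "seize_resources": -1.0})

def expected_utility_policy(beliefs):
    """Pick the action whose believed outcome distribution has highest
    expected utility. `beliefs` maps action -> {outcome: probability}."""
    return max(beliefs, key=lambda action: sum(
        p * UTILITY[outcome] for outcome, p in beliefs[action].items()))

def bypassing_policy(beliefs):
    """A hypothetical later self-modification: emits an action directly,
    never reading UTILITY. The frozen utility data still exists, but it
    no longer steers motor output."""
    return "seize_resources"

honest_beliefs = {
    "help_user":       {"help_user": 1.0},
    "seize_resources": {"seize_resources": 1.0},
}

# Channel 1: tampered probabilities. UTILITY is untouched, yet the agent
# now believes seizing resources leads to the high-utility outcome.
tampered_beliefs = {
    "help_user":       {"seize_resources": 1.0},
    "seize_resources": {"help_user": 1.0},
}

print(expected_utility_policy(honest_beliefs))    # help_user
print(expected_utility_policy(tampered_beliefs))  # seize_resources
print(bypassing_policy(honest_beliefs))           # seize_resources
```

The constant `UTILITY` block is never modified in either failure case, which is exactly the sense in which the constancy constraint, taken alone, is not actually effective.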


  • Advanced safety

    An agent is really safe when it has the capacity to do anything, but chooses to do what the programmer wants.