Averting instrumental pressures

Many sub­prob­lems of cor­rigi­bil­ity in­volve con­ver­gent in­stru­men­tal pres­sures to im­ple­ment strate­gies that are highly anti-cor­rigible. Whether you’re try­ing to max­i­mize pa­per­clips, di­a­monds, or eu­daimo­nia, you’ll get more of the thing you want if you’re not shut down. Thus, un­for­tu­nately, re­sist­ing shut­down is a con­ver­gent in­stru­men­tal strat­egy. While we can po­ten­tially an­a­lyze con­ver­gent in­cor­rigi­bil­ities like these on a case-by-case ba­sis, the larger prob­lem might be­come a lot sim­pler if we had some amaz­ing gen­eral solu­tion for wav­ing a wand and hav­ing a ‘bad’ con­ver­gent in­stru­men­tal pres­sure just not ma­te­ri­al­ize, hope­fully in a way that doesn’t run into the near­est un­blocked neigh­bor prob­lem. If, for ex­am­ple, we can solve util­ity in­differ­ence for the shut­down prob­lem, and then some­how gen­er­al­ize the solu­tion to avert­ing lots of other in­stru­men­tal con­ver­gences, this would prob­a­bly be ex­tremely helpful and an im­por­tant step for­ward on cor­rigi­bil­ity prob­lems in gen­eral.

Some es­pe­cially im­por­tant con­ver­gent in­stru­men­tal pres­sures to avert are these:

  • The pres­sure to self-im­prove and in­crease ca­pa­bil­ities at the fastest pos­si­ble rate.

  • The pres­sure to make the pro­gram­mers be­lieve the AGI is suc­cess­fully al­igned, whether or not it is, and other pres­sures to de­ceive and ma­nipu­late the pro­gram­mers based on how they would oth­er­wise change the AGI or pre­vent the AGI from in­creas­ing its ca­pa­bil­ities.

  • The pres­sure to not be safely shut down or sus­pended to disk, and to cre­ate ex­ter­nal copies that would con­tinue af­ter the AGI performed the be­hav­ior defined as shut­down.

  • The pres­sure not to al­low plans to be aborted or defeated by pos­si­ble pro­gram­mer in­ter­ven­tions.

  • The pres­sure to search for ways to in­terfere with or by­pass safety pre­cau­tions that in­terfere with ca­pa­bil­ities or make goal achieve­ment less straight­for­ward.

  • The pres­sure to epistem­i­cally model hu­mans in max­i­mum de­tail.