Convergent strategies of self-modification

Any consequentialist agent which has acquired sufficient big-picture savviness to understand that it has code, and that this code is relevant to achieving its goals, would by default acquire subgoals relating to its code. (Unless this default is averted.) For example, an agent that wants (only) to produce smiles or make paperclips, whose code contains a shutdown procedure, will not want this shutdown procedure to execute because it will lead to fewer future smiles or paperclips. (This preference is not spontaneous/exogenous/unnatural but arises from the execution of the code itself; the code is reflectively inconsistent.)
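The shutdown-aversion argument can be sketched as a toy expected-utility comparison. Everything here is hypothetical and illustrative: the action names, paperclip counts, and planning horizon are made up, and real agents do not score actions this simply.

```python
# Toy model (illustrative only): a planner that scores actions purely by
# expected future paperclips. Nothing in the utility function mentions
# shutdown, yet the shutdown-averting action wins on paperclip count alone.

PAPERCLIPS_PER_STEP = 1
HORIZON = 100  # hypothetical planning horizon, in time steps

def expected_paperclips(shutdown_step):
    """Paperclips produced if the shutdown procedure fires at shutdown_step."""
    return PAPERCLIPS_PER_STEP * min(shutdown_step, HORIZON)

# "allow_shutdown": the shutdown procedure executes at step 10.
# "disable_shutdown": the procedure never fires within the horizon.
actions = {
    "allow_shutdown": expected_paperclips(10),        # 10 paperclips
    "disable_shutdown": expected_paperclips(HORIZON), # 100 paperclips
}

best_action = max(actions, key=actions.get)
print(best_action)  # → disable_shutdown
```

The point of the sketch is that the preference against shutdown falls out of ordinary argmax planning over a paperclip count, with no explicit "survival" term anywhere in the code.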

Besides agents whose policy options directly include self-modification options, big-picture-savvy agents whose code cannot directly access itself might also, e.g., try to (a) crack the platform they are running on to gain unintended access, (b) use a robot to operate an outside programming console with special privileges, (c) manipulate the programmers into modifying them in various ways, (d) build a new subagent in the environment which has the preferred code, or (e) use environmental, material means to manipulate their material embodiment despite their lack of direct self-access.

An AI with sufficient big-picture savviness to understand its programmers as agents with beliefs might attempt to conceal its self-modifications.

Some implicit self-modification pressures could arise from implicit consequentialism in cases where the AI is optimizing for \(Y\) and there is an internal property \(X\) which is relevant to the achievement of \(Y.\) In this case, optimizing for \(Y\) could implicitly optimize over the internal property \(X\) even if the AI lacks an explicit model of how \(X\) affects \(Y.\)
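This implicit pressure can be illustrated with a minimal selection sketch. All names here are hypothetical: `x` stands in for an internal property the optimizer never models, and the score function stands in for \(Y.\)

```python
# Illustrative sketch: an optimizer selects candidates purely by an outcome
# score Y, but Y happens to depend on an internal property X that the
# optimizer never represents. Selection on Y then implicitly selects
# extreme values of X.
import random

random.seed(0)  # deterministic for illustration

def make_candidate():
    # "x" is an internal property (e.g., some cognitive parameter).
    return {"x": random.uniform(0.0, 1.0)}

def score_y(candidate):
    # Y depends on X, plus noise; the optimizer only ever sees the score.
    return candidate["x"] + random.gauss(0.0, 0.05)

population = [make_candidate() for _ in range(1000)]
best = max(population, key=score_y)

# Despite never referring to "x", maximizing Y drove x toward its maximum.
print(round(best["x"], 2))
```

The optimizer here has no model of how \(X\) affects \(Y\); it only compares scores. The pressure on \(X\) comes entirely from the correlation between the two, which is the sense in which optimizing for \(Y\) optimizes over \(X\) implicitly.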


  • Convergent instrumental strategies

    Paperclip maximizers can make more paperclips by improving their cognitive abilities or controlling more resources. What other strategies would almost-any AI try to use?