Instrumental pressure

Say­ing that an agent will see ‘in­stru­men­tal pres­sure’ to bring about an event E is say­ing that this agent, pre­sumed to be a con­se­quen­tial­ist with some goal G, will ce­teris paribus and ab­sent defeaters, want to bring about E in or­der to do G. For ex­am­ple, a pa­per­clip max­i­mizer, Clippy, sees in­stru­men­tal pres­sure to gain con­trol of as much mat­ter as pos­si­ble in or­der to make more pa­per­clips. If we imag­ine an al­ter­nate Clippy+ that has a penalty term in its util­ity func­tion for ‘kil­ling hu­mans’, Clippy+ still has an in­stru­men­tal pres­sure to turn hu­mans into pa­per­clips (be­cause of the pa­per­clips that would be gained) but it also has a coun­ter­vailing force push­ing against that pres­sure (the penalty term for kil­ling hu­mans). Thus, we can say that a sys­tem is ex­pe­rienc­ing ‘in­stru­men­tal pres­sure’ to do some­thing, with­out im­ply­ing that the sys­tem nec­es­sar­ily does it.

This state of af­fairs is differ­ent from the ab­sence of any in­stru­men­tal pres­sure: E.g., Clippy+ might come up with some clever way to ob­tain the gains while avoid­ing the penalty term, like turn­ing hu­mans into pa­per­clips with­out kil­ling them.

To more crisply define ‘in­stru­men­tal pres­sure’, we need a setup that dis­t­in­guishes ter­mi­nal util­ity and in­stru­men­tal ex­pected util­ity, as in e.g. a util­ity func­tion plus a causal model. Then we can be more pre­cise about the no­tion of ‘in­stru­men­tal pres­sure’ as fol­lows: If each pa­per­clip is worth 1 ter­mi­nal utilon and a hu­man can be dis­assem­bled to make 1000 pa­per­clips with cer­tainty, then strate­gies or event-sets that in­clude ‘turn the hu­man into pa­per­clips’ thereby have their ex­pected util­ity ele­vated by 1000 utils. There might also be a penalty term that as­signs −1,000,000 utilts to kil­ling a hu­man, but then the net ex­pected util­ity of dis­assem­bling the hu­man is −999,000 rather than −1,000,000. The 1000 utils would still be gained from dis­assem­bling the hu­man; the penalty term doesn’t change that part. Even if this strat­egy doesn’t have max­i­mum EU and is not se­lected, the ‘in­stru­men­tal pres­sure’ was still ele­vat­ing its EU. There’s still an ex­pected-util­ity bump on that part of the solu­tion space, even if that solu­tion space is rel­a­tively low in value. And this is per­haps rele­vantly differ­ent be­cause, e.g., there might be some clever strat­egy for turn­ing hu­mans into pa­per­clips with­out kil­ling them (even if you can only get 900 pa­per­clips that way).

If the agent is re­flec­tive and makes re­flec­tive choices on a con­se­quen­tial­ist ba­sis, there would ce­teris paribus be a re­flec­tive-level pres­sure to search for a strat­egy that makes pa­per­clips out of the hu­mans’ atoms with­out do­ing any­thing defined as ‘kil­ling the hu­man’. If a strat­egy like that could be found, then ex­e­cut­ing the strat­egy would en­able a gain of 1000 utilons; thus there’s an in­stru­men­tal pres­sure to search for that strat­egy. Even if there’s a penalty term added for search­ing for strate­gies to evade penalty terms, lead­ing the AI to de­cide not to do the search, the in­stru­men­tal pres­sure will still be there as a bump in the ex­pected util­ity of that part of the solu­tion space. (Per­haps there’s some un­fore­seen way to do some­thing very like search­ing for that strat­egy while evad­ing the penalty term, such as con­struct­ing an out­side calcu­la­tor to do it…)

Blur­ring lines in allegedly non-con­se­quen­tial­ist sub­sys­tems or de­ci­sion rules

To the ex­tent that the AI be­ing dis­cussed is not a pure con­se­quen­tial­ist, the no­tion of ‘in­stru­men­tal pres­sure’ may start to blur or be less ap­pli­ca­ble. E.g., sup­pose on some level of AI, the choice of which ques­tions to think about is not be­ing de­cided by a choice be­tween op­tions with calcu­lated ex­pected util­ities, but is in­stead be­ing de­cided by a rule, and the rule ex­cludes search­ing for strate­gies that evade penalty terms. Then maybe there’s no good anal­ogy to the con­cept of ‘an in­stru­men­tal pres­sure to search for strate­gies that evade penalty terms’, be­cause there’s no ex­pected util­ity rat­ing on the solu­tion space and hence no analo­gous bump in the solu­tion space that might even­tu­ally in­ter­sect a fea­si­ble strat­egy. But we should still per­haps be care­ful about declar­ing that an AI sub­sys­tem has no analogue of in­stru­men­tal pres­sures, be­cause in­stru­men­tal pres­sures may arise even in sys­tems that don’t look ex­plic­itly con­se­quen­tial­ist.


  • Instrumental convergence

    Some strate­gies can help achieve most pos­si­ble sim­ple goals. E.g., ac­quiring more com­put­ing power or more ma­te­rial re­sources. By de­fault, un­less averted, we can ex­pect ad­vanced AIs to do that.