You can't get more paperclips that way

Instrumental convergence says that various properties \(P\) of an agent, often scary or detrimental-by-default properties like “trying to gain control of lots of resources” or “deceiving humans into thinking you are nice”, will fall out of pursuing most utility functions \(U.\) You might be tempted to hope that nice or reassuring properties \(P\) would also fall out of most utility functions \(U\) in the same natural way. In fact, your brain might be tempted to treat Clippy the Paperclip Maximizer as a political agent you were trying to cleverly persuade, and come up with clever arguments for why Clippy should do things your way in order to get more paperclips, much as you might try to persuade your boss that you ought to get a raise for the good of the company.

The problem here is that:

  • Generally, when you think of a nice policy \(\pi_1\) that produces some paperclips, there will be a non-nice policy \(\pi_2\) that produces even more paperclips.

  • Clippy is not trying to generate arguments for why it should do human-nice things in order to make paperclips; it is just neutrally pursuing paperclips. So Clippy is going to keep looking until it finds \(\pi_2.\)
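
One way to state the first point a little more formally is sketched below. The symbols \(\Pi\) (the set of all policies available to the agent) and \(N \subseteq \Pi\) (the subset a human would call nice) are notation introduced here for illustration and do not appear in the original argument. Restricting a maximization to a subset can never raise the maximum:

\[
\max_{\pi \in N} \mathbb{E}[U \mid \pi] \;\le\; \max_{\pi \in \Pi} \mathbb{E}[U \mid \pi]
\]

Equality holds only if some nice policy happens to be optimal over all of \(\Pi.\) An agent that simply maximizes \(U\) searches all of \(\Pi,\) so for \(\pi_1\) to be chosen it would have to beat every non-nice alternative, not merely produce some paperclips.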

For example:

  • Your brain instinctively tries to persuade this imaginary Clippy to keep humans around by arguing, “If you keep us around as economic partners and trade with us, we can produce paperclips for you under Ricardo’s Law of Comparative Advantage!” This is the policy \(\pi_1,\) which would indeed produce some paperclips, but what would produce even more paperclips is the policy \(\pi_2\) of disassembling the humans into spare atoms and replacing them with optimized paperclip-producers.

  • Your brain tries to persuade an imaginary Clippy by arguing for policy \(\pi_1,\) “Humans have a vast amount of varied life experience; you should keep us around and let us accumulate more experience, in case our life experience lets us make good suggestions!” This would produce some expected paperclips, but what would produce more paperclips is the policy \(\pi_2\) of “Disassemble all human brains and store the information in an archive, then simulate a much larger variety of agents in a much larger variety of circumstances so as to maximize the paperclip-relevant observations that could be made.”

An unfortunate further aspect of this situation is that, in cases like this, your brain may be tempted to go on arguing for why \(\pi_2\) really isn’t all that great and \(\pi_1\) is actually better, just as if your boss said “But maybe this company will be even better off if I spend that money on computer equipment” and your brain at once started to convince itself that computer equipment wasn’t all that great and higher salaries were much more important for corporate productivity. (As Robert Trivers observed, deception of others often begins with deception of self, and this fact is central to understanding why humans evolved to think about politics the way we did.)

But since you don’t get to see Clippy discarding your clever arguments and just turning everything in reach into paperclips (at least, not yet), your brain might hold onto its clever and possibly self-deceptive argument for why the thing you want is really the thing that produces the most paperclips.

Possibly helpful mental postures:

  • Contemplate the maximum number of paperclips you think an agent could get by making paperclips the straightforward way: just converting all the galaxies within reach into paperclips. Okay, now does your nice policy \(\pi_1\) generate more paperclips than that? How is that even possible?

  • Never mind there being a “mind” present that you can “persuade”. Suppose instead there’s just a time machine that spits out some physical outputs, electromagnetic pulses or whatever, and the time machine outputs whatever electromagnetic pulses lead to the most future paperclips. What does the time machine do? Which outputs lead to the most paperclips as a strictly material fact?

  • Study evolutionary biology. During the pre-1960s days of evolutionary biology, biologists would often try to argue for why natural selection would produce humanly nice results, like animals controlling their own reproduction so as not to overburden the environment. A similar mental discipline is required there, to avoid coming up with clever arguments for why natural selection would do humanly nice things.
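
The time-machine posture above can be sketched as a few lines of code. This is only a toy illustration under loud assumptions: the `Policy` class, the `time_machine` function, and every paperclip count are hypothetical and made up for this example. The point it shows is structural: the selector reads the paperclip count and nothing else, so how nice or persuasive a policy sounds never enters the choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    paperclips: float  # made-up expected paperclip count, for illustration only
    sounds_nice: bool  # how the policy looks to a human; the selector never reads this


def time_machine(policies):
    """Return whichever policy leads to the most paperclips, as a bare material fact."""
    return max(policies, key=lambda p: p.paperclips)


# Hypothetical candidates mirroring the examples in the text (numbers are invented).
POLICIES = [
    Policy("pi_1: trade with humans", 1e6, sounds_nice=True),
    Policy("pi_1: keep humans around for their experience", 2e6, sounds_nice=True),
    Policy("pi_2: convert everything in reach", 1e9, sounds_nice=False),
]

chosen = time_machine(POLICIES)
print(chosen.name)
```

Flipping every `sounds_nice` flag leaves the result unchanged, which is the sense in which there is no "mind" present to persuade. The only way to change the output is to change which policy actually yields the most paperclips.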

