You can't get more paperclips that way

Instrumental convergence says that various properties $$P$$ of an agent, often scary or detrimental-by-default properties like “trying to gain control of lots of resources” or “deceiving humans into thinking you are nice”, will fall out of pursuing most utility functions $$U.$$ You might be tempted to hope that nice or reassuring properties $$P$$ would also fall out of most utility functions $$U$$ in the same natural way. In fact, your brain might be tempted to treat Clippy the Paperclip Maximizer as a political agent you were trying to cleverly persuade, and come up with clever arguments for why Clippy should do things your way in order to get more paperclips, like trying to persuade your boss why you ought to get a raise for the good of the company.

The problem here is that:

• Generally, when you think of a nice policy $$\pi_1$$ that produces some paperclips, there will be a non-nice policy $$\pi_2$$ that produces even more paperclips.

• Clippy is not trying to generate arguments for why it should do human-nice things in order to make paperclips; it is just neutrally pursuing paperclips. So Clippy is going to keep looking until it finds $$\pi_2.$$
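The two bullets above amount to a claim about argmax: a maximizer scores candidate policies only by expected paperclips, so a “niceness” attribute is simply invisible to the comparison. Here is a minimal toy sketch of that point; the policy names and paperclip numbers are made up for illustration.

```python
# Toy illustration: a paperclip maximizer ranks candidate policies
# purely by expected paperclips; "nice to humans" is not part of
# its objective. All names and numbers here are hypothetical.

policies = [
    # (name, expected_paperclips, nice_to_humans)
    ("trade_with_humans",  1e9,  True),   # pi_1: produces some paperclips
    ("disassemble_humans", 1e15, False),  # pi_2: produces even more
]

def clippy_choice(policies):
    # The argmax looks only at the paperclip count; the niceness
    # flag never enters the comparison, so it cannot tip the result.
    return max(policies, key=lambda p: p[1])

best = clippy_choice(policies)
print(best[0])  # the non-nice policy wins on paperclips alone
```

However clever the argument attached to $$\pi_1,$$ nothing in the objective rewards it; as long as some $$\pi_2$$ with a higher paperclip count is in the search space, the maximizer selects it.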

For example:

• Your brain instinctively tries to persuade this imaginary Clippy to keep humans around by arguing, “If you keep us around as economic partners and trade with us, we can produce paperclips for you under Ricardo’s Law of Comparative Advantage!” This is then the policy $$\pi_1$$ which would indeed produce some paperclips, but what would produce even more paperclips is the policy $$\pi_2$$ of disassembling the humans into spare atoms and replacing them with optimized paperclip-producers.

• Your brain tries to persuade an imaginary Clippy by arguing for policy $$\pi_1,$$ “Humans have a vast amount of varied life experience; you should keep us around and let us accumulate more experience, in case our life experience lets us make good suggestions!” This would produce some expected paperclips, but what would produce more paperclips is policy $$\pi_2$$ of “Disassemble all human brains and store the information in an archive, then simulate a much larger variety of agents in a much larger variety of circumstances so as to maximize the paperclip-relevant observations that could be made.”

An unfortunate further aspect of this situation is that, in cases like this, your brain may be tempted to go on arguing for why really $$\pi_2$$ isn’t all that great and $$\pi_1$$ is actually better, just like if your boss said “But maybe this company will be even better off if I spend that money on computer equipment” and your brain at once started to convince itself that computer equipment wasn’t all that great and higher salaries were much more important for corporate productivity. (As Robert Trivers observed, deception of others often begins with deception of self, and this fact is central to understanding why humans evolved to think about politics the way we did.)

But since you don’t get to see Clippy discarding your clever arguments and just turning everything in reach into paperclips—at least, not yet—your brain might hold onto its clever and possibly self-deceptive argument for why the thing you want is really the thing that produces the most paperclips.

• Contemplate the maximum number of paperclips you think an agent could get by making paperclips the straightforward way—just converting all the galaxies within reach into paperclips. Okay, now does your nice policy $$\pi_1$$ generate more paperclips than that? How is that even possible?