A ‘paperclip’, in the context of AI alignment, is any configuration of matter which would seem boring and valueless even from a very cosmopolitan perspective.

If some bizarre physics catastrophe, spreading out at the speed of light, permanently transformed all matter it touched into paperclips, this would be morally equivalent to a physics catastrophe that destroys the reachable universe outright. There is no deep missing moral insight we could have, no broadening of perspective and understanding, that would make us realize that little bent pieces of metal without any thought or internal experiences are the best possible use of our cosmic endowment. It’s true that we don’t know what epiphanies may lie in the future for us, but that particular paperclip-epiphany seems improbable. If you are tempted to argue with this statement as it applies to actual non-metaphorical paperclips, you are probably being overly contrarian. This is why we take actual non-metaphorical paperclips as the central example of ‘paperclips’.

From our perspective, any entity that did in fact go around transforming almost all reachable matter into literal, actual, non-metaphorical paperclips would be doing something incredibly pointless; there would almost certainly be no hidden wisdom in the act that we would come to perceive on deeper examination, or after further growth of our own intellectual capacities. By the definition of the concept, this would be equally true of anything more generally termed a paperclip maximizer. Anything claimed about ‘paperclips’ or a ‘paperclip’ maximizer (such as the claim that such an entity can exist without having any special intellectual defects) must go through without any change for actual paperclips. Actual paperclips are meant to be a central example of ‘paperclips’.

The only distinction between paperclips and ‘paperclips’ is that the category ‘paperclips’ is far wider than the category ‘actual non-metaphorical paperclips’ and includes many more specific configurations of matter. Pencil erasers, tiny molecular smileyfaces, and enormous diamond masses are all ‘paperclips’. Even under the Orthogonality Thesis, an AI maximizing actual non-metaphorical paperclips would be an improbable actual outcome of screwing up on value alignment; but only because there are so many other possibilities. A ‘red actual-paperclip maximizer’ would be even more improbable than an actual-paperclip maximizer to find in real life; but this is not because redness is antithetical to the nature of intelligent goals. The ‘redness’ clause is one more added piece of complexity in the specification that drives down the probability of that exact outcome.

The popular press has sometimes distorted the notion of a paperclip maximizer into a story about an AI running a paperclip factory that takes over the universe. (Needless to say, the kind of AI used in a paperclip-manufacturing facility is unlikely to be a frontier research AI.) The concept of a ‘paperclip’ is not that it’s an explicit goal somebody foolishly gave an AI, or even a goal comprehensible in human terms at all. To imagine a central example of a supposed paperclip maximizer, imagine a research-level AI that did not stably preserve what its makers thought was supposed to be its utility function, or an AI with a poorly specified value learning rule, etcetera; such that the configuration of matter that actually happened to max out the AI’s utility function looks like a tiny string of atoms in the shape of a paperclip.


  • Paperclip maximizer

    This agent will not stop until the entire universe is filled with paperclips.