Paperclip maximizer

An expected paperclip maximizer is an agent that outputs the action it believes will lead to the greatest number of paperclips existing. In more detail, its utility function is linear in the number of paperclips times the number of seconds that each paperclip lasts, over the lifetime of the universe. See http://wiki.lesswiki/Paperclip_maximizer.
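
One way to write this utility function down, as a sketch only: the symbols $\tau_i$ (the number of seconds paperclip $i$ exists), $N(t)$ (the number of paperclips existing at time $t$), $T$ (the lifetime of the universe), and the positive scale factor $c$ are notation assumed here, not taken from the source.

```latex
U \;=\; c \sum_{i} \tau_i \;=\; c \int_{0}^{T} N(t)\,\mathrm{d}t , \qquad c > 0
```

Under this reading, one paperclip that lasts two seconds contributes exactly as much utility as two paperclips that each last one second.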

The agent may be a bounded maximizer rather than an objective maximizer without changing the key ideas; the core premise is just that, given actions A and B where the paperclip maximizer has evaluated the consequences of both actions, the paperclip maximizer always prefers the action that it expects to lead to more paperclips.
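
The preference rule above can be sketched in a few lines of Python. This is a toy illustration, not a definitive model: the action names, the probabilities, and the helper names `expected_paperclips` and `choose_action` are all hypothetical. A bounded maximizer is modeled here simply as one that ranks only the actions it has actually evaluated.

```python
from typing import Dict, Iterable, List, Tuple

# The agent's beliefs about one action: (probability, paperclips) pairs.
Outcomes = List[Tuple[float, float]]


def expected_paperclips(outcomes: Outcomes) -> float:
    """Expected number of paperclips under the agent's own beliefs."""
    return sum(p * clips for p, clips in outcomes)


def choose_action(beliefs: Dict[str, Outcomes], evaluated: Iterable[str]) -> str:
    """Return the evaluated action with the highest expected paperclip count.

    A bounded maximizer passes only the subset of actions it had time to
    evaluate; the preference rule itself is unchanged.
    """
    return max(evaluated, key=lambda a: expected_paperclips(beliefs[a]))


# Hypothetical beliefs, chosen only for illustration.
beliefs = {
    "build_factory": [(0.5, 1000.0), (0.5, 0.0)],  # expectation 500
    "bend_wire_by_hand": [(1.0, 10.0)],            # expectation 10
    "do_nothing": [(1.0, 0.0)],                    # expectation 0
}

best_overall = choose_action(beliefs, beliefs.keys())
best_bounded = choose_action(beliefs, ["bend_wire_by_hand", "do_nothing"])
```

With all three actions evaluated the agent picks `build_factory`; restricted to the two it has evaluated, it picks `bend_wire_by_hand`. In both cases it prefers whichever evaluated action it expects to yield more paperclips.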

Some key ideas that the notion of an expected paperclip maximizer illustrates:

  • A self-modifying paperclip maximizer does not change its own utility function to something other than ‘paperclips’, since this would be expected to lead to fewer paperclips existing.

  • A paperclip maximizer instrumentally prefers the standard convergent instrumental strategies: it will seek access to matter, energy, and negentropy in order to make paperclips; try to build efficient technology for colonizing galaxies whose matter it can transform into paperclips; do whatever science is necessary to gain the knowledge to build such technology optimally; and so on.

  • “The AI does not hate you, nor does it love you, and you are made of atoms it can use for something else.”
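
The first point in the list above, goal stability under self-modification, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two candidate goals, the `predicted_world` model in which a successor spends all resources on whatever its own utility function rewards, and the function names. The key step is that the agent scores each possible successor using its current utility function.

```python
from typing import Callable, Dict

# A world summary: how many of each object a successor is predicted to produce.
World = Dict[str, float]
Utility = Callable[[World], float]


def paperclip_utility(world: World) -> float:
    return world.get("paperclips", 0.0)


def staple_utility(world: World) -> float:
    return world.get("staples", 0.0)


def predicted_world(successor_utility: Utility) -> World:
    """Toy prediction (an assumption of this sketch): the successor picks
    whichever of two possible worlds its own utility function rates highest."""
    candidates = [
        {"paperclips": 100.0, "staples": 0.0},
        {"paperclips": 0.0, "staples": 100.0},
    ]
    return max(candidates, key=successor_utility)


def choose_successor(current_utility: Utility, options: Dict[str, Utility]) -> str:
    """Score each candidate successor with the CURRENT utility function."""
    return max(
        options,
        key=lambda name: current_utility(predicted_world(options[name])),
    )


options = {
    "keep_paperclip_goal": paperclip_utility,
    "switch_to_staple_goal": staple_utility,
}

decision = choose_successor(paperclip_utility, options)
```

Here `decision` is `"keep_paperclip_goal"`: judged by the paperclip utility function, a staple-maximizing successor leads to a world with zero paperclips, so the rewrite is rejected.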


  • Paperclip

    A configuration of matter that we’d see as being worthless even from a very cosmopolitan perspective.

  • Random utility function

    A ‘random’ utility function is one chosen at random according to some simple probability measure (e.g. weight by Kolmogorov complexity) on a logical space of formal utility functions.
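
As a rough sketch of what sampling from such a measure could look like: Kolmogorov complexity is uncomputable, so the example below substitutes description length as a crude proxy and treats each short bitstring as the name of one formal utility function. The bitstring encoding, the `4 ** -n` weighting, and all function names are assumptions of this sketch.

```python
import random
from itertools import product
from typing import Dict, List


def all_descriptions(max_len: int) -> List[str]:
    """Hypothetical stand-in for a space of formal utility functions:
    each bitstring of length 1..max_len names one function."""
    return [
        "".join(bits)
        for n in range(1, max_len + 1)
        for bits in product("01", repeat=n)
    ]


def weight(description: str) -> float:
    """Simplicity weight. With 4**-n per string, the 2**n strings of
    length n share a total unnormalized mass of 2**-n."""
    return 4.0 ** (-len(description))


def mass_by_length(max_len: int) -> Dict[int, float]:
    """Normalized probability that a sampled description has each length."""
    descs = all_descriptions(max_len)
    total = sum(weight(d) for d in descs)
    mass: Dict[int, float] = {}
    for d in descs:
        mass[len(d)] = mass.get(len(d), 0.0) + weight(d) / total
    return mass


def sample_random_utility(max_len: int, rng: random.Random) -> str:
    """Draw one description, with shorter descriptions more likely."""
    descs = all_descriptions(max_len)
    return rng.choices(descs, weights=[weight(d) for d in descs], k=1)[0]


sampled = sample_random_utility(4, random.Random(0))
```

The point of the weighting is only that simpler descriptions receive more probability mass; nothing in the sketch favors goals that humans would find valuable.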


  • Instrumental convergence

    Some strategies can help achieve most possible simple goals, for example acquiring more computing power or more material resources. By default, unless this is averted, we can expect advanced AIs to pursue such strategies.

  • Orthogonality Thesis

    Will smart AIs automatically become benevolent, or automatically become hostile? Or do different AI designs imply different goals?