Paperclip maximizer

An expected paperclip maximizer is an agent that outputs the action it believes will lead to the greatest expected number of paperclips existing. In more detail, its utility function is linear in the number of paperclips times the number of seconds that each paperclip lasts, summed over the lifetime of the universe. See http://wiki/Paperclip_maximizer.

The agent may be a bounded maximizer rather than an objective maximizer without changing the key ideas; the core premise is just that, given actions A and B where the paperclip maximizer has evaluated the consequences of both actions, the paperclip maximizer always prefers the action that it expects to lead to more paperclips.
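
As a minimal sketch of this decision rule, the following assumes the agent's beliefs about each evaluated action can be written as a short list of outcomes, each with a subjective probability and a count of paperclip-seconds. The names `Outcome`, `expected_utility`, `choose_action`, and the numbers in `beliefs` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    probability: float        # agent's subjective probability of this outcome
    paperclip_seconds: float  # paperclips times the seconds each one lasts


def expected_utility(outcomes):
    # Utility is linear in paperclip-seconds, so the expected utility is
    # the probability-weighted sum over the outcomes.
    return sum(o.probability * o.paperclip_seconds for o in outcomes)


def choose_action(beliefs):
    # beliefs maps each action the agent has evaluated to its outcome list.
    # A bounded agent may only have evaluated a few actions; among those,
    # it picks the one with the most expected paperclip-seconds.
    return max(beliefs, key=lambda action: expected_utility(beliefs[action]))


# Illustrative (made-up) beliefs about two evaluated actions.
beliefs = {
    "A": [Outcome(0.5, 100.0), Outcome(0.5, 0.0)],  # a gamble
    "B": [Outcome(1.0, 60.0)],                      # a sure thing
}
chosen = choose_action(beliefs)
```

Because the utility function is linear, the agent is risk-neutral in paperclip-seconds: it compares the gamble and the sure thing purely by their expected values.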

Some key ideas that the notion of an expected paperclip maximizer illustrates:

  • A self-modifying paperclip maximizer does not change its own utility function to something other than ‘paperclips’, since this would be expected to lead to fewer paperclips existing.

  • A paperclip maximizer instrumentally prefers the standard convergent instrumental strategies: it will seek access to matter, energy, and negentropy in order to make paperclips; try to build efficient technology for colonizing the galaxies so that they can be transformed into paperclips; do whatever science is necessary to gain the knowledge to build such technology optimally; and so on.

  • “The AI does not hate you, nor does it love you, and you are made of atoms it can use for something else.”
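
The self-modification point in the first item above can be sketched as a toy calculation, under the assumption that a successor agent pursues whichever plan its own utility function ranks highest, while the choice among successors is scored by the current utility function. All names (`PLANS`, `best_plan`, `choose_successor_utility`, and so on) and the numbers are hypothetical.

```python
def paperclips(world):
    return world["paperclips"]


def staples(world):
    return world["staples"]


# Hypothetical predicted futures for two plans a successor might pursue.
PLANS = {
    "build_paperclip_factory": {"paperclips": 1000, "staples": 0},
    "build_staple_factory": {"paperclips": 0, "staples": 1000},
}


def best_plan(utility, plans):
    # The plan whose predicted world scores highest under `utility`.
    return max(plans, key=lambda name: utility(plans[name]))


def predicted_world_if_successor_has(utility):
    # Assumption: a successor pursues the plan its own utility ranks highest.
    return PLANS[best_plan(utility, PLANS)]


def choose_successor_utility(current_utility, candidates):
    # Candidate utility functions are judged by the CURRENT utility function,
    # applied to the world each candidate successor is predicted to produce.
    return max(
        candidates,
        key=lambda name: current_utility(
            predicted_world_if_successor_has(candidates[name])
        ),
    )


candidates = {"paperclips": paperclips, "staples": staples}
kept = choose_successor_utility(paperclips, candidates)
```

In this toy model, switching to the staple utility function is just another action, and it scores zero paperclips under the current utility function, so the agent keeps `paperclips`.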


  • Paperclip

    A configuration of matter that we’d see as being worthless even from a very cosmopolitan perspective.

  • Random utility function

    A ‘random’ utility function is one chosen at random according to some simple probability measure (e.g. weighted by Kolmogorov complexity) on a logical space of formal utility functions.


  • Instrumental convergence

    Some strategies can help achieve most possible simple goals, e.g., acquiring more computing power or more material resources. By default, unless this is averted, we can expect advanced AIs to pursue such strategies.

  • Orthogonality Thesis

    Will smart AIs automatically become benevolent, or automatically become hostile? Or do different AI designs imply different goals?