A ‘paperclip’, in the context of AI alignment, is any configuration of matter which would seem boring and valueless even from a very cosmopolitan perspective.

If some bizarre physics catastrophe, spreading out at the speed of light, permanently transformed all matter it touched into paperclips, this would be morally equivalent to a physics catastrophe that destroys the reachable universe outright. There is no deep missing moral insight we could have, no broadening of perspective and understanding, that would make us realize that little bent pieces of metal without any thought or internal experiences are the best possible use of our cosmic endowment. It’s true that we don’t know what epiphanies may lie in the future for us, but that particular paperclip-epiphany seems improbable. If you are tempted to argue with this statement as it applies to actual non-metaphorical paperclips, you are probably being overly contrary. This is why we consider actual non-metaphorical paperclips as the case in point of ‘paperclips’.

From our perspective, any entity that did in fact go around transforming almost all reachable matter into literal, actual, non-metaphorical paperclips would be doing something incredibly pointless; there would almost certainly be no hidden wisdom in the act that we could perceive on deeper examination or further growth of our own intellectual capacities. By the definition of the concept, this would be equally true of anything more generally termed a paperclip maximizer. Anything claimed about ‘paperclips’ or a ‘paperclip’ maximizer (such as the claim that such an entity can exist without having any special intellectual defects) must go through without any change for actual paperclips. Actual paperclips are meant to be a central example of ‘paperclips’.

The only distinction between paperclips and ‘paperclips’ is that the category ‘paperclips’ is far wider than the category ‘actual non-metaphorical paperclips’ and includes many more specific configurations of matter. Pencil erasers, tiny molecular smiley faces, and enormous diamond masses are all ‘paperclips’. Even under the Orthogonality Thesis, an AI maximizing actual non-metaphorical paperclips would be an improbable outcome of screwing up on value alignment, but only because there are so many other possibilities. A ‘red actual-paperclip maximizer’ would be even less likely to turn up in real life than an actual-paperclip maximizer, but this is not because redness is antithetical to the nature of intelligent goals. The ‘redness’ clause is one more added piece of complexity in the specification, and each added piece drives down the probability of that exact outcome.
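The point about the ‘redness’ clause is just the conjunction rule of probability: adding a requirement to a specification can never make the specified outcome more likely. A minimal sketch of this, assuming a toy model in which an outcome is a tuple of independent, equally likely yes/no features (the feature names and the uniform distribution are illustrative assumptions, not a claim about how real AI goals are distributed):

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: every outcome is one assignment of True/False to
# these features, and all assignments are treated as equally likely.
FEATURES = ("is_paperclip_shaped", "is_red", "is_metal")


def probability(required):
    """Return the fraction of outcomes in which every required feature holds."""
    outcomes = list(product([False, True], repeat=len(FEATURES)))
    matches = [
        outcome
        for outcome in outcomes
        if all(outcome[FEATURES.index(name)] for name in required)
    ]
    return Fraction(len(matches), len(outcomes))


# One clause in the specification.
p_paperclip = probability(["is_paperclip_shaped"])

# The same specification with one extra clause added.
p_red_paperclip = probability(["is_paperclip_shaped", "is_red"])

# The extra clause can only shrink (or leave unchanged) the set of matches.
assert p_red_paperclip <= p_paperclip
```

Nothing in the sketch treats redness as special; any extra clause, about color or anything else, narrows the set of matching outcomes in the same way.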

The popular press has sometimes distorted the notion of a paperclip maximizer into a story about an AI running a paperclip factory that takes over the universe. (Needless to say, the kind of AI used in a paperclip-manufacturing facility is unlikely to be a frontier research AI.) The point of the ‘paperclip’ concept is not that it is an explicit goal somebody foolishly gave an AI, or even a goal comprehensible in human terms at all. To imagine a central example of a supposed paperclip maximizer, imagine a research-level AI that did not stably preserve what its makers thought was supposed to be its utility function, or an AI with a poorly specified value learning rule, et cetera, such that the configuration of matter that actually happened to max out the AI’s utility function looks like a tiny string of atoms in the shape of a paperclip.


  • Paperclip maximizer

    This agent will not stop until the entire universe is filled with paperclips.