The term ‘pivotal’ in the context of value alignment theory is a guarded term to refer to events, particularly the development of sufficiently advanced AIs, that will make a large difference a billion years later. A ‘pivotal’ event upsets the current gameboard—decisively settles a win or loss, or drastically changes the probability of win or loss, or changes the future conditions under which a win or loss is determined. A ‘pivotal achievement’ is one that does this in a positive direction, and a ‘pivotal catastrophe’ upsets the gameboard in a negative direction. These may also be referred to as ‘astronomical achievements’ or ‘astronomical catastrophes’.
Reason for guardedness
Guarded definitions are deployed where there is reason to suspect that a concept will otherwise be over-extended. The case for having a guarded definition of ‘pivotal event’ is that, after it’s been shown that event X is maybe not as important as originally thought, one side of that debate may be strongly tempted to go on arguing that, wait, really it could be “relevant” (by some strained line of possibility).
Example 1: In the central example of the ZF provability Oracle, considering a series of possible ways that an untrusted Oracle could break an attempt to Box it, we end with an extremely Boxed Oracle that can only output machine-checkable proofs of predefined theorems in Zermelo-Fraenkel set theory, with the proofs themselves being thrown away once machine-verified. We then observe that we don’t currently know of any obvious way to save the world by finding out that particular, pre-chosen theorems are provable. It may then be tempting to argue that this device could greatly advance the field of mathematics, and that math is relevant to the value alignment problem. However, at least given that particular proposal for using the ZF Oracle, the basic rules of the AI-development playing field would remain the same, the value alignment problem would not be finished nor would it have moved on to a new phase, the world would still be in danger (neither safe nor destroyed), etcetera. (This doesn’t rule out that tomorrow some reader will think of some spectacularly clever use for a ZF Oracle that does upset the chessboard and get us on a direct path to winning where we know what we need to do from there—and in this case MIRI would reclassify the ZF Oracle as a high-priority research avenue!)
Example 2: Suppose a funder, worried about the prospect of advanced AIs wiping out humanity, offers grants for “AI safety”. Then compared to the much more difficult problems involved with making something actually smarter than you be safe, it may be tempting to try to write papers that you know you can finish, like a paper on robotic cars causing unemployment in the trucking industry, or a paper on who holds legal liability when a factory machine crushes a worker. But while it’s true that crushed factory workers and unemployed truckers are both, ceteris paribus, bad, they are not astronomical catastrophes that transform all galaxies inside our future light cone into paperclips, and the latter category seems worth distinguishing. This definition needs to be guarded because there will then be a temptation for the grantseeker to argue, “Well, if AI causes unemployment, that could slow world economic growth, which will make countries more hostile to each other, which would make it harder to prevent an AI arms race.” But the possibility of something ending up having a non-zero impact on astronomical stakes is not the same concept as events that have a game-changing impact on astronomical stakes. The question is what are the largest lowest-hanging fruit in astronomical stakes, not whether something can be argued as defensible by pointing to a non-zero astronomical impact.
Example 3: Suppose a behaviorist genie is restricted from modeling human minds in any great detail, but is still able to build and deploy molecular nanotechnology. Moreover, the AI is able to understand the instruction, “Build a device for scanning human brains and running them at high speed with minimum simulation error”, and work out a way to do this without simulating whole human brains as test cases. The genie is then used to upload a set of, say, fifty human researchers, and run them at 10,000-to-1 speeds. This accomplishment would not of itself save the world or destroy it—the researchers inside the simulation would still need to solve the value alignment problem, and might not succeed in doing so. But it would upset the gameboard and change the major determinants of winning, compared to the default scenario where the fifty researchers are in an equal-speed arms race with the rest of the world, and don’t have unlimited time to check their work. The event where the genie was used to upload the researchers and run them at high speeds would be a critical event, a hinge where the optimum strategy was drastically different before versus after that pivotal moment.
Example 4: Suppose a paperclip maximizer is built, self-improves, and converts everything in its future light cone into paperclips. The fate of the universe is then settled, so building the paperclip maximizer was a pivotal catastrophe.
Example 5: A mass simultaneous malfunction of robotic cars causes them to deliberately run over pedestrians in many cases. Humanity buries its dead, picks itself up, and moves on. This was not a pivotal catastrophe, even though it may have nonzero influence on future AI development.
A strained argument for event X being a pivotal achievement often goes through X being an input into a large pool of goodness that also has many other inputs. A ZF provability Oracle would advance mathematics and mathematics is good for value alignment, but there’s nothing obvious about a ZF Oracle that’s specialized for advancing value alignment work, compared to many other inputs into total mathematical progress. Handling trucker disemployment would only be one factor among many in world economic growth.
By contrast, a genie that uploaded human researchers putatively would not be producing merely one upload among many; it would be producing the only uploads where the default was otherwise no uploads. In turn, these uploads could do decades or centuries of unrushed serial research on the value alignment problem, where the alternative was rushed research over much shorter timespans; and this can plausibly make the difference by itself between an AI that achieves ~100% of value versus an AI that achieves ~0% of value. At the end of the extrapolation where we ask what difference everything is supposed to make, we find a series of direct impacts producing events qualitatively different from the default, ending in a huge percentage difference in how much of all possible value gets achieved.
By having a narrow and guarded definition of ‘pivotal events’, we can avoid bait-and-switch arguments for the importance of research proposals, where the ‘bait’ is raising the apparent importance of ‘AI safety’ by discussing things with large direct impacts on astronomical stakes (like a paperclip maximizer or Friendly sovereign) and the ‘switch’ is to working on problems of dubious astronomical impact that are inputs into large pools with many other inputs.
‘Dealing a deck of cards’ metaphor
There’s a line of reasoning that goes, “But most consumers don’t want general AIs, they want voice-operated assistants. So companies will develop voice-operated assistants, not general AIs.” But voice-operated assistants are themselves not pivotal events; developing them doesn’t prevent general AIs from being developed later. So even though this non-pivotal event precedes a pivotal one, it doesn’t mean we should focus on the earlier event instead.
No matter how many non-game-changing ‘AIs’ are developed, whether playing great chess or operating in the stock market or whatever, the underlying research process will keep churning and keep turning out other and more powerful AIs.
Imagine a deck of cards which has some aces (superintelligences) and many more non-aces. We keep dealing through the deck until we get a black ace, a red ace, or some other card that stops the deck from dealing any further. A non-ace Joker card that permanently prevents any aces from being drawn would be pivotal (not necessarily good, but definitely pivotal). A card that shifts the further distribution of the deck from 10% red aces to 90% red aces would be pivotal; we could see this as a metaphor for the hoped-for result of Example 3 (uploading the researchers), even though the game is not then stopped and assigned a score. A card that causes the deck to be dealt 1% slower, 1% faster, eliminates a non-ace card, adds a non-ace card, changes the proportion of red non-ace cards, etcetera, would not be pivotal. A card that raises the probability of a red ace from 50% to 51% would be highly desirable, but not pivotal—it would not qualitatively change the nature of the game.
Giving examples of non-pivotal events that could precede or be easier to accomplish than pivotal events doesn’t change the nature of the game where we keep dealing until we get a black ace or red ace.
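The metaphor can be made concrete with a small simulation. What follows is only an illustrative sketch, not part of the original argument: the function names (`first_ace`, `red_ace_rate`) and the deck sizes are made up for the example, and each deck is assumed to contain at least one ace. It compares a baseline deck, a deck with extra non-ace cards (slower dealing, same aces), and a deck whose mix of aces has been shifted.

```python
import random

def first_ace(red_aces, black_aces, non_aces, rng):
    """Shuffle one hypothetical deck and return the colour of the first ace dealt.

    Assumes the deck holds at least one ace.
    """
    deck = ["red"] * red_aces + ["black"] * black_aces + ["other"] * non_aces
    rng.shuffle(deck)
    # Non-ace cards are dealt and passed over; the game only stops at an ace.
    return next(card for card in deck if card != "other")

def red_ace_rate(red_aces, black_aces, non_aces, trials=5_000, seed=0):
    """Estimate how often the game ends on a red ace over many shuffled decks."""
    rng = random.Random(seed)
    red = sum(
        first_ace(red_aces, black_aces, non_aces, rng) == "red"
        for _ in range(trials)
    )
    return red / trials

# Illustrative numbers only: 10% of the aces are red in the baseline deck.
baseline = red_ace_rate(red_aces=1, black_aces=9, non_aces=40)
# Doubling the non-ace cards delays the first ace but leaves the aces alone.
more_filler = red_ace_rate(red_aces=1, black_aces=9, non_aces=80)
# Shifting the aces themselves from 10% red to 90% red.
shifted = red_ace_rate(red_aces=9, black_aces=1, non_aces=40)

print(f"baseline deck:        {baseline:.2f}")
print(f"extra non-ace cards:  {more_filler:.2f}")
print(f"ace mix shifted:      {shifted:.2f}")
```

Under these assumptions, adding or removing non-ace cards changes how long the dealing takes but leaves the chance of ending on a red ace roughly where it was, while changing the mix of aces moves that chance a great deal. That is the sense in which only the second kind of card counts as pivotal.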
Examples of pivotal and non-pivotal events
Pivotal events:
- non-value-aligned AI is built, takes over universe
- human intelligence enhancement powerful enough that the best enhanced humans are qualitatively and significantly smarter than the smartest non-enhanced humans
- a limited Task AGI that can:
  - upload humans and run them at speeds more comparable to those of an AI
  - prevent the origin of all hostile superintelligences (in the nice case, only temporarily and via strategies that cause only acceptable amounts of collateral damage)
  - design or deploy nanotechnology such that there exists a direct route to the operators being able to do one of the other items on this list (human intelligence enhancement, prevent emergence of hostile SIs, etc.)
- a complete and detailed synaptic-vesicle-level scan of a human brain results in cracking the cortical and cerebellar algorithms, which rapidly leads to non-value-aligned neuromorphic AI
Non-pivotal events:
- curing cancer (good for you, but it didn’t resolve the value alignment problem)
- proving the Riemann Hypothesis (ditto)
- an extremely expensive way to augment human intelligence by the equivalent of 5 IQ points that doesn’t work reliably on people who are already very smart
- making a billion dollars on the stock market
- robotic cars devalue the human capital of professional drivers, and mismanagement of aggregate demand by central banks plus burdensome labor market regulations is an obstacle to their re-employment
Borderline cases:
- unified world government with powerful monitoring regime for ‘dangerous’ technologies
- widely used gene therapy that brought anyone up to a minimum equivalent IQ of 120
Centrality to limited AI proposals
We can view the general problem of Limited AI as having the central question: What is a pivotal positive accomplishment, such that an AI which does that thing and not some other things is therefore a whole lot safer to build? This is not a trivial question because it turns out that most interesting things require general cognitive capabilities, and most interesting goals can require arbitrarily complicated value identification problems to pursue safely.
It’s trivial to create an “AI” which is absolutely safe and can’t be used for any pivotal achievements. E.g. Google Maps, or a rock with “2 + 2 = 4” painted on it.
(For arguments that Google Maps could potentially help researchers drive to work faster or that a rock could potentially be used to bash in the chassis of a hostile superintelligence, see the pages on guarded definitions and strained arguments.)
Centrality to concept of ‘advanced agent’
We can view the notion of an advanced agent as “agent with enough cognitive capacity to cause a pivotal event, positive or negative”; the advanced agent properties are either those properties that might lead up to participation in a pivotal event, or properties that might play a critical role in determining the AI’s trajectory and hence how the pivotal event turns out.
Policy of focusing effort on causing pivotal positive events or preventing pivotal negative events
Obvious utilitarian argument: doing something with a big positive impact is better than doing something with a small positive impact.
In a larger context, the issue is a bit more complicated. There will often be barriers (conceptual or otherwise) to the highest-return investments. When we find that hugely important things seem relatively neglected and hence promising of high marginal returns if solved, this is often because there’s some conceptual barrier to running ahead and doing them.
For example: to tackle the hardest problems is often much scarier (you’re not sure if you can make any progress on describing a self-modifying agent that provably has a stable goal system) than ‘bouncing off’ to some easier, more comprehensible problem (like writing a paper about the impact of robotic cars on unemployment, where you’re very sure you can in fact write a paper like that at the time you write the grant proposal).
The obvious counterargument is that perhaps you can’t make progress on your problem of self-modifying agents, perhaps it’s too hard. But from this it doesn’t follow that the robotic-cars paper is what we should be doing instead—the robotic cars paper only makes sense if there are no neglected tractable investments that have bigger relative marginal inputs into more pivotal events.
If there are in fact some neglected tractable investments in directly pivotal events, then we can expect a search for pivotal events to turn up superior places to invest effort. But a failure mode of this search is if we fail to cognitively guard the concept of ‘pivotal event’. In particular, if we’re allowed to have indirect arguments for ‘relevance’ that go through big common pools of goodness like ‘friendliness of nations toward each other’, then the pool of interventions inside that concept is so large that it will start to include things that are optimized for appeal under more usual metrics, e.g. papers that don’t seem unnerving and that somebody knows they can write. So if there’s no guarded concept of research on ‘pivotal’ things, we will end up with very standard research being done, the sort that would otherwise be done by academia anyway, and our investment will end up having a low expected marginal impact on the final outcome.
This sort of qualitative reasoning about what is or isn’t ‘pivotal’ wouldn’t be necessary if we could put solid numbers on the impact of each intervention on the probable achievement of astronomical goods. But that is an unlikely ‘if’. Thus, there’s some cause to reason qualitatively about what is or isn’t ‘pivotal’, as opposed to just calculating out the numbers.