Pivotal event

The term ‘pivotal’ in the context of value alignment theory is a guarded term referring to events, particularly the development of sufficiently advanced AIs, that will make a large difference a billion years later. A ‘pivotal’ event upsets the current gameboard: it decisively settles a win or loss, drastically changes the probability of win or loss, or changes the future conditions under which a win or loss is determined. A ‘pivotal achievement’ upsets the gameboard in a positive direction, and a ‘pivotal catastrophe’ upsets it in a negative direction. These may also be referred to as ‘astronomical achievements’ or ‘astronomical catastrophes’.

Reason for guardedness

Guarded definitions are deployed where there is reason to suspect that a concept will otherwise be over-extended. The case for having a guarded definition of ‘pivotal event’ is that, after it’s been shown that event X is maybe not as important as originally thought, one side of that debate may be strongly tempted to go on arguing that, wait, really it could be “relevant” (by some strained line of possibility).

Example 1: In the central example of the ZF provability Oracle, considering a series of possible ways that an untrusted Oracle could break an attempt to Box it, we end with an extremely Boxed Oracle that can only output machine-checkable proofs of predefined theorems in Zermelo-Fraenkel set theory, with the proofs themselves being thrown away once machine-verified. We then observe that we don’t currently know of any obvious way to save the world by finding out that particular, pre-chosen theorems are provable. It may then be tempting to argue that this device could greatly advance the field of mathematics, and that math is relevant to the value alignment problem. However, at least given that particular proposal for using the ZF Oracle, the basic rules of the AI-development playing field would remain the same, the value alignment problem would not be finished nor would it have moved on to a new phase, and the world would still be in danger (neither safe nor destroyed). (This doesn’t rule out that tomorrow some reader will think of some spectacularly clever use for a ZF Oracle that does upset the gameboard and put us on a direct path to winning, where we know what we need to do from there—and in that case MIRI would reclassify the ZF Oracle as a high-priority research avenue!)

Example 2: Suppose a funder, worried about the prospect of advanced AIs wiping out humanity, offers grants for “AI safety”. Compared to the much more difficult problems involved in making something actually smarter than you safe, it may then be tempting to write papers that you know you can finish, like a paper on robotic cars causing unemployment in the trucking industry, or a paper on who holds legal liability when a factory machine crushes a worker. While it’s true that crushed factory workers and unemployed truckers are both, ceteris paribus, bad, they are not astronomical catastrophes that transform all galaxies inside our future light cone into paperclips, and the latter category seems worth distinguishing. This definition needs to be guarded because there will then be a temptation for the grantseeker to argue, “Well, if AI causes unemployment, that could slow world economic growth, which will make countries more hostile to each other, which would make it harder to prevent an AI arms race.” But the possibility of something ending up having a non-zero impact on astronomical stakes is not the same concept as an event that has a game-changing impact on astronomical stakes. The question is what are the largest lowest-hanging fruit in astronomical stakes, not whether something can be argued as defensible by pointing to a non-zero astronomical impact.

Example 3: Suppose a behaviorist genie is restricted from modeling human minds in any great detail, but is still able to build and deploy molecular nanotechnology. Moreover, the AI is able to understand the instruction “Build a device for scanning human brains and running them at high speed with minimum simulation error”, and to work out a way to do this without simulating whole human brains as test cases. The genie is then used to upload a set of, say, fifty human researchers, and to run them at 10,000-to-1 speeds. This accomplishment would not of itself save the world or destroy it—the researchers inside the simulation would still need to solve the value alignment problem, and might not succeed in doing so. But it would upset the gameboard and change the major determinants of winning, compared to the default scenario in which the fifty researchers are in an equal-speed arms race with the rest of the world and don’t have unlimited time to check their work. The event in which the genie was used to upload the researchers and run them at high speeds would be a critical event, a hinge where the optimum strategy was drastically different before versus after that pivotal moment.
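A quick back-of-the-envelope sketch makes the scale of the change concrete. This uses only the scenario’s own numbers (fifty researchers, a 10,000-to-1 speedup); the one-month window is an arbitrary illustration, not part of the example:

```python
# Back-of-the-envelope arithmetic for Example 3. All numbers are the
# scenario's stated assumptions; the one-month window is illustrative.
SPEEDUP = 10_000          # subjective time per unit of wall-clock time
N_RESEARCHERS = 50
WALL_CLOCK_DAYS = 30      # one month of outside time

# Serial depth: how long any single research thread can run unrushed.
subjective_years_each = WALL_CLOCK_DAYS * SPEEDUP / 365.25

# Parallel breadth: total person-years across the whole team.
total_person_years = subjective_years_each * N_RESEARCHERS

print(f"{subjective_years_each:.0f} subjective years of serial research per outside month")
print(f"{total_person_years:.0f} total person-years per outside month")
```

A single outside month buys roughly eight centuries of unrushed serial research time per researcher, which is the qualitative change the example points at.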

Example 4: Suppose a paperclip maximizer is built, self-improves, and converts everything in its future light cone into paperclips. The fate of the universe is then settled, so building the paperclip maximizer was a pivotal catastrophe.

Example 5: A mass simultaneous malfunction of robotic cars causes them to deliberately run over pedestrians in many cases. Humanity buries its dead, picks itself up, and moves on. This was not a pivotal catastrophe, even though it may have nonzero influence on future AI development.

A strained argument for event X being a pivotal achievement often goes through X being an input into a large pool of goodness that also has many other inputs. A ZF provability Oracle would advance mathematics, and mathematics is good for value alignment; but there’s nothing obvious about a ZF Oracle that specializes it for advancing value alignment work, compared to the many other inputs into total mathematical progress. Handling trucker disemployment would similarly be only one factor among many in world economic growth.

By contrast, a genie that uploaded human researchers would putatively not be producing merely one upload among many; it would be producing the only uploads, where the default was otherwise no uploads. In turn, these uploads could do decades or centuries of unrushed serial research on the value alignment problem, where the alternative was rushed research over much shorter timespans; and this can plausibly make the difference by itself between an AI that achieves ~100% of value and an AI that achieves ~0% of value. At the end of the extrapolation, when we ask what difference everything is supposed to make, we find a series of direct impacts producing events qualitatively different from the default, ending in a huge percentage difference in how much of all possible value gets achieved.

By having a narrow and guarded definition of ‘pivotal event’, we can avoid bait-and-switch arguments for the importance of research proposals, in which the ‘bait’ is raising the apparent importance of ‘AI safety’ by discussing things with large direct impacts on astronomical stakes (like a paperclip maximizer or a Friendly sovereign), and the ‘switch’ is to working on problems of dubious astronomical impact that are inputs into large pools with many other inputs.

‘Dealing a deck of cards’ metaphor

There’s a line of reasoning that goes, “But most consumers don’t want general AIs, they want voice-operated assistants. So companies will develop voice-operated assistants, not general AIs.” But voice-operated assistants are themselves not pivotal events; developing them doesn’t prevent general AIs from being developed later. So even though this non-pivotal event precedes a pivotal one, it doesn’t follow that we should focus on the earlier event instead.

No matter how many non-game-changing ‘AIs’ are developed, whether they play great chess or operate in the stock market or whatever else, the underlying research process will keep churning and keep turning out other, more powerful AIs.

Imagine a deck of cards which has some aces (superintelligences) and many more non-aces. We keep dealing through the deck until we get a black ace, a red ace, or some other card that stops any further dealing. A non-ace Joker card that permanently prevents any aces from being drawn would be pivotal (not necessarily good, but definitely pivotal). A card that shifts the further distribution of the deck from 10% red aces to 90% red aces would be pivotal; we could see this as a metaphor for the hoped-for result of Example 3 (uploading the researchers), even though the game is not then stopped and assigned a score. A card that causes the deck to be dealt 1% slower or 1% faster, eliminates a non-ace card, adds a non-ace card, changes the proportion of red non-ace cards, etcetera, would not be pivotal. A card that raises the probability of a red ace from 50% to 51% would be highly desirable, but not pivotal—it would not qualitatively change the nature of the game.
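The metaphor can be sketched as a tiny Monte Carlo simulation. This is a minimal illustration with assumed numbers: the 2% ace rarity and the trial count are arbitrary choices, not part of the metaphor:

```python
import random

def deal_until_ace(p_red_given_ace: float, rng: random.Random) -> str:
    """Deal cards until an ace turns up; return that ace's color.
    The many non-ace cards drawn along the way change nothing."""
    while True:
        if rng.random() < 0.02:                  # assumed rarity of aces
            return "red" if rng.random() < p_red_given_ace else "black"

def estimate_red_win(p_red_given_ace: float, trials: int = 20_000) -> float:
    """Estimate the probability that the game ends with a red ace."""
    rng = random.Random(0)                       # fixed seed for reproducibility
    wins = sum(deal_until_ace(p_red_given_ace, rng) == "red"
               for _ in range(trials))
    return wins / trials

# Shifting the deck from 10% to 90% red aces is pivotal: it moves the
# probability of the game ending well from roughly 0.1 to roughly 0.9.
print(estimate_red_win(0.10))
print(estimate_red_win(0.90))
```

Note that changing the ace rarity from 2% to 1% (dealing the deck ‘slower’) would leave both estimates essentially unchanged, which is exactly the sense in which such cards are not pivotal.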

Giving examples of non-pivotal events that could precede, or be easier to accomplish than, pivotal events doesn’t change the nature of the game in which we keep dealing until we get a black ace or a red ace.

Examples of pivotal and non-pivotal events

Pivotal events:

  • non-value-aligned AI is built, takes over universe

  • human intelligence enhancement powerful enough that the best enhanced humans are qualitatively and significantly smarter than the smartest non-enhanced humans

  • a limited Task AGI that can:

    • upload humans and run them at speeds more comparable to those of an AI

    • prevent the origin of all hostile superintelligences (in the nice case, only temporarily and via strategies that cause only acceptable amounts of collateral damage)

    • design or deploy nanotechnology such that there exists a direct route to the operators being able to do one of the other items on this list (human intelligence enhancement, preventing the emergence of hostile SIs, etc.)

  • a complete and detailed synaptic-vesicle-level scan of a human brain results in cracking the cortical and cerebellar algorithms, which rapidly leads to non-value-aligned neuromorphic AI

Non-pivotal events:

  • curing cancer (good for you, but it didn’t resolve the value alignment problem)

  • proving the Riemann Hypothesis (ditto)

  • an extremely expensive way to augment human intelligence by the equivalent of 5 IQ points that doesn’t work reliably on people who are already very smart

  • making a billion dollars on the stock market

  • robotic cars devalue the human capital of professional drivers, and mismanagement of aggregate demand by central banks plus burdensome labor market regulations is an obstacle to their re-employment

Borderline cases:

  • unified world government with powerful monitoring regime for ‘dangerous’ technologies

  • widely used gene therapy that brought anyone up to a minimum equivalent IQ of 120

Centrality to limited AI proposals

We can view the general problem of Limited AI as having the central question: What is a pivotal positive accomplishment, such that an AI which does that thing, and not some other things, is therefore a whole lot safer to build? This is not a trivial question, because it turns out that most interesting things require general cognitive capabilities, and most interesting goals can require arbitrarily complicated value identification problems to pursue safely.

It’s trivial to create an “AI” which is absolutely safe and can’t be used for any pivotal achievements. E.g. Google Maps, or a rock with “2 + 2 = 4” painted on it.

(For arguments that Google Maps could potentially help researchers drive to work faster, or that a rock could potentially be used to bash in the chassis of a hostile superintelligence, see the pages on guarded definitions and strained arguments.)

Centrality to concept of ‘advanced agent’

We can view the notion of an advanced agent as “agent with enough cognitive capacity to cause a pivotal event, positive or negative”; the advanced agent properties are either those properties that might lead up to participation in a pivotal event, or properties that might play a critical role in determining the AI’s trajectory and hence how the pivotal event turns out.

Policy of focusing effort on causing pivotal positive events or preventing pivotal negative events

Obvious utilitarian argument: doing something with a big positive impact is better than doing something with a small positive impact.

In the larger context of effective altruism and adequacy theory, the issue is a bit more complicated. Reasoning from adequacy theory says that there will often be barriers (conceptual or otherwise) to the highest-return investments. When we find that hugely important things seem relatively neglected, and hence promising of high marginal returns if solved, this is often because there’s some conceptual barrier to running ahead and doing them.

For example: tackling the hardest problems is often much scarier (you’re not sure you can make any progress on describing a self-modifying agent that provably has a stable goal system) than ‘bouncing off’ to some easier, more comprehensible problem (like writing a paper about the impact of robotic cars on unemployment, where you’re very sure, at the time you write the grant proposal, that you can in fact write such a paper).

The obvious counterargument is that perhaps you can’t make progress on your problem of self-modifying agents; perhaps it’s too hard. But it doesn’t follow that the robotic-cars paper is what we should be doing instead—the robotic-cars paper only makes sense if there are no neglected tractable investments with bigger relative marginal inputs into more pivotal events.

If there are in fact some neglected tractable investments in directly pivotal events, then we can expect a search for pivotal events to turn up superior places to invest effort. But this search fails if we don’t cognitively guard the concept of ‘pivotal event’. In particular, if we allow indirect arguments for ‘relevance’ that go through big common pools of goodness like ‘friendliness of nations toward each other’, then the pool of interventions inside that concept is so large that it will start to include things optimized for appeal under more usual metrics, e.g. papers that don’t seem unnerving and that somebody knows they can write. So if there’s no guarded concept of research on ‘pivotal’ things, we will end up with very standard research being done, the sort that would otherwise be done by academia anyway, and our investment will end up having a low expected marginal impact on the final outcome.

This sort of qualitative reasoning about what is or isn’t ‘pivotal’ wouldn’t be necessary if we could put solid numbers on the impact of each intervention on the probable achievement of astronomical goods. But that is an unlikely ‘if’. Thus there’s some cause to reason qualitatively about what is or isn’t ‘pivotal’, rather than just calculating out the numbers, when we’re trying to pursue astronomical altruism.


  • Value achievement dilemma

    How can Earth-originating intelligent life achieve most of its potential value, whether by AI or otherwise?