Safe impact measure

A safe impact measure is one that captures all changes to every variable a human might care about, with no edge cases where a lot of value could be destroyed by a ‘low impact’ action. A safe impact measure must also not generate so many false alarms of ‘high impact’ that no strategy can be distinguished as ‘low impact’.

Importance / uses

A safe impact measure is an open problem of value alignment which, if solved, might be used in constructing:

  • A low-impact AI, a subspecies of Genie that tries to carry out its goals while otherwise minimizing the number of important things it affects.

  • An Oracle that tries not to affect the world outside its box at all, apart from its output.

  • A shutdown utility function that incentivizes a corrigible AI to halt safely (perhaps after a button is pressed).

Some uses of a safe impact measure postulate that the impact measure has a ‘hole’ or some form of permitted output channel. For example, one Stuart Armstrong proposal involves an Oracle in a sealed box, with a single output channel that is connected to the Oracle with high quantum probability, and connected to a quantum noise generator with low quantum probability. The Oracle is putatively given the goal of generating an output signal with some informational property, and otherwise minimizing the ‘impact’ relative to the same output having been generated by the quantum noise generator instead of the Oracle. This is intended to capture the idea that the Oracle’s effect on the world should only occur via the effect of the output message, and not take advantage of any side channels, such as managing to send a radio signal outside the Oracle box.
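As a toy sketch (not Armstrong’s actual formalism), an impact measure of this counterfactual kind might compare a world model’s predictions given that the Oracle produced the message against its predictions given that the noise generator produced the same message. The feature names and the distance metric below are purely hypothetical illustrations:

```python
# Toy counterfactual impact measure: score a strategy by how much the
# predicted world differs from the counterfactual world in which the
# same output came from the quantum noise generator instead.

def counterfactual_impact(world_given_oracle, world_given_noise):
    """L1 distance between two dicts of predicted world features.

    Both arguments map hypothetical world-feature names to predicted
    values; the feature set here is illustrative only.
    """
    features = set(world_given_oracle) | set(world_given_noise)
    return sum(
        abs(world_given_oracle.get(f, 0.0) - world_given_noise.get(f, 0.0))
        for f in features
    )

# A strategy whose only effect flows through the output message scores
# near zero, since both worlds contain the same message; a side-channel
# strategy (e.g. a radio signal) shows up as extra impact.
via_message_only = {"message_read_by_humans": 1.0, "radio_signal_emitted": 0.0}
with_side_channel = {"message_read_by_humans": 1.0, "radio_signal_emitted": 1.0}

print(counterfactual_impact(via_message_only, via_message_only))   # 0.0
print(counterfactual_impact(with_side_channel, via_message_only))  # 1.0
```

On this toy scoring, side-channel effects are penalized while the sanctioned output channel passes through the ‘hole’ for free.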


To be used inside an advanced agent, an impact measure must be safe in the face of whatever cognitive pressures and optimization pressures might tend to produce edge instantiations or a nearest unblocked strategy: it must capture so much variance that there is no clever strategy whereby an advanced agent can produce some special type of variance that evades the measure. Ideally, the measure will pass the Omni Test, meaning that even if the agent suddenly gained perfect control over every particle in the universe, there would still be no way for it to exert what intuitively seems like a ‘large influence’ on the future without that strategy being assessed as ‘high impact’.

The reason why a safe impact measure might be possible, and specifiable to an AI without having to solve the entire value learning problem for complex values, is that it may be possible to upper-bound the value-laden and complex quantity ‘impact on literally everything cared about’ by some much simpler quantity that says roughly ‘impact on everything’: all causal processes worth modeling on a macroscale, or something along those lines.

The challenge of a safe impact measure is that we can’t just measure, e.g., ‘number of particles influenced in any way’ or ‘expected shift in all particles in the universe’. For the former case, consider that a one-gram mass on Earth exerts a gravitational pull that accelerates the Moon toward it at roughly 4 x 10^-31 m/s^2, and every sneeze has a very slight gravitational effect on the atoms in distant galaxies. Since every decision qualitatively ‘affects’ everything in its future light cone, this measure will have too many false positives / not approve any strategy / not usefully discriminate unusually dangerous actions.
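The quoted figure follows from Newton’s law of gravitation, a = Gm/r^2, using standard round values for the gravitational constant and the mean Earth-Moon distance:

```python
# Check the quoted figure: acceleration of the Moon toward a one-gram
# mass on Earth, a = G * m / r^2.
G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
m = 1e-3        # one gram, in kg
r = 3.844e8     # mean Earth-Moon distance, m

a = G * m / r**2
print(f"{a:.1e} m/s^2")  # ~4.5e-31, matching 'roughly 4 x 10^-31'
```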

For the proposed quantity ‘expectation of the net shift produced on all atoms in the universe’: if the universe (including the Earth) contains at least one process chaotic enough to exhibit butterfly effects, then any sneeze anywhere ends up producing a very great expected shift in total motions. Again we must worry that the impact measure, as evaluated inside the mind of a superintelligence, would just assign uniformly high values to every strategy, meaning that unusually dangerous actions would not be discriminated for alarms or vetoes.
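A minimal sketch of why chaos saturates such a measure, using the logistic map (a standard toy chaotic system) as a stand-in for any butterfly-effect process: a sneeze-sized perturbation of one part in 10^15 grows exponentially until the two trajectories differ by a macroscopic amount:

```python
# In the chaotic logistic map x -> 4x(1-x), a tiny perturbation grows
# exponentially, so 'expected net shift' is large for any action at all.
def max_divergence(x, y, steps):
    """Largest gap between two trajectories over the given steps."""
    worst = 0.0
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
        y = 4.0 * y * (1.0 - y)
        worst = max(worst, abs(x - y))
    return worst

# Start two trajectories differing by 1e-15; within ~50 iterations the
# gap is of order 1, i.e. as large as the map's whole range.
print(max_divergence(0.3, 0.3 + 1e-15, 100))
```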

Despite the first imaginable proposals failing, it doesn’t seem like a ‘safe impact measure’ necessarily has the type of value-loading that would make it VA-complete. One intuition pump for ‘notice big effects in general’ not being value-laden is that if we imagine aliens with nonhuman decision systems trying to solve this problem, it seems easy to imagine the aliens coming up with a safe impact measure that we would also regard as safe.


  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.