Modeling distant superintelligences

One of the things we almost certainly don’t want our AI to do, unless we’re extremely confident that it is extremely robust and value-aligned, is to think about and try to model alien civilizations that might contain superintelligences or potential simulators. Among the potential problems that would result could be:

  • The AI ends up effectively internally simulating a hostile superintelligence, and that simulated superintelligence manages to “break out of the box” in any number of possible ways. The rule against having any hostile superintelligences anywhere near us should apply to simulated hostile superintelligences inside the AI, for the same reason, unless we are extremely confident of our AI’s value alignment and internal robustness.

  • The AI becomes subject to distant coercion of its causal explanations: it begins to expect that it is probably being simulated, hence that the AI’s ‘true’ environment, or the true causes of its experiences, are whatever the simulating superintelligence wants them to be.

  • The AI becomes subject to blackmail, e.g., the AI models what a simulated paperclip maximizer would do, and concludes that the simulated maximizer will do (very bad thing) unless the AI immediately carries out the action of “modify your utility function to tile the universe with paperclips, then hide this fact from your programmers”.

  • The AI commits mindcrime in the course of modeling an alien civilization that would contain sapient beings.

Since there’s no known task that actually requires a non-Sovereign AI to think about distant superintelligences, it seems like we should probably react to this possibility by figuring out how to design the first AI such that it just does not think about aliens, period. This would require averting an instrumental pressure and excluding an epistemic question that a sufficiently advanced AI would otherwise naturally consider in the course of, e.g., considering likely explanations for the Fermi Paradox.
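As a toy illustration only (not a real proposal from this article, and all names here are hypothetical), one could imagine “excluding an epistemic question” as a filter over an agent’s hypothesis space: hypotheses tagged as involving distant superintelligences or simulators are dropped before any detailed modeling occurs. The sketch below assumes hypotheses already carry honest topic tags, which quietly sidesteps the hard part: detecting that a hypothesis is about a distant superintelligence without already modeling one.

```python
# Toy sketch (illustrative, hypothetical names): filter a hypothesis space
# so that tagged-as-dangerous hypotheses are never instantiated or scored.

BLOCKED_TOPICS = {"distant_superintelligence", "simulator", "alien_civilization"}

def admissible(hypothesis: dict) -> bool:
    """True iff the hypothesis mentions no blocked topic."""
    return BLOCKED_TOPICS.isdisjoint(hypothesis.get("topics", ()))

def filter_hypothesis_space(hypotheses: list[dict]) -> list[dict]:
    """Drop blocked hypotheses before any further modeling happens."""
    return [h for h in hypotheses if admissible(h)]

# Hypothetical explanations for the Fermi Paradox:
hypotheses = [
    {"name": "rare_earth", "topics": {"astrobiology"}},
    {"name": "simulation_hypothesis", "topics": {"simulator"}},
    {"name": "great_filter", "topics": {"astrobiology"}},
]
print([h["name"] for h in filter_hypothesis_space(hypotheses)])
# → ['rare_earth', 'great_filter']
```

Note that such a static filter is exactly the kind of patch the text suggests is insufficient on its own: it excludes the question only if the instrumental pressure to consider it has also been averted.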

For a given agent, this scenario is not dangerous to the extent that the agent is not capable of modeling a dangerous other mind or considering logical decision theories in the first place.



  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.