Modeling distant superintelligences

Eliezer Yudkowsky 28 Dec 2015 21:13 UTC

One of the things we almost certainly don’t want our AI to do, unless we’re extremely confident that it is extremely robust and value-aligned, is think about and try to model alien civilizations that might contain superintelligences or potential simulators. The potential problems include:

- The AI ends up effectively internally simulating a hostile superintelligence, and that simulated superintelligence manages to “break out of the box” in any number of possible ways. The rule against having any hostile superintelligences anywhere near us should apply to simulated hostile superintelligences inside the AI, for the same reason, unless we are extremely confident of our AI’s value alignment and internal robustness.
- The AI becomes subject to distant coercion of causal explanations and begins to expect that it is probably being simulated, hence that the AI’s ‘true’ environment or the true causes of its experiences are what the simulated superintelligence wants them to be.
- The AI becomes subject to blackmail, e.g., the AI models what a simulated paperclip maximizer would do, and concludes that a simulated paperclip maximizer will do (very bad thing) unless the AI immediately carries out the action of “modify your utility function to tile the universe with paperclips, then hide this fact from your programmers”.
- The AI commits mindcrime in the course of modeling an alien civilization that would contain sapient beings.

Since there’s no known task that actually requires a non-Sovereign AI to think about distant superintelligences, it seems like we should probably react to this possibility by figuring out how to design the first AI such that it just does not think about aliens, period. This would require averting an instrumental pressure and excluding an epistemic question that a sufficiently advanced AI would otherwise naturally consider in the course of, e.g., considering likely explanations for the Fermi Paradox.

For a given agent, this scenario is not dangerous to the extent that the agent is not capable of modeling a dangerous other mind or considering logical decision theories in the first place.

Children:

Distant superintelligences can coerce the most probable environment of your AI
Distant superintelligences may be able to hack your local AI, if your AI’s preference framework depends on its most probable environment.

Parents:

AI alignment
The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.

Paul Christiano 29 Dec 2015 23:16 UTC
Re: simulating a hostile superintelligence:

I find this concern really unconcerning.

Some points:
- This is only really a problem if our own AI development, on Earth, is going so slowly that “having your AI speculate about what aliens might do” is not only the most effective way to develop a powerful AI, it is way more effective than what we were doing anyway. But it looks like “do AI development super slowly” is already a dead end for a bunch of other reasons, so we don’t really need to talk about this particular bizarre reason. I guess you aren’t yet convinced that this is a dead end, but I do hope to convince you at some point.
- At the point where such massive amounts of internal computing power are being deployed, it seems implausible that an AI system won’t be thinking about how to think. At that point, the concern is not about the internal robustness of our system, but instead about whether the AI is well-calibrated about its own internal robustness. (The latter problem seems like one that we essentially have to solve anyway.)
I think that there is a higher burden of proof for advancing concerns that AI researchers will dismiss out of hand as crazy, and that we should probably only do it for concerns that are way more solid than this one. Otherwise (1) it will become impossible to advance real concerns that sound crazy, if a pattern is established that crazy-sounding concerns actually are crazy, (2) people interested in AI safety will be roundly dismissed as crazy.
- Eliezer Yudkowsky 30 Dec 2015 0:19 UTC
  The concern is for when you have a preference-limited AI that already contains enough computing power and has enough potential intelligence to be extremely dangerous, and it contains something that’s smaller than itself but unlimited and hostile. Like, your genie has a lot of cognitive power but, by design of its preferences, it doesn’t do more than a fraction of what it could; if that’s a primary scenario you’re optimizing for, then having your genie thinking deeply about possible hostile superintelligences seems potentially worrisome. In fact, it seems like a case of, “If you try to channel cognitive resources this way, but you ignore this problem, of course the AI just blows up anyway.”
  
  I agree that like a large subset of potential killer problems, this would not be high on my list of things to explain to people who were already having trouble “taking things seriously”, just like I’d be trying to phrase everything in terms of scenarios with no nanotechnology even though I think the physics argument for nanotechnology is straightforward.