Safe but useless

“This type of safety implies uselessness” (or conversely, “any AI powerful enough to be useful will still be unsafe”) is an objection raised against a proposed AI safety measure when the measure would have to be enforced so strictly, in order to make the AI safe, that the AI would no longer be useful.

For a non-AI metaphor, consider a pair of scissors and its dangerous blades. We can have a “safety scissors” that is only just sharp enough to cut paper, but this is still sharp enough to do some damage if you work at it. If you try to make the scissors even safer by encasing the dangerous blades in foam rubber, the scissors can’t cut paper any more. If the scissors can cut paper, it’s still unsafe. Maybe you could in principle cut clay with scissors like that, but this is no defense unless you can tell us something very useful that can be done by cutting clay.

Similarly, there’s an obvious way to try cutting down the allowed output of an Oracle AGI to the point where all it can do is tell us that a given theorem is provable from the axioms of Zermelo-Fraenkel set theory (ZF). This might prevent the AGI from hacking the human operators into letting it out, since all that can leave the box is a single yes-or-no bit, sent at some particular time. An untrusted superintelligence inside this scheme would have the option of strategically not telling us when a theorem is provable in ZF; but if the bit came from a simple proof-verifier that had checked a proof supplied by the AGI, and that bit said the input theorem was ZF-provable, we could very likely trust that.

But now we run up against the problem that nobody knows how to actually save the world by virtue of sometimes knowing for sure that a theorem is provable in ZF. The scissors has been blunted to where it’s probably completely safe, but can only cut clay; and nobody knows how to do enough good by cutting clay.
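The one-bit scheme described above can be made concrete with a toy sketch. This is only an illustration under loud assumptions: it uses a tiny proof system with atoms, implications, and modus ponens rather than ZF, and the names `verify` and `boxed_oracle` are hypothetical. The point it shows is the asymmetry: a "yes" bit is backed by a checked proof, while a "no" bit tells us nothing, since the untrusted prover may simply withhold a proof.

```python
from typing import Callable, Optional, Sequence, Union

# Toy stand-in for ZF (an assumption of this sketch): a formula is an atom
# (a string) or an implication written as ("->", premise, conclusion).
Formula = Union[str, tuple]
Proof = Sequence[Formula]


def verify(axioms: frozenset, theorem: Formula, proof: Proof) -> bool:
    """Small trusted checker: each line must be an axiom or follow from
    two earlier lines by modus ponens; the last line must be the theorem."""
    derived = []
    for line in proof:
        is_axiom = line in axioms
        by_modus_ponens = any(
            ("->", earlier, line) in derived for earlier in derived
        )
        if not (is_axiom or by_modus_ponens):
            return False
        derived.append(line)
    return bool(derived) and derived[-1] == theorem


def boxed_oracle(
    axioms: frozenset,
    theorem: Formula,
    untrusted_prover: Callable[[Formula], Optional[Proof]],
) -> bool:
    """Only this single bit leaves the box. The proof itself is discarded."""
    try:
        proof = untrusted_prover(theorem)
        return proof is not None and verify(axioms, theorem, proof)
    except Exception:
        # Malformed output from the prover is treated as "no proof given".
        return False


AXIOMS = frozenset({"p", ("->", "p", "q")})

# Three hypothetical provers: honest, lying, and strategically silent.
honest = lambda theorem: ["p", ("->", "p", "q"), "q"]
lying = lambda theorem: [theorem]
silent = lambda theorem: None

print(boxed_oracle(AXIOMS, "q", honest))  # True: "q" really is provable
print(boxed_oracle(AXIOMS, "r", lying))   # False: the bogus proof is rejected
print(boxed_oracle(AXIOMS, "q", silent))  # False, although "q" is provable
```

The third call is the case the text warns about: the prover knows a proof and declines to give it, so the operators learn nothing. Even in the honest case, everything the operators get out is one boolean per query, which is why it is so hard to say what world-saving use it could be put to.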

Ideal models of “safe but useless” agents

Should you have cause to do a mathematical study of this issue, then an excellent ideal model of a safe but useless agent, embodying maximal safety and minimal usefulness, would be a rock.


  • Advanced safety

    An agent is really safe when it has the capacity to do anything, but chooses to do what the programmer wants.