Difficulty of AI alignment

This page attempts to list basic propositions in computer science which, if true, would ultimately be responsible for making it difficult to get good outcomes from a sufficiently advanced Artificial Intelligence.



By saying that these propositions would, if true, seem to imply “difficulties”, we don’t mean to imply that these problems are unsolvable. We could distinguish possible levels of “difficulty” as follows:

  • The problem is straightforwardly solvable, but must in fact be solved.

  • The problem is straightforwardly solvable if foreseen in advance, but does not force a general solution in its early manifestations—if the later problems have not been explicitly foreseen, early solutions may fail to generalize. Projects which are not exhibiting sufficient foresight may fail to future-proof for the problem, even though it is in some sense easy.

  • The problem seems solvable by applying added effort, but the need for this effort will contribute substantial additional time or resource requirements to the aligned version of the AGI project; implying that unsafe clones or similar projects would have an additional time advantage. E.g., computer operating systems can be made more secure, but it adds rather more than 5% to development time and requires people willing to take on a lot of little inconveniences instead of doing things the most convenient way. If there are enough manifested difficulties like this, and the sum of their severity is great enough, then…

      • If there is strongly believed to be a great and unavoidable resource requirement even for safety-careless AGI projects, then we have a worrisome situation in which coordination among the leading five AGI projects is required to avoid races to the bottom on safety, and arms-race scenarios where the leading projects don’t trust each other are extremely bad.

      • If the probability seems great enough that “A safety-careless AGI project can be executed using few enough resources, relative to every group in the world that might have those resources and a desire to develop AGI, that there would be dozens or hundreds of such projects”, then a sufficiently great added development cost for AI alignment forces closed AI development scenarios. This is because open development would give projects that skipped all the safety work an insuperable time advantage, and there would be enough such projects that getting all of them to behave would be impossible, especially in any world where, as at present, there are billionaires with great command of computational resources who don’t seem to understand Orthogonality.

  • The problem seems like it should in principle have a straightforward solution, but it seems like there’s a worrisome probability of screwing up along the way, meaning…

      • It requires substantial additional work and time to solve this problem reliably and know that we have solved it (see above), or

      • Feasible amounts of effort still leave a worrying residue of probability that the attempted solution contains a land mine.

  • The problem seems unsolvable using realistic amounts of effort, in which case aligned-AGI designs are constrained to avoid confronting it and we must find workarounds.

  • The problem seems like it ought to be solvable somehow, but we are not sure exactly how to solve it. This could imply that…

      • Novel research and perhaps genius is required to avoid this type of failure, even with the best of good intentions;

      • This might be a kind of conceptual problem that takes a long serial time to develop, and we should get started on it sooner;

      • We should start considering alternative design pathways that would work around or avoid the difficulty, in case the problem is not solved.
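
To make the "added effort" level above concrete, here is a minimal sketch of how several individually modest overheads could add up to a time advantage for a safety-careless project. It is a toy model only: the overhead fractions, the baseline project length, and the function names are all hypothetical, and it assumes the overheads simply sum, which the text suggests but does not establish.

```python
# Toy model: every number below is hypothetical and chosen only for illustration.

def total_overhead(fractions):
    """Sum the fractional development-time overheads of each manifested difficulty."""
    return sum(fractions)


def time_advantage(baseline_months, fractions):
    """Months by which a safety-careless project would finish ahead of an aligned one,
    assuming both share the same baseline and only the aligned one pays the overheads."""
    return baseline_months * total_overhead(fractions)


# Hypothetical per-difficulty overheads: 5%, 10%, and 20% of baseline development time.
overheads = [0.05, 0.10, 0.20]
baseline = 24  # hypothetical baseline project length, in months

lag = time_advantage(baseline, overheads)
print(f"Aligned project lags by about {lag:.1f} months")
```

Under these assumed numbers, three difficulties that each look tolerable in isolation together cost the aligned project more than a third of the baseline schedule. If the overheads compounded rather than summed, the gap would be somewhat larger.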


  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.