Difficulty of AI alignment

This page attempts to list basic propositions in computer science which, if they are true, would be ultimately responsible for rendering difficult the task of getting good outcomes from a sufficiently advanced Artificial Intelligence.



By saying that these propositions would, if true, seem to imply “difficulties”, we don’t mean to imply that these problems are unsolvable. We could distinguish possible levels of “difficulty” as follows:

  • The problem is straightforwardly solvable, but must in fact be solved.

  • The problem is straightforwardly solvable if foreseen in advance, but does not force a general solution in its early manifestations; if the later problems have not been explicitly foreseen, early solutions may fail to generalize. Projects which are not exhibiting sufficient foresight may fail to future-proof for the problem, even though it is in some sense easy.

  • The problem seems solvable by applying added effort, but the need for this effort will contribute substantial additional time or resource requirements to the aligned version of the AGI project, implying that unsafe clones or similar projects would have an additional time advantage. E.g., computer operating systems can be made more secure, but it adds rather more than 5% to development time and requires people willing to take on a lot of little inconveniences instead of doing things the most convenient way. If there are enough manifested difficulties like this, and the sum of their severity is great enough, then…

    • If there is strongly believed to be a great and unavoidable resource requirement even for safety-careless AGI projects, then we have a worrisome situation in which coordination among the leading five AGI projects is required to avoid races to the bottom on safety, and arms-race scenarios where the leading projects don’t trust each other are extremely bad.

    • If the probability seems great enough that “A safety-careless AGI project can be executed using few enough resources, relative to every group in the world that might have those resources and a desire to develop AGI, that there would be dozens or hundreds of such projects”, then a sufficiently great added development cost for AI alignment forces closed AI development scenarios. (Because open development would give projects that skipped all the safety an insuperable time advantage, and there would be enough such projects that getting all of them to behave is impossible. (Especially in any world where, like at present, there are billionaires with great command of computational resources who don’t seem to understand Orthogonality.))

  • The problem seems like it should in principle have a straightforward solution, but it seems like there’s a worrisome probability of screwing up along the way, meaning…

    • It requires substantial additional work and time to solve this problem reliably and know that we have solved it (see above), or

    • Feasible amounts of effort still leave a worrying residue of probability that the attempted solution contains a land mine.

  • The problem seems unsolvable using realistic amounts of effort, in which case aligned-AGI designs are constrained to avoid confronting it and we must find workarounds.

  • The problem seems like it ought to be solvable somehow, but we are not sure exactly how to solve it. This could imply that…

    • Novel research and perhaps genius is required to avoid this type of failure, even with the best of good intentions;

    • This might be a kind of conceptual problem that takes a long serial time to develop, and we should get started on it sooner;

    • We should start considering alternative design pathways that would work around or avoid the difficulty, in case the problem is not solved.

