Moral hazards in AGI development

“Moral hazard” is when the directors of an advanced AGI give in to the temptation to direct the AGI in ways that the rest of us would regard as ‘bad’, like, say, declaring themselves God-Emperors. Limiting the duration of the human programmers’ exposure to the temptations of power is one reason to want a non-human-commanded, internally sovereign AGI eventually, directed by something like coherent extrapolated volition, even if the far more difficult safety issues mean we shouldn’t build the first AGI that way. Anyone recommending “oversight” as a guard against moral hazard is advised to think hard about moral hazard in the overseers.

If we don’t want the moral hazard simply transferred to people who may be even less trustworthy, a sensible setup with some other body “overseeing” the programmers of a Task AGI probably means ensuring that, in practice, both the programmers and the overseers must agree on a Task before it gets carried out; neither side should be able to act unilaterally over the other’s objection. Here “in practice” must account for scenarios such as it taking only one month to redevelop the technology so that it responds to the overseers alone.

  • Value achievement dilemma

    How can Earth-originating intelligent life achieve most of its potential value, whether by AI or otherwise?