Moral hazards in AGI development

“Moral hazard” is when the directors of an advanced AGI give in to the temptation to direct the AGI in ways that the rest of us would regard as ‘bad’, such as declaring themselves God-Emperors. Limiting how long the human programmers are exposed to the temptations of power is one reason to eventually want a non-human-commanded, internally sovereign AGI, directed by something like coherent extrapolated volition, even if the far more difficult safety issues mean we shouldn’t build the first AGI that way. Anyone recommending “oversight” as a guard against moral hazard is advised to think hard about moral hazard in the overseers.

Suppose some other body is “overseeing” the programmers of a Task AGI. If we don’t just want the moral hazard transferred to people who may be even less trustworthy, a smart setup probably means making sure that, in practice, both the programmers and the overseers have to agree on a Task before it gets carried out, rather than one side being able, in practice, to do things even if the other side disagrees. Here “in practice” would include, e.g., it taking only one month to redevelop the technology in a way that responded only to the overseers; in that case the overseers could effectively act alone, and the requirement of mutual agreement would be nominal.
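
The “both sides have to agree” requirement can be illustrated with a minimal sketch. Everything in it is a hypothetical illustration rather than a proposed design: the class name `TaskGate`, the party labels, and the idea of representing a Task as a string are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical party labels; the source only speaks of "programmers" and "overseers".
PARTIES = frozenset({"programmers", "overseers"})


@dataclass
class TaskGate:
    """Tracks which parties have approved each Task (illustrative only)."""

    # Maps a task description to the set of parties that approved it.
    approvals: dict = field(default_factory=dict)

    def approve(self, party: str, task: str) -> None:
        """Record that `party` approves `task`."""
        if party not in PARTIES:
            raise ValueError(f"unknown party: {party}")
        self.approvals.setdefault(task, set()).add(party)

    def may_execute(self, task: str) -> bool:
        """A Task may run only if every party has approved it."""
        return self.approvals.get(task, set()) == PARTIES


gate = TaskGate()
gate.approve("programmers", "example task")
print(gate.may_execute("example task"))  # one approval is not enough
gate.approve("overseers", "example task")
print(gate.may_execute("example task"))  # both parties have now agreed
```

This models only the bookkeeping. It says nothing about the harder “in practice” condition, namely that neither side can bypass the gate, for instance by rebuilding the technology so that it answers to them alone.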