Safe plan identification and verification

Safe plan identification is the problem of how to give a Task AGI training cases, answered queries, abstract instructions, etc., such that (a) the AGI can thereby identify outcomes in which the task was fulfilled, (b) the AGI can generate an okay plan for getting to some such outcome without bad side effects, and (c) the user can verify that the resulting plan is actually okay via some series of further questions or user queries. This is the superproblem that includes task identification, as much value identification as is needed to have some idea of the general class of post-task worlds that the user thinks are okay, and any further tweaks like low-impact planning or flagging inductive ambiguities. This superproblem is distinguished from the entire problem of building a Task AGI because there are further issues like corrigibility, behaviorism, and building the AGI in the first place. The safe plan identification superproblem is about communicating the task, plus user preferences about side effects and implementation, such that this information allows the AGI to identify a safe plan and allows the user to know that a safe plan has been identified.



  • Task-directed AGI

    An advanced AI that’s meant to pursue a series of limited-scope goals given to it by the user. In Bostrom’s terminology, a Genie.