Minimality principle

In the con­text of AI al­ign­ment, the “Prin­ci­ple of Min­i­mal­ity” or “Prin­ci­ple of Least Every­thing” says that when we are build­ing the first suffi­ciently ad­vanced ar­tifi­cial in­tel­li­gence, we are op­er­at­ing in an ex­tremely dan­ger­ous con­text in which build­ing a marginally more pow­er­ful AI is marginally more dan­ger­ous. The first AGI ever built should there­fore ex­e­cute the least dan­ger­ous plan for pre­vent­ing im­me­di­ately fol­low­ing AGIs from de­stroy­ing the world six months later. Fur­ther­more, the least dan­ger­ous plan is not the plan that seems to con­tain the fewest ma­te­rial ac­tions that seem risky in a con­ven­tional sense, but rather the plan that re­quires the least dan­ger­ous cog­ni­tion from the AGI ex­e­cut­ing it. Similarly, in­side the AGI it­self, if a class of thought seems dan­ger­ous but nec­es­sary to ex­e­cute some­times, we want to ex­e­cute the fewest pos­si­ble in­stances of that class of thought.

E.g., if we think it’s a dan­ger­ous kind of event for the AGI to ask “How can I achieve this end us­ing strate­gies from across ev­ery pos­si­ble do­main?” then we might want a de­sign where most rou­tine op­er­a­tions only search for strate­gies within a par­tic­u­lar do­main, and events where the AI searches across all known do­mains are rarer and visi­ble to the pro­gram­mers. Pro­cess­ing a goal that can re­cruit sub­goals across ev­ery do­main would be a dan­ger­ous event, albeit a nec­es­sary one, and there­fore we want to do less of it within the AI (and re­quire pos­i­tive per­mis­sion for all such cases and then re­quire op­er­a­tors to val­i­date the re­sults be­fore pro­ceed­ing).

Ideas that in­herit from this prin­ci­ple in­clude the gen­eral no­tion of Task-di­rected AGI, task­ish­ness, and mild op­ti­miza­tion.


  • Principles in AI alignment

    A ‘prin­ci­ple’ of AI al­ign­ment is a very gen­eral de­sign goal like ‘un­der­stand what the heck is go­ing on in­side the AI’ that has in­formed a wide set of spe­cific de­sign pro­pos­als.