Reflective consistency

A decision system is “reflectively consistent” if it can approve the construction of similar decision systems. For example, an expected utility satisficer (an agent that either takes the null action or takes an action with expected utility greater than \(\theta\)) can self-modify into any other design that also either takes the null action or takes an action with expected utility greater than \(\theta\). A satisficer might also approve changing itself into an expected utility maximizer (if it expects that this self-modification itself leads to expected utility greater than \(\theta\)), but it will at least approve replacing itself with another satisficer. On the other hand, a causal decision theorist given a chance to self-modify will only approve the construction of something that is not a causal decision theorist, so causal decision theory is not reflectively consistent.

A property satisfies the stronger condition of reflective stability when decision systems with that property approve replacing themselves only with other decision systems that have that property. For example, a paperclip maximizer will under ordinary circumstances only approve code changes that preserve the property of maximizing paperclips, so “wanting to make paperclips” is reflectively stable and not just reflectively consistent.
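
The satisficer example above can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions: a "scenario" is a hypothetical table of actions and their expected utilities, a successor design is a function from a scenario to an action (or `None` for the null action), and the satisficer "approves" a successor if, in every scenario it considers, the successor's choice is either the null action or has expected utility greater than the threshold. The names `THETA`, `SCENARIOS`, `LOW_SCENARIOS`, and `satisficer_approves` are invented for this example.

```python
from typing import Callable, Dict, List, Optional

Action = Optional[str]        # None stands for the null action
Scenario = Dict[str, float]   # hypothetical table: action name -> expected utility
Policy = Callable[[Scenario], Action]

THETA = 5.0  # hypothetical satisficing threshold


def satisficer(scenario: Scenario) -> Action:
    """Take the first action with expected utility above THETA, else do nothing."""
    for action, expected_utility in scenario.items():
        if expected_utility > THETA:
            return action
    return None


def maximizer(scenario: Scenario) -> Action:
    """Always take the action with the highest expected utility."""
    return max(scenario, key=scenario.get)


def minimizer(scenario: Scenario) -> Action:
    """Always take the action with the lowest expected utility."""
    return min(scenario, key=scenario.get)


def satisficer_approves(successor: Policy, scenarios: List[Scenario]) -> bool:
    """Approve a successor only if each of its choices is the null action
    or an action with expected utility greater than THETA."""
    for scenario in scenarios:
        action = successor(scenario)
        if action is not None and scenario[action] <= THETA:
            return False
    return True


# Every scenario here contains at least one action above the threshold.
SCENARIOS: List[Scenario] = [
    {"a": 3.0, "b": 7.0},
    {"a": 6.0, "b": 9.0},
]

# Adds a scenario in which no action clears the threshold.
LOW_SCENARIOS: List[Scenario] = SCENARIOS + [{"a": 1.0, "b": 2.0}]

if __name__ == "__main__":
    for name, policy in [("satisficer", satisficer),
                         ("maximizer", maximizer),
                         ("minimizer", minimizer)]:
        print(name,
              satisficer_approves(policy, SCENARIOS),
              satisficer_approves(policy, LOW_SCENARIOS))
```

In this toy model the satisficer approves another satisficer in both scenario sets, which is the reflective consistency described above. It approves the maximizer only when every scenario offers some action above the threshold, and it rejects the maximizer once a scenario appears in which even the best action falls short, since the maximizer acts anyway. The minimizer is rejected in both sets.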


  • Vingean reflection

    The problem of thinking about your future self when it’s smarter than you.