Reflective consistency

A decision system is “reflectively consistent” if it can approve the construction of similar decision systems. For example, consider an expected utility satisficer, which either takes the null action or takes an action with expected utility greater than \(\theta\). This agent will approve self-modifying into any other design that likewise either takes the null action or takes an action with expected utility greater than \(\theta\). A satisficer might also approve changing itself into an expected utility maximizer (if it expects this self-modification itself to yield expected utility greater than \(\theta\)), but it will at least approve replacing itself with another satisficer. By contrast, a causal decision theorist given a chance to self-modify will only approve the construction of something that is not a causal decision theorist.

A property satisfies the stronger condition of reflective stability when decision systems with that property only approve their own replacement with other decision systems that also have that property. For example, a paperclip maximizer will, under ordinary circumstances, only approve code changes that preserve the property of maximizing paperclips, so “wanting to make paperclips” is reflectively stable and not just reflectively consistent.
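The satisficer example can be made concrete with a toy check. This is a minimal sketch under loudly stated assumptions: each decision problem is a finite table mapping every non-null action to a known expected utility, and "approving a successor" is modeled as testing that the successor, on a fixed list of problems, always either takes the null action or takes an action with expected utility greater than \(\theta\). The names `satisficer`, `maximizer`, `satisfies_property`, `THETA`, and the problem lists are hypothetical illustrations, not an established formalism.

```python
from typing import Callable, Dict, List, Optional

# ASSUMPTION (toy model): a decision problem maps each non-null action
# to its expected utility; a policy returns an action name, or None for
# the null action.
Problem = Dict[str, float]
Policy = Callable[[Problem], Optional[str]]

THETA = 10.0  # hypothetical satisficing threshold


def satisficer(theta: float) -> Policy:
    """Policy that takes the first action whose EU exceeds theta, else the null action."""
    def choose(problem: Problem) -> Optional[str]:
        for action, eu in problem.items():
            if eu > theta:
                return action
        return None  # no action clears the bar: take the null action
    return choose


def maximizer(problem: Problem) -> Optional[str]:
    """Policy that takes the highest-EU action, or the null action if none exist."""
    if not problem:
        return None
    return max(problem, key=problem.get)


def satisfies_property(policy: Policy, problems: List[Problem], theta: float) -> bool:
    """True if, on every listed problem, the policy takes the null action
    or an action with EU strictly greater than theta (the satisficer's
    own criterion, used here as its test for approving a successor)."""
    for problem in problems:
        action = policy(problem)
        if action is not None and problem[action] <= theta:
            return False
    return True


# Hypothetical test problems; the empty dict offers only the null action.
PROBLEMS: List[Problem] = [
    {"a": 5.0, "b": 12.0},
    {"c": 3.0},
    {},
    {"d": 20.0, "e": 11.0},
]

if __name__ == "__main__":
    # Another satisficer with the same (or a higher) threshold is approved.
    print(satisfies_property(satisficer(THETA), PROBLEMS, THETA))  # True
    print(satisfies_property(satisficer(12.0), PROBLEMS, THETA))   # True
    # A maximizer can pick an action with EU <= theta ({"c": 3.0}), so it fails here...
    print(satisfies_property(maximizer, PROBLEMS, THETA))          # False
    # ...but passes where every maximizing choice clears the threshold.
    print(satisfies_property(maximizer, [{"x": 50.0, "y": 11.0}], THETA))  # True
```

In this toy, a satisficer with a lower threshold (for example `satisficer(4.0)`) would fail the check, which is why the claim in the text concerns replacement by another satisficer with the same threshold. Whether the maximizer is approved depends on which problems it is expected to face, mirroring the "might also approve" hedge above.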


  • Vingean reflection

    The problem of thinking about your future self when it’s smarter than you.