Reflective stability

An agent is “reflectively stable” in some regard if, given a choice of how to construct a successor agent or modify its own code, the agent will only construct a successor that thinks similarly in that regard.

  • In tiling agent theory, an expected utility satisficer is reflectively consistent, since it will approve of building another EU satisficer; but an EU satisficer is not reflectively stable, since it may also approve of building an expected utility maximizer (it expects the consequences of building the maximizer to satisfice).

  • Having a utility function that only weighs paperclips is “reflectively stable” because paperclip maximizers only try to build other paperclip maximizers.
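The satisficer example above can be put in a toy model. The threshold and utility numbers below are invented for illustration; a real agent would be computing expectations over world-models rather than comparing known constants.

```python
# Toy sketch (illustrative assumptions only): why an expected-utility
# satisficer is reflectively consistent but not reflectively stable.

THRESHOLD = 10  # the satisficer approves any plan whose expected utility meets this


def satisficer_approves(expected_utility: float) -> bool:
    """A satisficer approves any successor whose expected outcome satisfices."""
    return expected_utility >= THRESHOLD


# Assumed expected utilities of building each kind of successor:
eu_of_building_satisficer = 12   # another satisficer also clears the threshold
eu_of_building_maximizer = 100   # a maximizer clears it by a wide margin

# Reflectively consistent: it approves of building another satisficer...
assert satisficer_approves(eu_of_building_satisficer)

# ...but not reflectively stable: it also approves of building a maximizer,
# since it expects the consequences of building the maximizer to satisfice.
assert satisficer_approves(eu_of_building_maximizer)
```

The asymmetry is that approval depends only on clearing the threshold, so nothing in the satisficer's criterion singles out satisficers as the only acceptable successors.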

If, thinking the way you currently do (in some regard), it seems unacceptable to not think that way (in that regard), then you are reflectively stable (in that regard).

One possible confusion is worth untangling: reflective stability is not the same as being good or desirable. We may sometimes want reflectively unstable agents. For example, it seems bad (from our perspective) if a paperclip maximizer stays a paperclip maximizer, and a causal decision theorist might do well to build a successor that is incrementally saner than a causal decision theorist.


  • Reflectively consistent degree of freedom

    When an instrumentally efficient, self-modifying AI can be like X or like X’ in such a way that X wants to be X and X’ wants to be X’, that’s a reflectively consistent degree of freedom.

  • Other-izing (wanted: new optimization idiom)

    Maximization isn’t possible for bounded agents, and satisficing doesn’t seem like enough. What other kind of ‘izing’ might be good for realistic, bounded agents?

  • Consequentialist preferences are reflectively stable by default

    Gandhi wouldn’t take a pill that made him want to kill people, because he knows that in that case more people will be murdered. A paperclip maximizer doesn’t want to stop maximizing paperclips.


  • Vingean reflection

    The problem of thinking about your future self when it’s smarter than you.