Vingean reflection

Vinge’s Principle implies that when an agent is designing another agent (or modifying its own code), it needs to approve the other agent’s design without knowing the other agent’s exact future actions.

Deep Blue’s programmers decided to run Deep Blue without knowing Deep Blue’s exact moves against Kasparov, how Kasparov would reply to each move, or the exact real-world outcome. Instead, by reasoning about the way Deep Blue was searching through game trees, they arrived at a well-justified but abstract belief that Deep Blue was ‘trying to win’ (rather than trying to lose) and reasoning effectively to that end.

Vingean reflection is reasoning about cognitive systems, especially cognitive systems very similar to yourself (including your actual self), under the constraint that you can’t predict their exact future outputs. We need to make predictions about the consequences of operating an agent in an environment via reasoning on some more abstract level, somehow.

In tiling agents theory, this appears as the rule that we should talk about our successor’s actions only inside of quantifiers.
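As a rough sketch of this rule (the notation is illustrative, not the official formalism of the tiling agents literature): a predecessor agent never cites any particular action its successor \(B\) will take. Instead it approves \(B\) by proving a universally quantified statement, for instance that every action \(B\) could output is one the predecessor's proof system \(T\) certifies as serving the goal \(G\):

```latex
\forall b.\; \big(\mathrm{Act}_B = b\big) \;\rightarrow\; \square_{T}\,[\,b \text{ leads to } G\,]
```

The quantifier over \(b\) is what lets the predecessor sign off on the design without computing the successor's exact policy.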

“Vingean reflection” may be a much more general issue in the design of advanced cognitive systems than it might appear at first glance. An agent reasoning about the consequences of its current code, or considering what will happen if it spends another minute thinking, can be viewed as doing Vingean reflection. A reflective, self-modeling chess-player would not choose to spend another minute thinking, if it thought that its further thoughts would be trying to lose rather than win the game—but it can’t predict its own exact thoughts in advance.
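The chess-player example can be caricatured in code. This is a hedged toy model (all names and structure here are illustrative assumptions, not anything from the tiling agents papers): the agent cannot predict the exact moves an extra minute of search would produce, but it can inspect an abstract property of the search procedure, namely which objective it optimizes, and approve or refuse the computation on that basis.

```python
# Toy sketch of Vingean reflection about one's own computation.
WIN, LOSE = +1, -1

def make_search(objective_sign):
    """Build a game-tree search procedure whose concrete outputs are opaque."""
    def search(position):
        # A deep, expensive search whose exact result the agent cannot
        # know without actually running it.
        raise NotImplementedError("exact future outputs are not predictable")
    # An abstract, inspectable property of the computation:
    search.objective_sign = objective_sign
    return search

def approve_more_thinking(agent_goal_sign, search):
    # Vingean step: approve the computation via its abstract properties
    # (it is trying to *win*), not by predicting its exact moves.
    return search.objective_sign == agent_goal_sign

aligned = make_search(WIN)
corrupted = make_search(LOSE)
print(approve_more_thinking(WIN, aligned))    # True
print(approve_more_thinking(WIN, corrupted))  # False
```

The point of the caricature: the approval decision depends only on `objective_sign`, never on calling `search(position)`, mirroring how the self-modeling chess-player reasons about its further thoughts without simulating them.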

Vingean reflection can also be seen as the study of how a given agent wants thinking to occur in cognitive computations, which may be importantly different from how the agent currently thinks. If these two coincide, we say the agent is reflectively stable.

Tiling agents theory is presently the main line of research working toward a formalization of Vingean reflection and reflective stability.

Further reading:


  • Vinge's Principle

    An agent building another agent must usually approve its design without knowing the agent’s exact policy choices.

  • Reflective stability

    Wanting to think the way you currently think, building other agents and self-modifications that think the same way.

  • Tiling agents theory

    The theory of self-modifying agents that build successors very similar to themselves, like repeating tiles on a tessellated plane.

  • Reflective consistency

    A decision system is reflectively consistent if it can approve of itself, or approve the construction of similar decision systems (as well as perhaps approving other decision systems too).


  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.