Vingean reflection

Vinge’s Principle implies that when an agent is designing another agent (or modifying its own code), it needs to approve the other agent’s design without knowing the other agent’s exact future actions.

Deep Blue’s programmers decided to run Deep Blue without knowing Deep Blue’s exact moves against Kasparov, without knowing how Kasparov would reply to each move, and without being able to visualize the exact outcome of the game in advance. Instead, by reasoning about the way Deep Blue searched through game trees, they arrived at a well-justified but abstract belief that Deep Blue was ‘trying to win’ (rather than trying to lose) and reasoning effectively to that end.

Vingean reflection is reasoning about cognitive systems, especially cognitive systems very similar to yourself (including your actual self), under the constraint that you can’t predict their exact future outputs. We must somehow predict the consequences of operating an agent in an environment by reasoning at some more abstract level.

In tiling agents theory, this appears as the rule that we should talk about our successor’s actions only inside of quantifiers.
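
Schematically (this is a simplified rendering for illustration, not the exact formalism of any one paper): rather than proving “the successor will output action $b_7$, and $b_7$ is safe,” the parent agent proves a universally quantified statement such as

$$\forall b : \mathrm{SuccessorActs}(b) \rightarrow G(b)$$

where $G$ is the parent’s goal predicate and $\mathrm{SuccessorActs}(b)$ says that the successor outputs action $b$. The quantifier is what lets the parent approve the successor without computing its exact policy.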

“Vingean reflection” may be a much more general issue in the design of advanced cognitive systems than it might appear at first glance. An agent reasoning about the consequences of its current code, or considering what will happen if it spends another minute thinking, can be viewed as doing Vingean reflection. A reflective, self-modeling chess-player would not choose to spend another minute thinking, if it thought that its further thoughts would be trying to lose rather than win the game—but it can’t predict its own exact thoughts in advance.
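
A minimal sketch of that meta-reasoning step, in Python. Everything here is hypothetical and illustrative: the point is that the decision to keep thinking depends only on abstract beliefs about the agent’s own search, never on a prediction of its exact future thoughts.

```python
# Hypothetical sketch: an agent deciding whether to keep thinking,
# using an abstract self-model rather than predicting its exact thoughts.
# All names are illustrative, not from any real library.

from dataclasses import dataclass

@dataclass
class SelfModel:
    """Abstract beliefs the agent holds about its own search process."""
    searching_to_win: bool       # Vingean belief: "my search favors winning moves"
    expected_value_gain: float   # estimated improvement from one more minute
    thinking_cost: float         # value lost to the clock per extra minute

def should_keep_thinking(model: SelfModel) -> bool:
    # The agent cannot enumerate the exact moves further search would find
    # (if it could, the search would already be finished). It can only
    # reason abstractly: further thought helps only if the search is aimed
    # at winning and the expected gain outweighs the time cost.
    if not model.searching_to_win:
        return False  # more thought by a search aimed at losing makes things worse
    return model.expected_value_gain > model.thinking_cost

# A player expecting a modest gain keeps thinking; one expecting little stops.
print(should_keep_thinking(SelfModel(True, 0.3, 0.1)))   # True
print(should_keep_thinking(SelfModel(True, 0.05, 0.1)))  # False
```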

Vingean reflection can also be seen as the study of how a given agent wants thinking to occur in cognitive computations, which may differ importantly from how the agent currently thinks. If these two coincide, we say the agent is reflectively stable.
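
One rough way to state this (a schema for illustration, not a standard formal definition): let $D$ be the agent’s current design, and let $D^{*}$ be the design it would most prefer to see doing the thinking, e.g. $D^{*} = \operatorname{arg\,max}_{D'} \mathbb{E}[U \mid \text{run } D']$ for the agent’s utility function $U$. The agent is reflectively stable when $D^{*} = D$.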

Tiling agents theory is presently the main line of research beginning to formalize Vingean reflection and reflective stability.

Children:

  • Vinge's Principle

An agent building another agent must usually approve the other agent’s design without knowing its exact policy choices.

  • Reflective stability

Wanting to think the way you currently think, and to build other agents and self-modifications that think the same way.

  • Tiling agents theory

The theory of self-modifying agents that build successors that are very similar to themselves, like repeating tiles on a tessellated plane.

  • Reflective consistency

    A decision system is reflectively consistent if it can approve of itself, or approve the construction of similar decision systems (as well as perhaps approving other decision systems too).

Parents:

  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.