Cartesian agent

A Cartesian agent is an agent that is a separate system from its environment, linked to it only by a Cartesian boundary across which sensory information and motor outputs pass. This is most commonly formalized as two distinct Turing machines, an ‘agent’ machine and an ‘environment’ machine. The agent receives sensory information from the environment and outputs motor information; the environment receives the agent’s motor information and computes the agent’s sensory information.
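As a rough illustration of this two-machine setup, the interaction can be sketched as a loop in which only actions and percepts ever cross the boundary. This is a minimal sketch, not a standard formalization: plain Python functions stand in for the two Turing machines, and the names `run_cartesian_loop`, `echo_agent`, and `counting_environment`, along with the integer encoding of percepts and actions, are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Assumption for illustration: percepts and actions are plain integers.
Percept = int
Action = int


def run_cartesian_loop(
    agent: Callable[[List[Percept]], Action],
    environment: Callable[[List[Action]], Percept],
    steps: int,
) -> List[Tuple[Action, Percept]]:
    """Alternate the two systems; only actions and percepts cross the boundary."""
    percepts: List[Percept] = []
    actions: List[Action] = []
    history: List[Tuple[Action, Percept]] = []
    for _ in range(steps):
        # The agent sees nothing of the environment except past percepts.
        action = agent(percepts)
        actions.append(action)
        # The environment sees nothing of the agent except its actions.
        percept = environment(actions)
        percepts.append(percept)
        history.append((action, percept))
    return history


def echo_agent(percepts: List[Percept]) -> Action:
    """Hypothetical agent: repeat the most recent percept, or 0 at the start."""
    return percepts[-1] if percepts else 0


def counting_environment(actions: List[Action]) -> Percept:
    """Hypothetical environment: the percept is the latest action plus one."""
    return actions[-1] + 1


history = run_cartesian_loop(echo_agent, counting_environment, steps=3)
print(history)
```

Note that nothing the environment does can alter the code of `echo_agent`; that guarantee is exactly what makes the agent Cartesian in this sketch.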

An actual human, in contrast, is a “naturalistic agent” that is a continuous part of the universe: a human is one particular collection of atoms within the physical universe, and there’s no type distinction between the atoms inside the human and the atoms outside the human. Eating a particular kind of mushroom can make you think differently; dropping an anvil on your own head doesn’t just cause you to see anvilness or receive a pain signal, it smashes your computing substrate and causes you not to receive any future sensory information at all.

In the context of AI alignment theory, Cartesian agents are usually associated with optimizing for sensory rewards: either some particular component of the agent’s sensory input from the environment is a “reward signal”, or the agent is trying to optimize some directly computed function of its sensory data.
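The two kinds of sensory reward can be contrasted in a small sketch. Everything here is hypothetical and chosen only for illustration: the `Percept` structure, its field names, and the scoring rule (one point per even observation) are assumptions, not part of any standard definition.

```python
from typing import NamedTuple, Sequence


class Percept(NamedTuple):
    """Hypothetical sensory input arriving from the environment."""
    observation: int  # ordinary sensory content
    reward: float     # designated "reward signal" component


def total_reward_from_channel(percepts: Sequence[Percept]) -> float:
    """Case 1: the reward is a dedicated component of the sensory input."""
    return sum(p.reward for p in percepts)


def total_reward_from_function(percepts: Sequence[Percept]) -> float:
    """Case 2: the reward is computed directly from the sensory data.

    Assumed scoring rule for the example: one point per even observation.
    """
    return float(sum(1 for p in percepts if p.observation % 2 == 0))


percepts = [Percept(2, 0.5), Percept(3, 1.0), Percept(4, 0.25)]
print(total_reward_from_channel(percepts))
print(total_reward_from_function(percepts))
```

In both cases the quantity being optimized is defined entirely over what crosses the boundary into the agent, not over the state of the environment itself.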


  • Cartesian agent-environment boundary

    If your agent is separated from the environment by an absolute border that can only be crossed by sensory information and motor outputs, it might just be a Cartesian agent.