Cartesian agent

A Carte­sian agent is an agent that’s a sep­a­rate sys­tem from the en­vi­ron­ment, linked by the Carte­sian bound­ary across which passes sen­sory in­for­ma­tion and mo­tor out­puts. This is most com­monly for­mal­ized by two dis­tinct Tur­ing ma­chines, an ‘agent’ ma­chine and an ‘en­vi­ron­ment’ ma­chine. The agent re­ceives sen­sory in­for­ma­tion from the en­vi­ron­ment and out­puts mo­tor in­for­ma­tion; the en­vi­ron­ment re­ceives the agent’s mo­tor in­for­ma­tion and com­putes the agent’s sen­sory in­for­ma­tion.

An ac­tual hu­man, in con­trast, is a “nat­u­ral­is­tic agent” that is a con­tin­u­ous part of the uni­verse—a hu­man is one par­tic­u­lar col­lec­tion of atoms within the phys­i­cal uni­verse, and there’s no type dis­tinc­tion be­tween the atoms in­side the hu­man and the atoms out­side the hu­man. Eat­ing a par­tic­u­lar kind of mush­room can make us think differ­ently; drop­ping an anvil on your own head doesn’t just cause you to see anvil­ness or re­ceive a pain sig­nal, it smashes your com­put­ing sub­strate and causes you not to feel any fu­ture sen­sory in­for­ma­tion at all.

In the con­text of AI al­ign­ment the­ory, Carte­sian agents are usu­ally as­so­ci­ated with op­ti­miz­ing for sen­sory re­wards—there’s some par­tic­u­lar com­po­nent of the en­vi­ron­ment’s in­put which is a “re­ward sig­nal”, or the agent is try­ing to op­ti­mize some di­rectly com­puted func­tion of sen­sory data.


  • Cartesian agent-environment boundary

    If your agent is sep­a­rated from the en­vi­ron­ment by an ab­solute bor­der that can only be crossed by sen­sory in­for­ma­tion and mo­tor out­puts, it might just be a Carte­sian agent.


  • Methodology of unbounded analysis

    What we do and don’t un­der­stand how to do, us­ing un­limited com­put­ing power, is a crit­i­cal dis­tinc­tion and im­por­tant fron­tier.