Causal decision theories

Causal decision theory (CDT) and its many variants are the current academically dominant form of decision theory (as of 2016). CDT contrasts with the older formalism of evidential decision theory and, more recently, the new principle of logical decision theory. CDT says that “The principle of rational choice is to decide according to the counterfactual consequences of your physical act.” That is: To figure out the consequences of a choice, imagine the universe as it will already exist at the moment when you physically act; imagine only your physical act changing; and then run the laws of physics forwards from there to figure out the consequences.

Other overviews of causal decision theory:

Causal versus evidential decision theory

Causal decision theory gained widespread acceptance based on critiques of the policies implied by the previous way of writing down the expected utility formula, which we now think of as evidential decision theory (EDT).

We can think of EDT as the accidental result of writing down expected utilities in the most obvious way: The expected consequence of an act \(a_0\) is just the probability distribution over outcomes \(o_i\) given by \(\mathbb P(o_i|a_0).\) That is, on EDT, to imagine the consequence of choosing an act \(a_0,\) we imagine what we would believe about the world if somebody told us that we’d actually chosen \(a_0.\)

To see one example of a case that pries apart EDT and CDT, consider the Toxoplasmosis Dilemma. Suppose that a certain parasitic infection, often carried by cats, has been found to make humans enjoy petting cats more (thus helping to spread the infection). Suppose that statistics have found that in a certain experimental setup, 10% of the people who don’t pet a cute kitten, and 20% of the people who do pet the kitten, have toxoplasmosis. The kitten itself is guaranteed to have been sterilized and free of toxoplasmosis. The disutility of toxoplasmosis as a parasitic infection greatly outweighs the pleasure of petting the kitten. Do you pet the kitten?

An EDT agent might reason, “If I learn as news that I pet the kitten, I would estimate a 20% chance that I have toxoplasmosis, compared to a 10% chance in the world in which I learn that I do not pet the kitten. Therefore, I will not pet the kitten.”

A CDT agent would reason, “When I imagine the world up to the point where I pet the kitten, either I already have toxoplasmosis or I don’t. Petting the kitten can’t cause me to get toxoplasmosis. Therefore, I should pet the kitten… now, having realized that I intend to pet the kitten, I realize that I have a 20% chance of having toxoplasmosis. But in the counterfactual world where I don’t pet the kitten, my probability of having toxoplasmosis would counterfactually still be 20%, and I’d miss out on petting the kitten as well.”
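The two lines of reasoning above can be sketched numerically. This is a minimal illustration, not a definitive formalization: the utilities `U_PET` and `U_TOXO` are hypothetical values chosen only so that the infection greatly outweighs the pleasure of petting, and the function names are made up for this sketch. The 10% and 20% figures come from the dilemma as stated.

```python
# Hypothetical utilities (not from the text): petting is worth +1,
# having toxoplasmosis is worth -100.
U_PET = 1.0
U_TOXO = -100.0

# Statistics from the experimental setup in the dilemma.
P_TOXO_GIVEN_PET = 0.20
P_TOXO_GIVEN_NO_PET = 0.10


def edt_value(pet: bool) -> float:
    """Treat the act as news: use the conditional probability P(toxo | act)."""
    p_toxo = P_TOXO_GIVEN_PET if pet else P_TOXO_GIVEN_NO_PET
    return (U_PET if pet else 0.0) + p_toxo * U_TOXO


def cdt_value(pet: bool, p_toxo: float) -> float:
    """Hold P(toxo) fixed across acts: petting cannot cause the infection."""
    return (U_PET if pet else 0.0) + p_toxo * U_TOXO


edt_choice = max([True, False], key=edt_value)

# Whatever fixed background probability the CDT agent holds, petting comes
# out ahead by exactly U_PET.
cdt_choices = {
    p: max([True, False], key=lambda act: cdt_value(act, p))
    for p in (0.0, 0.1, 0.2, 1.0)
}

print("EDT pets the kitten:", edt_choice)
print("CDT pets the kitten, by background P(toxo):", cdt_choices)
```

Under these assumed utilities the EDT calculation declines to pet, while the CDT calculation pets for every background probability of infection, since the act only shifts the total by the pleasure of petting.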

Because the CDT agent was widely agreed to be reasoning more sensibly in cases like this one, CDT was widely adopted as a replacement for the previous formalism, which was then relabeled as EDT.

EDT and CDT are computed in formally different ways. When we condition on our actions inside EDT, we are computing a conditional probability, whereas in CDT, we are computing a causal counterfactual. The difference between the two is sometimes explained by contrasting this pair of sentences:

  • If Lee Harvey Oswald didn’t shoot John F. Kennedy, somebody else did.

  • If Lee Harvey Oswald hadn’t shot John F. Kennedy, somebody else would have.

In the first sentence, we imagine being told as news that Oswald didn’t shoot Kennedy, and updating our beliefs to integrate this with the rest of our observations. Formally, we take whatever tiny shred of probability we might have assigned to possible worlds where the history books are wrong and Oswald didn’t actually shoot Kennedy, and imagine that tiny shred of probability expanding to become the whole of our posterior probability distribution. In particular, if we imagine whatever shred of probability we assign to worlds like that, we know that even in those worlds, Kennedy was still shot.

Let \(O\) denote the proposition that Oswald shot Kennedy, \(\neg O\) denote \(O\) being false, and \(K\) denote the proposition that Kennedy was shot. Our revised probability of Kennedy being shot if \(O\) were actually false, written as \(\mathbb P(K|\neg O),\) would still be quite high.

The second sentence asks us to imagine how a counterfactual world would play out if Oswald had acted differently. To visualize this counterfactual:

  • We imagine everything in the world being the same up until the point where Oswald decides to shoot Kennedy.

  • We surgically intervene on our imagined world to change Oswald’s decision to not-shooting, without changing any other facts about the past.

  • We rerun our model of the world’s mechanisms forward from the point of change, to determine what would have happened in this alternate universe.

This causal counterfactual is often written as \(\mathbb P(\neg O \ \square \!\! \rightarrow K).\) If you believe that Lee Harvey Oswald acted alone (and did in fact shoot Kennedy), then you should assign a low probability to \(\mathbb P(\neg O \ \square \!\! \rightarrow K),\) in contrast to your presumably high probability for \(\mathbb P(K|\neg O).\)

Computing causal counterfactuals

Many academic discussions of causal decision theory take for granted that we ‘just know’ a counterfactual distribution \(\mathbb P(\bullet \ || \ \bullet)\) which is treated as heaven-sent. However, one formal way of computing causal counterfactuals was given relative to the theory of causal models developed by Judea Pearl and others.


The backbone of a causal model is a directed acyclic graph showing which events causally affect which other events:

  • \(X_1 \rightarrow \{X_2, X_3\} \rightarrow X_4 \rightarrow X_5\)

One standard example of such a causal graph is:

  • SEASON → {RAINING, SPRINKLER} → SIDEWALK (wet) → SLIPPERY

This says, e.g.:

  • That the current SEASON affects the probability that it’s RAINING, and separately affects the probability of the SPRINKLER turning on. (But RAINING and SPRINKLER don’t affect each other; if we know the current SEASON, we don’t need to know whether it’s RAINING to figure out the probability the SPRINKLER is on.)

  • RAINING and SPRINKLER can both cause the SIDEWALK to become wet. (So if we did observe that the sidewalk was wet, then even already knowing the SEASON, we would estimate a different probability that it was RAINING depending on whether the SPRINKLER was on. The SPRINKLER being on would ‘explain away’ the SIDEWALK’s observed wetness without any need to postulate RAIN.)

  • Whether the SIDEWALK is wet is the sole determining factor for whether the SIDEWALK is SLIPPERY. (So that if we know whether the SIDEWALK is wet, we learn nothing more about the probability that the path is SLIPPERY by being told that the SEASON is summer. But if we didn’t already know whether the SIDEWALK was wet, whether the SEASON was summer or fall might be very relevant for guessing whether the path was SLIPPERY!)

A causal model goes beyond the graph by including specific probability functions \(\mathbb P(X_i | \mathbf{pa}_i)\) that give the probability of each node \(X_i\) taking on the value \(x_i\) given the values \(\mathbf {pa}_i\) of \(X_i\)’s parents (its immediate ancestors in the graph). It is implicitly assumed that the causal model factorizes, so that the probability of any value assignment \(\mathbf x\) to the whole graph can be calculated using the product:

$$\mathbb P(\mathbf x) = \prod_i \mathbb P(x_i | \mathbf{pa}_i)$$

Then the counterfactual conditional \(\mathbb P(\mathbf x | \operatorname{do}(X_j=x_j))\) is calculated via:

$$\mathbb P(\mathbf x | \operatorname{do}(X_j=x_j)) = \prod_{i \neq j} \mathbb P(x_i | \mathbf{pa}_i)$$

(We assume that \(\mathbf x\) has \(x_j\) equaling the \(\operatorname{do}\)-specified value of \(X_j\); otherwise its conditioned probability is defined to be \(0\).)

This just says that when we set \(\operatorname{do}(X_j=x_j)\) we ignore the ordinary parent nodes for \(X_j\) and just say that whatever the values of \(\mathbf{pa}_j,\) the probability of \(X_j = x_j\) is 1.

This formula implies that conditioning on \(\operatorname{do}(X_j=x_j)\) can only affect the probabilities of variables \(X_k\) that are “downstream” of \(X_j\) in the directed graph of the causal model. (Which is why choosing to pet the kitten can’t possibly affect whether you have toxoplasmosis.)
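As a rough sketch of the truncated product above, the following enumerates the SEASON / RAINING / SPRINKLER / SIDEWALK / SLIPPERY example. All of the conditional probabilities are hypothetical numbers invented for illustration, as is the assumption that the sidewalk is wet exactly when it is raining or the sprinkler is on. The helper names `bern`, `joint` and `prob` are likewise made up for this sketch.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
P_SEASON = {"summer": 0.5, "fall": 0.5}
P_RAIN = {"summer": 0.1, "fall": 0.6}        # P(RAINING = True | SEASON)
P_SPRINKLER = {"summer": 0.7, "fall": 0.1}   # P(SPRINKLER = True | SEASON)
P_SLIPPERY = {True: 0.9, False: 0.0}         # P(SLIPPERY = True | WET)


def bern(p_true, value):
    """Probability that a binary variable with P(True) = p_true equals value."""
    return p_true if value else 1.0 - p_true


def joint(season, rain, sprinkler, wet, slippery, do_sprinkler=None):
    """Product of per-node factors; under do(), SPRINKLER's factor is dropped."""
    if wet != (rain or sprinkler):        # assumed: WET is a deterministic OR
        return 0.0
    p = P_SEASON[season] * bern(P_RAIN[season], rain)
    if do_sprinkler is None:
        p *= bern(P_SPRINKLER[season], sprinkler)
    elif sprinkler != do_sprinkler:       # assignments contradicting do() get 0
        return 0.0
    return p * bern(P_SLIPPERY[wet], slippery)


def prob(event, given=lambda w: True, do_sprinkler=None):
    """P(event | given) by brute-force enumeration of all assignments."""
    names = ("season", "rain", "sprinkler", "wet", "slippery")
    num = den = 0.0
    for values in product(P_SEASON, *[(True, False)] * 4):
        w = dict(zip(names, values))
        p = joint(do_sprinkler=do_sprinkler, **w)
        if given(w):
            den += p
            if event(w):
                num += p
    return num / den


p_rain = prob(lambda w: w["rain"])
p_rain_obs = prob(lambda w: w["rain"], given=lambda w: w["sprinkler"])
p_rain_do = prob(lambda w: w["rain"], do_sprinkler=True)
p_slip = prob(lambda w: w["slippery"])
p_slip_do = prob(lambda w: w["slippery"], do_sprinkler=True)

print(f"P(rain)                       = {p_rain:.4f}")
print(f"P(rain | sprinkler on)        = {p_rain_obs:.4f}")
print(f"P(rain | do(sprinkler on))    = {p_rain_do:.4f}")
print(f"P(slippery)                   = {p_slip:.4f}")
print(f"P(slippery | do(sprinkler on)) = {p_slip_do:.4f}")
```

With these made-up numbers, observing the sprinkler on changes the estimated probability of rain (it is evidence about the season), whereas setting it on by intervention leaves the probability of rain at its prior value and only changes the downstream SLIPPERY node.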

Then expected utility should be calculated as:

$$\mathbb E[\mathcal U| \operatorname{do}(a_x)] = \sum_{o_i \in \mathcal O} \mathcal U(o_i) \cdot \mathbb P(o_i | \operatorname{do}(a_x))$$

Under this rule, we will calculate that we can’t affect the probability of having toxoplasmosis by petting the cat, since our choice to pet the cat is causally downstream of whether we have toxoplasmosis.


Proposed technical refinements of CDT

The semantics of the \(\operatorname{do}()\) operation, or causal counterfactuals generally, imply that in Newcomblike problems the first pass of a CDT expected utility calculation may return quantitatively wrong utilities, or even a qualitatively bad option, since the CDT agent will not yet have updated background beliefs based on observing its own decision.

In Newcomb’s Problem, the mischievous Omega places two boxes before you, a transparent Box A containing $1,000, and an opaque Box B. Omega then departs. You can take one box or both boxes. If Omega predicted that you would take only Box B, then Omega has already put $1,000,000 into Box B. If Omega predicted you would two-box, Box B already contains nothing.

Suppose that in the general population, the base rate of taking only Box B is \(\frac{2}{3}.\) Then at the first moment of making the decision to two-box, a CDT agent will believe that Box B has a \(\frac{2}{3}\) probability of being full.

Besides this being an inaccurate expectation of future wealth, in a slightly different version of Newcomb’s Problem, it leads to potential losses. Suppose you must press one of four buttons \(W, X, Y, Z\) to determine (a) whether to one-box or two-box, and (b) whether to pay an extra $900 fee to make the money (if any) be tax-free. If your marginal tax rate is otherwise 50%, then the payoff chart in after-tax income might look like this:

$$\begin{array}{r|c|c} & \text{One-boxing predicted} & \text{Two-boxing predicted} \\ \hline \text{W: Take both boxes, no fee:} & \$500,500 & \$500 \\ \hline \text{X: Take only Box B, no fee:} & \$500,000 & \$0 \\ \hline \text{Y: Take both boxes, pay fee:} & \$1,000,100 & \$100 \\ \hline \text{Z: Take only Box B, pay fee:} & \$999,100 & -\$900 \end{array}$$

A CDT agent that has not yet updated on observing its own choice, thinking that it has the \(\frac{2}{3}\) prior chance of Box B being full, will press the button Y.
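The first-pass calculation can be checked directly from the payoff chart. This sketch assumes utility is linear in after-tax dollars and uses a two-thirds prior that Box B is full. The second calculation additionally assumes, purely for illustration, that an agent who has chosen to two-box becomes certain Omega predicted two-boxing. The names `PAYOFFS` and `expected_utilities` are made up for the sketch.

```python
from fractions import Fraction

# After-tax payoffs from the chart:
# (one-boxing predicted, two-boxing predicted).
PAYOFFS = {
    "W": (500_500, 500),      # both boxes, no fee
    "X": (500_000, 0),        # only Box B, no fee
    "Y": (1_000_100, 100),    # both boxes, pay fee
    "Z": (999_100, -900),     # only Box B, pay fee
}


def expected_utilities(p_full):
    """Expected after-tax dollars per button, holding P(Box B full) fixed."""
    return {
        button: p_full * full + (1 - p_full) * empty
        for button, (full, empty) in PAYOFFS.items()
    }


# First pass: the agent still holds the population prior of 2/3.
prior_eu = expected_utilities(Fraction(2, 3))
first_choice = max(prior_eu, key=prior_eu.get)

# Illustrative assumption: after settling on two-boxing, the agent believes
# Omega predicted two-boxing, so Box B is empty.
updated_eu = expected_utilities(Fraction(0))
best_after_update = max(updated_eu, key=updated_eu.get)

for button in PAYOFFS:
    print(button, float(prior_eu[button]), float(updated_eu[button]))
print("first-pass choice:", first_choice)
print("best button once Box B is believed empty:", best_after_update)
```

On the first pass, paying the fee looks worthwhile because the agent expects a large taxable sum. Once Box B is believed empty, the fee protects only the $1,000 in Box A and costs more than it saves.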

An obvious amendment is to have CDT observe its first impulse, update its background beliefs if required, recalculate expected utilities, possibly change the option selected, and possibly update again, and continue until arriving at a stable state. This closely resembles the tickle defense in that the CDT agent notices the ‘tickle’ of an impulse to choose a particular option, and tries updating on that tickle.

A problem with this first amendment is that the updating process can go into an infinite loop.

In Death in Damascus, a man of Damascus sees Death, and Death looks surprised, then remarks that he has an appointment with the man tomorrow. The man immediately purchases a fast horse and rides to Aleppo, where the next day he is killed by falling roof tiles.

The premise of Death in Damascus is that Death, who like Omega is an excellent predictor of human behavior, has already informed you that whichever choice you end up taking was the one that led to the appointed place of your death. If you decide to stay in Damascus, then observing this, you should expect staying in Damascus to be fatal and Aleppo to be less dangerous. If you observe yourself choosing to ride to Aleppo, you should expect that Aleppo kills you while Damascus would be quite safe. Faced with this dilemma, a causal decision theory that repeatedly updates on the ‘tickles’ of its observed decision-impulses will go into an infinite loop.
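The looping behavior can be illustrated with a toy deliberation procedure. This is a deliberately simplified sketch, not a formal model of CDT: it assumes the agent treats its current impulse as conclusive evidence about where Death will be, and it caps the number of updates so the program terminates. The function names are hypothetical.

```python
def update_on_tickle(impulse):
    """Death predicts the agent well, so expect Death wherever the impulse points."""
    return impulse


def best_response(believed_death_city):
    """Choose the city where the agent currently believes Death will not be."""
    return "Aleppo" if believed_death_city == "Damascus" else "Damascus"


def deliberate(first_impulse, max_steps=6):
    """Repeatedly update on the current impulse; report whether it stabilized."""
    impulse = first_impulse
    history = [impulse]
    for _ in range(max_steps):
        belief = update_on_tickle(impulse)
        new_impulse = best_response(belief)
        if new_impulse == impulse:
            return history, True      # reached a stable choice
        impulse = new_impulse
        history.append(impulse)
    return history, False             # gave up: still oscillating


history, stable = deliberate("Damascus")
print(" -> ".join(history))
print("stable:", stable)
```

Every impulse undermines itself, so the history alternates between the two cities until the step limit is reached and no stable pure choice is found.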

An obvious second amendment is to allow a CDT agent to use mixed strategies, for example to ‘choose’ to stay in Damascus or go to Aleppo with 0.5 : 0.5 probability. This permits stability in the Death in Damascus case and also some degree of self-observational updating.

However, as Yudkowsky has observed, this twice-amended version of CDT is still subject to predictable losses. At the moment of making the ‘mixed’ decision to stay in Damascus or go to Aleppo with 0.5 : 0.5 probability, the agent reasons as if it has a 50% chance of surviving (by the semantics of the \(\operatorname{do}()\) operation, the counterfactual for the agent’s action cannot, inside that calculation, be correlated with any background variables). So if there was a further-compounded decision which included e.g. a chance to purchase for $1 a ticket that pays out $10 if the agent survives, the agent will buy that ticket (and then try to sell it back immediately afterwards). Similarly, once the CDT agent has started on its way to Aleppo (if that was the result of the randomized decision), nothing prohibits it from suddenly realizing that Aleppo is certainly fatal and Damascus is safe, and trying to turn back. In this sense, the stability and internal consistency of CDT agents might still be regarded as an unsolved problem.
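The ticket example reduces to a short calculation. The ticket price and payout come from the text. The value `DEATH_ACCURACY` is a hypothetical parameter standing in for how reliably Death predicts the agent's eventual destination, and the sketch treats utility as linear in dollars.

```python
TICKET_PRICE = 1.0
TICKET_PAYOUT = 10.0

# Hypothetical: how often Death's prediction matches the city the agent
# actually ends up in, including after randomizing.
DEATH_ACCURACY = 0.99


def ticket_value(p_survive):
    """Expected net dollars from buying the ticket at a given survival belief."""
    return p_survive * TICKET_PAYOUT - TICKET_PRICE


# Inside the do() calculation for the 0.5 : 0.5 mixed act, the act is
# uncorrelated with Death's location, so survival looks like a coin flip.
value_at_decision = ticket_value(0.5)

# After observing which city it is heading to, the agent updates on Death
# having predicted that outcome.
value_after_update = ticket_value(1 - DEATH_ACCURACY)

print("ticket value when deciding:", value_at_decision)
print("ticket value after updating:", round(value_after_update, 2))
```

The sign flip between the two values is the predictable loss described above: the ticket looks like a good purchase at decision time and a bad holding immediately afterwards.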

