# Absent-Minded Driver dilemma

A road contains two identical-looking intersections. An absent-minded driver wants to exit at the second intersection, but can’t remember whether they’ve already passed the first.

The utility of exiting at the first intersection is $0, the utility of exiting at the second intersection is $4, and the utility of continuing straight past both intersections is $1. (Note: utility functions describe the *relative* desirability intervals between outcomes. So this payoff matrix says that the added inconvenience of “going past both intersections” compared to “turning right at 2nd” is \(\frac{3}{4}\) of the added inconvenience of “turning right at 1st” compared to “turning right at 2nd”. Perhaps turning right at 1st involves a much longer detour by the time the driver realizes their mistake, or a traffic-jammed stoplight to get back on the road.)

With what probability should the driver continue vs. exit at a generic-looking intersection, in order to maximize their expected utility?

# Analyses

From the standpoint of Newcomblike problems, the Absent-Minded Driver is noteworthy because the logical correlation of the two decisions arises just from the agent’s imperfect memory (anterograde amnesia or limited storage space). There is no outside Omega making predictions about the agent; any problem that the agent encounters is strictly of its own making.

## Intuitive/pretheoretic

The driver doesn’t know each time whether they’re at the first or second intersection, so will continue with the same probability \(p\) at each intersection. The expected payoff of adopting \(p\) as a policy is the sum of:

- $0 times the probability \(1 - p\) of exiting at the first intersection;
- $4 times the probability \(p(1 - p)\) of continuing past the first intersection and then exiting at the second;
- $1 times the probability \(p^2\) of continuing past both intersections.

To find the maximum of the function \(0(1-p) + 4(1-p)p + 1p^2\) we set the [derivative](http://www.wolframalpha.com/input/?i=d%2Fdp+%5B0(1-p)+%2B+4(1-p)p+%2B+1p%5E2%5D) \(4 - 6p\) equal to 0, [yielding](http://www.wolframalpha.com/input/?i=maximize+%5B0(1-p)+%2B+4(1-p)p+%2B+1p%5E2%5D) \(p = \frac{2}{3}\).

So the driver should continue with probability \(\frac{2}{3}\) and exit with probability \(\frac{1}{3}\) at each intersection, [yielding](http://www.wolframalpha.com/input/?i=p%3D2%2F3,+%5B0(1-p)+%2B+4(1-p)p+%2B+1p%5E2%5D) an expected payoff of \(\$0\cdot\frac{1}{3} + \$4\cdot\frac{2}{3}\frac{1}{3} + \$1\cdot\frac{2}{3}\frac{2}{3} = \$\frac{4}{3} \approx \$1.33.\)
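As a quick sanity check, the computation above can be sketched in Python (function and variable names here are my own, using exact arithmetic via the standard `fractions` module):

```python
from fractions import Fraction

def expected_payoff(p):
    """Expected utility of a policy that continues with probability p:
    $0 * P(exit 1st) + $4 * P(exit 2nd) + $1 * P(continue past both)."""
    return (1 - p) * 0 + p * (1 - p) * 4 + p * p * 1

# Search a fine grid; the analytic optimum p = 2/3 lies on it exactly.
best_p = max((Fraction(k, 300) for k in range(301)), key=expected_payoff)
print(best_p, expected_payoff(best_p))  # → 2/3 4/3
```

The grid search agrees with the calculus: the maximum is at \(p = \frac{2}{3}\) with expected payoff \(\$\frac{4}{3}\).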

## Causal decision theory

The analysis of this problem under causal decision theory has traditionally been considered difficult; e.g., Volume 20 of the journal *Games and Economic Behavior* was devoted entirely to the Absent-Minded Driver game.

Suppose that before you set out on your journey, you intended to adopt a policy of continuing with probability \(\frac{2}{3}\). Then when you actually encounter an intersection, you believe you are at the first intersection with probability \(\frac{3}{5}\) and at the second with probability \(\frac{2}{5}\). (There is a 100% or \(\frac{3}{3}\) chance of encountering the first intersection, and a \(\frac{2}{3}\) chance of encountering the second intersection, so the odds are \(3 : 2\) for being at the first intersection versus the second intersection.)
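The posterior can be checked in a couple of lines (variable names are mine; the odds computation follows the parenthetical above):

```python
from fractions import Fraction

q = Fraction(2, 3)        # intended policy: continue with probability 2/3
# The 1st intersection is reached with certainty, the 2nd with probability q,
# so the odds of being at the 1st vs. the 2nd are 1 : q = 3 : 2.
p_first = 1 / (1 + q)
p_second = q / (1 + q)
print(p_first, p_second)  # → 3/5 2/5
```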

Now since you are *not* a logical decision theorist, you believe that if you happen to *already* be at the second intersection, you can change your policy \(p\) without retroactively affecting the probability that you’re already at the second intersection—either you’re already at the second intersection or not, after all!

The first analysis of this problem was given by Piccione and Rubinstein (1997):

Suppose we start out believing we are continuing with probability \(q.\) Then our odds of being at the first vs. second intersection would be \(1 : q,\) so the probability of being at each intersection would be \(\frac{1}{1+q}\) and \(\frac{q}{1+q}\) respectively.

If we’re at the first intersection and we choose a policy \(p,\) we should expect a future payoff of \(4p(1-p) + 1p^2.\) If we’re already at the second intersection, we should expect a policy \(p\)’s future payoff to be \(4(1-p) + 1p.\)

In total our expected payoff is then \(\frac{1}{1+q}(4p(1-p) + p^2) + \frac{q}{1+q}(4(1-p) + p)\) whose [derivative](http://www.wolframalpha.com/input/?i=d%2Fdp+%5B4p(1-p)+%2B+p%5E2+%2B+q(4(1-p)+%2B+p%29%5D%2F(q%2B1)) \(\frac{-6p - 3q + 4}{q+1}\) equals 0 at \(p=\frac{4-3q}{6}.\)

Our decision at \(q\) will be stable only if the resulting maximum of \(p\) is equal to \(q,\) and this is true when \(p=q=\frac{4}{9}.\) The expected payoff from this policy is \(\$4\cdot\frac{4}{9}\frac{5}{9} + \$1\cdot\frac{4}{9}\frac{4}{9} \approx \$1.19.\)
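The fixed-point condition can be verified numerically (a minimal sketch; the function name is mine, and the best-reply formula is the one derived above):

```python
from fractions import Fraction

def best_response(q):
    """Piccione-Rubinstein best reply: p = (4 - 3q)/6, from setting the
    derivative (4 - 6p - 3q)/(1 + q) of the expected payoff to zero."""
    return (4 - 3 * q) / 6

q = Fraction(4, 9)
assert best_response(q) == q        # p = q = 4/9 is the self-consistent policy

# Expected ex-ante payoff under p = q = 4/9:
payoff = 4 * q * (1 - q) + 1 * q * q
print(payoff)                       # 32/27, about $1.19
```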

However, the immediately following paper by Robert Aumann et al. (1997) offered an alternative analysis in which, starting out believing our policy to be \(q\), if we are at the first intersection, then our decision \(p\) also cannot affect the decision \(q\) that will be made at the second intersection. (Note: from an LDT perspective, at least the CDT agent is being consistent about ignoring logical correlations!) So:

If we had in fact implemented the policy \(q,\) our odds of being at the first vs. second intersection would be \(1 : q,\) i.e., probabilities \(\frac{1}{1+q}\) and \(\frac{q}{1+q}\) respectively.

*If* we’re at the first intersection, then the payoff of choosing a policy \(p,\) given that our future self will go on implementing \(q\) regardless, is \(4p(1-q) + 1pq.\) *If* we’re already at the second intersection, then the payoff of continuing with probability \(p\) is \(4(1-p) + 1p.\)

So if our policy is \(q,\) the expected payoff of the policy \(p\) under CDT is:

\(\frac{1}{1+q}\left(4p(1-q) + 1pq\right) + \frac{q}{1+q}\left(4(1-p) + 1p\right).\)

Differentiating with respect to \(p\) yields \(\frac{4 - 6q}{1+q},\) which has no dependence on \(p.\) This makes a kind of sense: if your decision now has no impact on your past or future decision at the other intersection, most settings of \(q\) will just yield an answer of “definitely continue” or “definitely exit”. However, there is a setting of \(q\) which makes every policy \(p\) seem equally desirable, the point at which \(4-6q = 0 \implies q=\frac{2}{3}.\) Aumann et al. take this to imply that a CDT agent should output a \(p\) of \(\frac{2}{3}.\)
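The flatness at \(q = \frac{2}{3}\) is easy to exhibit directly (a sketch with names of my own choosing, following the payoff decomposition above):

```python
from fractions import Fraction

def cdt_payoff(p, q):
    """Payoff of choosing p at this intersection while the choice at the
    other intersection is held fixed at q (Aumann et al.'s analysis)."""
    at_first = 4 * p * (1 - q) + 1 * p * q     # weight 1/(1+q)
    at_second = 4 * (1 - p) + 1 * p            # weight q/(1+q)
    return (at_first + q * at_second) / (1 + q)

q = Fraction(2, 3)
# At q = 2/3 the payoff does not vary with p: every policy looks equally good.
values = {cdt_payoff(Fraction(k, 10), q) for k in range(11)}
assert len(values) == 1
```

For any other \(q,\) the payoff is strictly monotone in \(p,\) so the agent would output a corner policy; \(q = \frac{2}{3}\) is the unique belief consistent with a mixed output.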

One might ask how this result of \(\frac{2}{3}\) is actually rendered into an output, since on the analysis of Aumann et al., if your policy \(q\) in the past or future is to continue with probability \(\frac{2}{3},\) then *any* policy \(p\) seems to have equal utility. However, outputting \(p=\frac{2}{3}\) would also correspond to the general procedure proposed to resolve, e.g., Death in Damascus within CDT. Allegedly, it is just a general rule of rational choice that in this type of problem one should find a policy such that, assuming one implements it, all policies look equally good, and then do that.

Further analyses have, e.g., remarked on the analogy to the Sleeping Beauty Problem and delved into anthropics; or considered the problem as a game between two different agents occupying each intersection; et cetera. It is considered nice to arrive at an answer of \(\frac{2}{3}\) at the end, but this is not mandatory.

## Logical decision theory

Logical decision theorists using, e.g., the updateless form of timeless decision theory, will compute an answer of \(\frac{2}{3}\) using the same procedure and computation as in the intuitive/pretheoretic version. They will also remark that it is strange to imagine that the reasonable answer could be different from the optimal policy, or even that they should require a different reasoning path to compute; and will note that while simplicity is not the *only* virtue of a theory of instrumental rationality, it is *a* virtue.

Parents:

- Newcomblike decision problems
Decision problems in which your choice correlates with something other than its physical consequences (say, because somebody has predicted you very well) can do weird things to some decision theories.