You are stranded in the desert, running out of water, and soon to die. Someone in a motor vehicle drives up to you. The driver of the motor vehicle is a selfish ideally game-theoretical agent, and what’s more, so are you. Furthermore, the driver is Paul Ekman, who has spent his whole life studying facial microexpressions and is extremely good at reading people’s honesty by looking at their faces.
The driver says, “Well, as an ideal selfish rational agent, I’ll convey you into town if it’s in my own interest to do so. I don’t want to bother dragging you to Small Claims Court if you don’t pay up. So I’ll just ask you this question: Can you honestly say that you’ll give me $1,000 from an ATM after we reach town?”
On some decision theories, an ideal selfish rational agent will realize that once it reaches town, it will have no further incentive to pay the driver. Thus, agents of this type answer “Yes,” whereupon the driver says “You’re lying” and drives off leaving them to die.
Would you survive? (Okay, fine, you’d just keep your promise because of being honest. But would you still survive even if you were an ideal selfish agent running whatever algorithm you consider to correspond to the ideal?)
Parfit’s Hitchhiker is noteworthy in that, unlike the alien philosopher-troll Omega running strange experiments, Parfit’s driver acts for understandable reasons.
The Newcomblike aspect of the problem arises from the way that your algorithm’s output, once inside the city, determines both:
• Whether you actually pay up in the city;
• Your helpless knowledge of whether you’ll actually pay up in the city, which you can’t stop from being visible in your facial microexpressions.
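The double role of the algorithm's output can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: the names `run_scenario` and `in_city_policy` are hypothetical, and the driver is assumed to read your disposition perfectly.

```python
from typing import Callable


def run_scenario(in_city_policy: Callable[[], bool]) -> str:
    """Toy model in which the same policy is consulted twice.

    in_city_policy() returns True if the agent pays once in the city.
    Assumption: the driver reads the agent's disposition perfectly.
    """
    # Use 1: what the driver reads off your face tracks what you will really do.
    driver_believes_you = in_city_policy()
    if not driver_believes_you:
        return "left in the desert"
    # Use 2: once in the city, the very same policy decides whether you pay.
    if in_city_policy():
        return "rescued and pays $1,000"
    return "rescued and keeps $1,000"


print(run_scenario(lambda: True))
print(run_scenario(lambda: False))
```

In this sketch, no deterministic policy ever reaches the "rescued and keeps $1,000" branch, because the output that would refuse payment is the same output the driver has already read.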
We may assume that Parfit’s driver also asks you questions like “Have you really thought through what you’ll do?” and “Are you trying to think one thing now, knowing that you’ll probably think something else in the city?” and watches your facial expression on those answers as well.
Note that quantitative changes in your probability of survival may be worth pursuing, even if you don’t think it’s certain that Paul Ekman could read off your facial expressions correctly. Indeed, even a driver who is only fairly good at reading faces might be enough to make this an important Newcomblike problem, if you value significant probability shifts in your survival at more than $1,000.
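To get a feel for how good the driver needs to be, here is a rough expected-utility sketch in Python. It rests on assumptions that are not part of the problem statement: the driver reads your disposition correctly with a single probability `accuracy` (the same for honest and dishonest agents), dying is scored as 0, and surviving is worth a hypothetical `life_value` of $1,000,000. The function names are illustrative.

```python
def expected_utility(pays_in_city: bool, accuracy: float,
                     life_value: float = 1_000_000, fare: float = 1_000) -> float:
    """Expected utility of a disposition, with dying in the desert scored as 0.

    accuracy: assumed probability that the driver reads your disposition correctly.
    """
    if pays_in_city:
        # Rescued only when the driver correctly reads you as honest; then you pay.
        return accuracy * (life_value - fare)
    # Rescued only when the driver misreads you as honest; then you keep the fare.
    return (1 - accuracy) * life_value


def break_even_accuracy(life_value: float = 1_000_000, fare: float = 1_000) -> float:
    """Accuracy above which the paying disposition has the higher expected utility.

    Solves accuracy * (life_value - fare) = (1 - accuracy) * life_value.
    """
    return life_value / (2 * life_value - fare)


print(break_even_accuracy())
print(expected_utility(True, 0.6), expected_utility(False, 0.6))
```

Under these assumptions the break-even accuracy is only slightly above one half, so a driver who reads faces somewhat better than chance already makes the paying disposition the better one to have.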
Parfit’s Hitchhiker is structurally similar to the Transparent Newcomb’s Problem, if you value your life at $1,000,000.
• Dies in the desert. A CDT agent knows that its future self will reason, “Now that I’m in the city, nothing I do can physically cause me to be back in the desert again” and will therefore refuse to pay. Therefore, the present agent is unable to answer honestly that it will pay in the future.
• Dies in the desert. An EDT agent knows that its future self will reason, “Since I can already see that I’m in the city, my paying $1,000 wouldn’t provide me with any further good news about my being in the city” and will therefore refuse to pay. The present agent is thus likewise unable to answer honestly that it will pay.
• A timeless decision theory (TDT) agent, even without the updateless feature, will reason, “If-counterfactually my algorithm for what to do in the city had the logical output ‘refuse to pay’, then in that counterfactual case I would have died in the desert.” The TDT agent will therefore evaluate the expected utility of refusing to pay as very low.
• An updateless decision agent computes that the optimal policy maps the sense data “I can see that I’m already in the city” to the action “Pay the driver $1,000”, and this computation does not change after the agent sees that it is in the city.
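The contrast between these analyses can be sketched in Python. This is a deliberately coarse illustration, not a faithful implementation of any of the theories: it lumps the CDT and EDT reasoning into one function and the TDT and updateless reasoning into another, assumes a perfectly accurate driver, and uses a hypothetical `LIFE_VALUE` of $1,000,000. All names are illustrative.

```python
LIFE_VALUE = 1_000_000  # assumed dollar value of surviving (hypothetical)
FARE = 1_000


def outcome_utility(rescued: bool, pays: bool) -> int:
    """Utility of an outcome; dying in the desert is scored as 0."""
    if not rescued:
        return 0
    return LIFE_VALUE - (FARE if pays else 0)


def choose_holding_rescue_fixed() -> bool:
    """In-city choice that treats the rescue as already settled.

    Only the fare differs between the two actions, so refusing looks better.
    Returns True for "pay", False for "refuse".
    """
    return max([True, False], key=lambda pays: outcome_utility(True, pays))


def choose_by_policy() -> bool:
    """Choice over possible outputs of the in-city algorithm.

    Assumes a perfectly accurate driver, who rescues the agent exactly when
    the algorithm's output is "pay". Returns True for "pay".
    """
    return max([True, False],
               key=lambda pays: outcome_utility(rescued=pays, pays=pays))


def fate(in_city_choice: bool) -> str:
    """What happens to an agent whose in-city choice the driver reads perfectly."""
    return "survives, pays $1,000" if in_city_choice else "dies in the desert"


print("rescue held fixed:", fate(choose_holding_rescue_fixed()))
print("policy evaluated: ", fate(choose_by_policy()))
```

The two functions rank the same two actions; they differ only in whether the rescue is held fixed or allowed to vary with the algorithm's output, which is the distinction the four analyses above turn on.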