Modal combat

Mo­dal com­bat is a way of study­ing the spe­cial case of the one-shot pris­oner’s dilemma in which the play­ers are pro­grams which have read ac­cess to each other source code. There is a class of such agents, defined by ex­pres­sions in modal logic, which can do gen­eral prov­abil­ity rea­son­ing about them­selves and each other, and fur­ther­more there is a quick way to eval­u­ate how two such agents will be­have against each other.

While they are not al­lowed to run each other’s source code noteOther­wise we risk in­finite re­gres­sion as \(A\) simu­lates \(B\) that simu­lates \(A\), we are go­ing to gen­er­ously give each of them ac­cess to a halt­ing or­a­cle so that they can make pow­er­ful rea­son­ing about each other’s be­hav­ior.

Wait, why is this in­ter­est­ing? Are not your as­sump­tions a bit over­pow­ered? Well, they are. But think­ing about this ma­chines which are ob­vi­ously way more pow­er­ful than our cur­rent ca­pa­bil­ities is a use­ful, albeit un­re­al­is­tic, proxy for su­per­in­tel­li­gence. We should fur­ther­more re­mark that while we are go­ing to stick with halt­ing or­a­cles through this ar­ti­cle there ex­ist bounded ver­sions of some of the re­sults.

Op­ti­mal­ity in modal combat

We are es­pe­cially in­ter­ested in figur­ing out how would one go about mak­ing ideal com­peti­tors in this setup, be­cause that may shed light on de­ci­sion the­ory is­sues.

For that, we need a good op­ti­mal­ity crite­rion to al­low us to com­pare differ­ent play­ers. Let’s think about some con­di­tions we would like our player to satisfy:

  1. Un­ex­ploita­bil­ity: We would like our player to never end in the sucker’s pay­off (we co­op­er­ate and the op­po­nent defects).

  2. Ro­bust co­op­er­a­tion: When­ever it is pos­si­ble to achieve mu­tual co­op­er­a­tion, we would like to do so. A par­tic­u­lar way of highly de­sir­able ro­bust co­op­er­a­tion is achiev­ing co­op­er­a­tion with one self. We call this prop­erty of self-trust Löbian co­op­er­a­tion.

  3. Ro­bust ex­ploita­tion: When­ever it is pos­si­ble to ex­ploit the op­po­nent (trick­ing him into co­op­er­at­ing when we defect) we wish to do so.

The ques­tion now be­comes: it is pos­si­ble to de­sign a pro­gram which performs op­ti­mally ac­cord­ing to all those met­rics? How close can we get?

Defec­tBot and CliqueBot

The sim­plest pro­gram that achieves un­ex­ploita­bil­ity is the pro­gram which defects against ev­ery op­po­nent, com­monly known as \(DefectBot\). This is a rather bor­ing re­sult, which does not man­age to even co­op­er­ate with it­self.

Tak­ing ad­van­tage of the ac­cess to source codes, we could build a pro­gram \(CliqueBot\) which co­op­er­ates if and only if the op­po­nent’s source code is equal to his own code (wait, how do you do that? With the magic of quin­ing we can always treat our own code as data!). Un­ex­ploitable? Check. Achieves Löbian co­op­er­a­tion? Check. But can’t we do bet­ter? In­nocu­ous changes to \(CliqueBot\)’s source code (such as adding com­ments) re­sult in agents which fail to co­op­er­ate, which is rather lame.

Be­fore we start mak­ing more com­pli­cated pro­gram de­signs we need to de­velop a good rea­son­ing frame­work to figure out how the matches will end. Sim­ple simu­la­tion will not do, be­cause we have given them ac­cess to a halt­ing or­a­cle to make proofs about their op­po­nent’s pro­gram.

For­tu­nately, there is one for­mal sys­tem which nicely han­dles rea­son­ing about prov­abil­ity: cue to Prov­abil­ity logic.

in prov­abil­ity logic we have a modal op­er­a­tor \(\square\) which Solo­vay’s the­o­rem tells us can be read as “it is prov­able in \(PA\) that”. Fur­ther­more, this sys­tem of logic is fully de­cid­able, though that comes at the ex­pense of some ex­pres­sive power (we, for ex­am­ple, do not have quan­tifiers in modal logic).

The next step is to ex­press agents in modal com­bat as ex­pres­sions of modal logic.

For that we need to ex­tend prov­abil­ity logic with sec­ond-or­der for­mu­las. A sec­ond-or­der for­mula \(\phi(x)\) is a fixed point ex­pres­sion in the lan­guage of modal logic in which \(x\) ap­pears as a for­mula and \(\phi\) can be used as a vari­able.

For ex­am­ple, we could have the for­mula \(\phi(x) \leftrightarrow \square x(\phi)\) and the for­mula \(\psi(x)\leftrightarrow \neg\square x(\psi)\). To eval­u­ate a for­mula with its free vari­able in­stan­ti­ated we sub­sti­tute their vari­ables its fixed point ex­pres­sion its body as many times as re­quired un­til the for­mula with its ar­gu­ment its ex­pressed in terms of it­self, and then we com­pute its fixed point in GL.

For ex­am­ple, \(\psi(\phi)\leftrightarrow \neg\square\phi(\psi)\leftrightarrow \neg\square\square\psi (\phi)\), and the fixed point can be calcu­lated to be \(GL\vdash \psi(\phi)\leftrightarrow \neg [\square \bot \vee \square\square\bot \wedge \neg\square \bot] \), which is ar­ith­meti­cally false and thus we say that \(\phi(\psi)\) eval­u­ates to false.

It is pos­si­ble to refer­ence other for­mu­las in the fixed point ex­pres­sion of a for­mula, though one pre­cau­tion must be taken of lest we fall into cir­cu­lar de­pen­den­cies. We are go­ing to say that a for­mula is of rank \(0\) if it is defined in terms of it­self and no other for­mula. Then we say that a for­mula can only be defined in terms of it­self and lower rank for­mu­las, and define the rank of a for­mula as the biggest rank in its defi­ni­tion plus one. If the rank is well defined, then the for­mula is also well defined. To guaran­tee the ex­is­tence of a fixed point it will also be con­ve­nient to re­strict our­selves to fully modal­ized for­mu­las.

A modal agent of rank \(k\) is a one place fully modal­ized modal for­mula of the kind we have just defined of rank \(k\), whose free vari­able rep­re­sents the op­po­nent’s source code. To simu­late a match, we sub­sti­tute the free vari­able by the op­po­nents for­mula, and eval­u­ate it ac­cord­ing to the pro­ce­dure de­scribed in the pre­vi­ous para­graph. If the fixed point is ar­ith­meti­cally true, then the agent co­op­er­ates, and if it is false it defects. The rank of the agent is the max­i­mum num­ber of nested boxes that oc­cur in the sen­tence.

Thus we have that we can ex­press \(DefectBot\) as a modal agent with the modal for­mula \(DB(x)\leftrightarrow \bot\).

FairBot and PrudentBot

To try to beat CliqueBot’s perfor­mance, we will define a bot which takes ad­van­tage of the halt­ing or­a­cle to achieve ro­bust co­op­er­a­tion with a greater class of agents.

FairBot(x): if Prv(x(FairBot))=C then out­put C else re­turn D

\(FairBot\) tries to find a proof in \(PA\) that its op­po­nent is go­ing to co­op­er­ate with him, and if it suc­ceeds then it co­op­er­ates as well.

We can ex­press \(FairBot\) as a modal agent rep­re­sented by the modal for­mula \(FB(x) \leftrightarrow \square x(FB)\).

So, does \(FairBot\) co­op­er­ate with him­self? Let’s find out!

The re­sult of the match against it­self is the same as the fixed point of \(FB(FB)\leftrightarrow \square FB(FB)\). Ac­cord­ing to the meth­ods for com­put­ing fixed points, the fixed point is thus \(\top\), which is of course ar­ith­meti­cally truenoteThis can also be figured out us­ing Löb’s the­o­rem. Thus \(FairBot\) satis­fies the Löbian co­op­er­a­tion con­di­tion!

Fur­ther­more, this holds no mat­ter how \(FairBot\) is im­ple­mented, so it is way more ro­bust than \(CliqueBot\)!

What hap­pens if \(FairBot\) faces \(DefectBot\)? Then we have to find the fixed point of \(FB(DB)\leftrightarrow \square \bot\), which is ob­vi­ously \(\square \bot\), and since this ex­pres­sion is ar­ith­meti­cally false notePA+1 dis­proves it!, then it is the case that FairBot does not co­op­er­ate with \(DefectBot\).

\(FairBot\) is an im­pres­sive for­mal­ism, and it achieves very ro­bust co­op­er­a­tion. How­ever, it fails to ex­ploit other agents. For ex­am­ple, it fails to ex­ploit the sim­ple pro­gram \(CooperateBot\) note\(CB(x)\leftrightarrow \top\), which always co­op­er­ates.

An im­prove­ment can be made by con­sid­er­ing the pro­gram rep­re­sented by the rank \(1\) modal for­mula \(PB(x)\leftrightarrow\square [x(PB)\wedge \neg\square\bot \rightarrow \neg x(DB)]\), which we call \(PrudentBot\). By its defi­ni­tion, \(PrudentBot\) co­op­er­ates with an­other modal agent if and only if it can prove that the op­po­nent will co­op­er­ate with him, and fur­ther­more \(PA+1\) proves that its op­po­nent will not co­op­er­ate with \(DefectBot\).

Thus \(PrudentBot\) co­op­er­ates against it­self and with \(FairBot\), but defects against \(DefectBot\) and \(CooperateBot\). In fact, \(PrudentBot\) dom­i­nates \(FairBot\).

Can we do bet­ter?

One ques­tion that comes to mind: is it pos­si­ble to de­sign a modal agent which achieves co­op­er­a­tion with ev­ery agent for which there is at least an in­put which will make it co­op­er­ate with­out be­ing ex­ploitable? Sadly no. Con­sider the bot Trol­lBot$ defined by \(TB(x)\leftrightarrow \square x(DB)\)noteEx­er­cise for the reader: does \(TrollBot\) co­op­er­ate with him­self?. That is, \(TrollBot\) co­op­er­ates with you if and only if you co­op­er­ate with \(DefectBot\).

Then \(CooperateBot\) achieves co­op­er­a­tion with \(TrollBot\), but no bot co­op­er­ates with \(TrollBot\) while defect­ing against \(DefectBot\).

In gen­eral, there is no good op­ti­mal­ity crite­ria de­vel­oped. An open ques­tion is to for­mal­ize a good no­tion of op­ti­mal­ity in the class of modal agents and de­sign a bot which achieves it.