# Modal combat

Modal combat is a way of studying the special case of the one-shot prisoner's dilemma in which the players are programs which have read access to each other's source code. There is a class of such agents, defined by expressions in modal logic, which can do general provability reasoning about themselves and each other; furthermore, there is a quick way to evaluate how two such agents will behave against each other.

While they are not allowed to run each other's source code (otherwise we risk infinite regress, as $$A$$ simulates $$B$$, which simulates $$A$$, and so on), we are going to generously give each of them access to a halting oracle so that they can carry out powerful reasoning about each other's behavior.

Wait, why is this interesting? Aren't your assumptions a bit overpowered? Well, they are. But thinking about machines which are obviously far more powerful than our current capabilities is a useful, albeit unrealistic, proxy for superintelligence. We should furthermore remark that while we are going to stick with halting oracles throughout this article, there exist bounded versions of some of the results.

## Optimality in modal combat

We are especially interested in figuring out how one would go about making ideal competitors in this setup, because that may shed light on decision theory issues.

For that, we need a good optimality criterion to allow us to compare different players. Let's think about some conditions we would like our player to satisfy:

1. Unexploitability: We would like our player to never end up with the sucker's payoff (we cooperate and the opponent defects).

2. Robust cooperation: Whenever it is possible to achieve mutual cooperation, we would like to do so. A particularly desirable special case of robust cooperation is cooperation with oneself; we call this property of self-trust Löbian cooperation.

3. Robust exploitation: Whenever it is possible to exploit the opponent (tricking it into cooperating while we defect), we wish to do so.

The question now becomes: is it possible to design a program which performs optimally according to all those metrics? How close can we get?

## DefectBot and CliqueBot

The simplest program that achieves unexploitability is the program which defects against every opponent, commonly known as $$DefectBot$$. This is a rather boring result, which does not even manage to cooperate with itself.

Taking advantage of the access to source codes, we could build a program $$CliqueBot$$ which cooperates if and only if the opponent's source code is equal to its own code (wait, how do you do that? With the magic of quining, we can always treat our own code as data!). Unexploitable? Check. Achieves Löbian cooperation? Check. But can't we do better? Innocuous changes to $$CliqueBot$$'s source code (such as adding comments) result in agents which fail to cooperate with the original, which is rather lame.
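As a minimal Python sketch of the idea: a real CliqueBot obtains its own listing by quining, whereas here we simply store a stand-in listing in a constant (the constant and function names are our own hypothetical choices), which suffices to illustrate the comparison-by-source-code strategy.

```python
# Hypothetical stand-in for the agent's own quined listing.
CLIQUEBOT_SOURCE = 'return "C" if opponent_source == CLIQUEBOT_SOURCE else "D"'

def cliquebot(opponent_source: str) -> str:
    # Cooperate iff the opponent's code is byte-for-byte identical to ours.
    return "C" if opponent_source == CLIQUEBOT_SOURCE else "D"
```

Note how fragile this is: an opponent whose listing differs by as little as one comment is met with defection.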

Before we start making more complicated program designs, we need to develop a good reasoning framework to figure out how the matches will end. Simple simulation will not do, because we have given them access to a halting oracle to make proofs about their opponent's program.

Fortunately, there is one formal system which nicely handles reasoning about provability: cue provability logic.

In provability logic we have a modal operator $$\square$$ which Solovay's theorem tells us can be read as "it is provable in $$PA$$ that". Furthermore, this system of logic is fully decidable, though that comes at the expense of some expressive power (we, for example, do not have quantifiers in modal logic).

The next step is to express agents in modal combat as expressions of modal logic.

For that we need to extend provability logic with second-order formulas. A second-order formula $$\phi(x)$$ is a fixed point expression in the language of modal logic in which $$x$$ appears as a formula and $$\phi$$ can be used as a variable.

For example, we could have the formula $$\phi(x) \leftrightarrow \square x(\phi)$$ and the formula $$\psi(x)\leftrightarrow \neg\square x(\psi)$$. To evaluate a formula with its free variable instantiated, we repeatedly substitute each formula's defining expression for its occurrences until the result is expressed in terms of itself, and then we compute its fixed point in GL.

For example, $$\psi(\phi)\leftrightarrow \neg\square\phi(\psi)\leftrightarrow \neg\square\square\psi(\phi)$$, and the fixed point can be calculated to be $$GL\vdash \psi(\phi)\leftrightarrow \neg\square\square\bot$$, which is arithmetically true, so $$\psi(\phi)$$ evaluates to true. Symmetrically, $$\phi(\psi)\leftrightarrow \square\psi(\phi)$$ has fixed point $$\square\bot$$, which is arithmetically false, so $$\phi(\psi)$$ evaluates to false.

It is possible to reference other formulas in the fixed point expression of a formula, though one precaution must be taken lest we fall into circular dependencies. We are going to say that a formula is of rank $$0$$ if it is defined in terms of itself and no other formula. Then we require that a formula only be defined in terms of itself and formulas of lower rank, and define the rank of a formula as the biggest rank appearing in its definition plus one. If the rank is well defined, then the formula is also well defined. To guarantee the existence of a fixed point, it will also be convenient to restrict ourselves to fully modalized formulas (those in which the variable occurs only within the scope of a $$\square$$).

A modal agent of rank $$k$$ is a one-place, fully modalized modal formula of the kind we have just defined, of rank $$k$$, whose free variable represents the opponent's source code. To simulate a match, we substitute the free variable by the opponent's formula and evaluate it according to the procedure described in the previous paragraph. If the fixed point is arithmetically true, then the agent cooperates; if it is false, it defects.

Thus we can express $$DefectBot$$ as a modal agent with the modal formula $$DB(x)\leftrightarrow \bot$$.

## FairBot and PrudentBot

To try to beat CliqueBot's performance, we will define a bot which takes advantage of the halting oracle to achieve robust cooperation with a greater class of agents.

    FairBot(x): if Prv(x(FairBot) = C) then output C else output D

$$FairBot$$ tries to find a proof in $$PA$$ that its opponent is going to cooperate with it, and if it succeeds then it cooperates as well.

We can express $$FairBot$$ as a modal agent represented by the modal formula $$FB(x) \leftrightarrow \square x(FB)$$.

So, does $$FairBot$$ cooperate with itself? Let's find out!

The result of the match against itself is the same as the fixed point of $$FB(FB)\leftrightarrow \square FB(FB)$$. According to the methods for computing fixed points, the fixed point is $$\top$$, which is of course arithmetically true (this can also be derived from Löb's theorem). Thus $$FairBot$$ satisfies the Löbian cooperation condition!
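Spelling out the Löb route (a sketch of the derivation in GL):

```latex
\begin{align*}
&\text{Fixed point: } GL \vdash p \leftrightarrow \square p
  \quad\text{where } p := FB(FB)\\
&\Rightarrow\; GL \vdash \square p \to p
  \;\Rightarrow\; GL \vdash \square(\square p \to p) \quad\text{(necessitation)}\\
&\Rightarrow\; GL \vdash \square p \quad\text{(L\"ob: } \square(\square p \to p) \to \square p\text{)}
  \;\Rightarrow\; GL \vdash p.
\end{align*}
```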

Furthermore, this holds no matter how $$FairBot$$ is implemented, so it is way more robust than $$CliqueBot$$!

What happens if $$FairBot$$ faces $$DefectBot$$? Then we have to find the fixed point of $$FB(DB)\leftrightarrow \square DB(FB)\leftrightarrow \square \bot$$, which is obviously $$\square \bot$$, and since this expression is arithmetically false ($$PA+1$$ in fact disproves it!), $$FairBot$$ does not cooperate with $$DefectBot$$.

$$FairBot$$ is an impressive agent, and it achieves very robust cooperation. However, it fails to exploit other agents. For example, it fails to exploit the simple program $$CooperateBot$$ (defined by $$CB(x)\leftrightarrow \top$$), which always cooperates.

An improvement can be made by considering the program represented by the rank $$1$$ modal formula $$PB(x)\leftrightarrow\square [x(PB)]\wedge \square[\neg\square\bot \rightarrow \neg x(DB)]$$, which we call $$PrudentBot$$. By its definition, $$PrudentBot$$ cooperates with another modal agent if and only if $$PA$$ proves that the opponent will cooperate with it, and furthermore $$PA+1$$ proves that its opponent will not cooperate with $$DefectBot$$.

Thus $$PrudentBot$$ cooperates with itself and with $$FairBot$$, but defects against $$DefectBot$$ and $$CooperateBot$$. In fact, $$PrudentBot$$ dominates $$FairBot$$.
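These match results can be checked mechanically. Below is a minimal Python sketch (function names and the level bound are our own choices) using the standard fact that a fully modalized sentence can be evaluated level by level on the linear Kripke frame for GL: $$\square\phi$$ holds at level $$n$$ iff $$\phi$$ holds at every level $$m < n$$, the values stabilize, and the stabilized value agrees with arithmetical truth.

```python
from functools import lru_cache

LIMIT = 10  # levels of the linear Kripke frame; plenty for these low-rank agents

@lru_cache(maxsize=None)
def coop(a, b, n):
    """Does agent a cooperate with agent b at Kripke level n?"""
    def box(pred):
        # box(phi) holds at level n iff phi holds at every level m < n
        return all(pred(m) for m in range(n))
    if a == "DB":   # DB(x) <-> bottom
        return False
    if a == "CB":   # CB(x) <-> top
        return True
    if a == "FB":   # FB(x) <-> box[x(FB)]
        return box(lambda m: coop(b, "FB", m))
    if a == "PB":   # PB(x) <-> box[x(PB)] and box[not box(bot) -> not x(DB)]
        # "not box(bot)" holds at level m iff m >= 1, so at m == 0
        # the implication is vacuously true.
        return (box(lambda m: coop(b, "PB", m))
                and box(lambda m: m == 0 or not coop(b, "DB", m)))
    raise ValueError(a)

def match(a, b):
    return "C" if coop(a, b, LIMIT) else "D"
```

With these definitions, `match("PB", "PB")` and `match("PB", "FB")` come out `C`, while `match("PB", "DB")` and `match("PB", "CB")` come out `D`, matching the results above.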

## Can we do better?

One question that comes to mind: is it possible to design a modal agent which, without being exploitable, achieves cooperation with every agent for which there is at least one input that makes it cooperate? Sadly, no. Consider the bot $$TrollBot$$, defined by $$TB(x)\leftrightarrow \square x(DB)$$ (exercise for the reader: does $$TrollBot$$ cooperate with itself?). That is, $$TrollBot$$ cooperates with you if and only if you cooperate with $$DefectBot$$.

Then $$CooperateBot$$ achieves cooperation with $$TrollBot$$, but no bot cooperates with $$TrollBot$$ while defecting against $$DefectBot$$.
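The same level-by-level evaluation on the linear Kripke frame (restated here so the sketch is self-contained; names are again our own choices) confirms this, and settles the exercise: $$TrollBot$$ defects against itself, since it cannot prove that it cooperates with $$DefectBot$$.

```python
from functools import lru_cache

LIMIT = 10  # levels of the linear Kripke frame

@lru_cache(maxsize=None)
def coop(a, b, n):
    """Does agent a cooperate with agent b at Kripke level n?"""
    def box(pred):
        # box(phi) holds at level n iff phi holds at every level m < n
        return all(pred(m) for m in range(n))
    if a == "DB":   # DB(x) <-> bottom
        return False
    if a == "CB":   # CB(x) <-> top
        return True
    if a == "TB":   # TB(x) <-> box[x(DB)]
        return box(lambda m: coop(b, "DB", m))
    raise ValueError(a)

def match(a, b):
    return "C" if coop(a, b, LIMIT) else "D"
```

Here `match("CB", "TB")` and `match("TB", "CB")` both come out `C`, while `match("TB", "TB")` comes out `D`.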

In general, no good optimality criterion has been developed. An open problem is to formalize a good notion of optimality in the class of modal agents and to design a bot which achieves it.

• At this point, I think the evidence points away from there being any deeply useful form of optimality for agents in modal combat. The things you can do with provability logic are more restrictive, it turns out, than the things you can do with logical inductors, and for that reason most of my current thinking about decision theory is in that context.

Accordingly, I now think of modal combat as being a way of studying and illustrating a few basic concepts in decision theory for advanced agents, but not an essential part of MIRI's current research direction.