An introductory guide to modern logic

Wel­come! We are about to start a jour­ney through mod­ern logic, in which we will visit two cen­tral re­sults of this branch of math­e­mat­ics, which are of­ten mi­s­un­der­stood and some­times even mis­quoted: Löb’s the­o­rem and Gödel’s sec­ond in­com­plete­ness the­o­rem.

Modern logic can be said to have been born in the ideas of Gödel re­gard­ing the amaz­ing prop­erty of self-refer­ence that plagues ar­ith­metic. Badly ex­plained, this means that we can force an in­ter­pre­ta­tion over nat­u­ral num­bers which re­lates how de­duc­tions are made in ar­ith­metic to state­ments about ar­ith­meti­cal con­cepts such as di­visi­bil­ity.

This guide is tar­gets peo­ple who are already com­fortable with math­e­mat­i­cal think­ing and math­e­mat­i­cal no­ta­tion. It re­quires how­ever no spe­cific back­ground knowl­edge about logic or math to be read. The guide starts lightweight and gets pro­gres­sively more tech­ni­cal.

For­mal proofs

What is logic any­way? A short ex­pla­na­tion refers to the fact that we hu­mans hap­pen to be able to draw con­clu­sions about things we have not ex­pe­rienced by us­ing facts about how the world is. For ex­am­ple, if you see some­body en­ter­ing the room and is drip­ping wa­ter, then you in­fer that it is rain­ing out­side, even though you have not seen the rain your­self.

This may seem triv­ial, but what if we are try­ing to pro­gram that into a com­puter? How can we cap­ture this in­tu­ition of what rea­son­ing is and write down the rules for proper rea­son­ing in a way pre­cise enough to be turned into a pro­gram?

To ac­com­plish that is to do logic. The ul­ti­mate goal is to have an effi­cient pro­ce­dure that if fol­lowed will al­low to de­duce ev­ery true con­se­quence of a set of premises, so sim­ple that it could be taught to a dumb ma­chine which only fol­lows me­chan­i­cal op­er­a­tions.

So let’s try to for­mal­ize rea­son­ing!

We are go­ing to draw in­spira­tion from math­e­mat­i­ci­ans and see what they are do­ing when prov­ing a re­sult.

Though the method is of­ten con­voluted, non lin­ear and hard to un­der­stand, in essence what they are do­ing is writ­ing down some hy­poth­e­sis and facts that they as­sume to be true. Then they ap­ply some ma­nipu­la­tion rules which they as­sume that when ap­plied on true premises al­low to de­rive true facts. After iter­at­ing this pro­cess, they ar­rive to the wanted re­sult.

Get­ting formal

Well, let’s wrap to­gether this in­tu­itive pro­cess in a for­mal defi­ni­tion.

A proof of a sen­tence \(\phi\) is a se­quence of well formed sen­tences such that ev­ery sen­tence in the se­quence is ei­ther an ax­iom or can be de­rived of the pre­vi­ous sen­tences us­ing a deriva­tion rule, and the fi­nal sen­tence in the se­quence is \(\phi\). A sen­tence which has a proof is called a the­o­rem.

Don’t be scared by all the new terms! Let’s go briefly over them.

A well formed sen­tence is just a com­bi­na­tion of let­ters which satis­fies a cer­tain crite­ria which makes them easy to in­ter­pret. The sen­tences we will be deal­ing with are log­i­cal for­mu­las, which use a par­tic­u­lar set of sym­bols such as \(=, \wedge, \implies\). Our goal is to as­sign truth val­ues to sen­tences. That is, to say if par­tic­u­lar sen­tences are true or false.

An ax­iom is some­thing that we as­sume to be true with­out re­quiring a proof. For ex­am­ple, a com­mon ax­iom in ar­ith­metic is as­sum­ing that \(0\) is differ­ent than \(n+1\) for all nat­u­ral num­bers \(n\). Or ex­pressed as a well formed sen­tence, \(\forall n. 0 \not = n+1\). noteThe sym­bol \(\forall\) is an up­side-down A and means “for all”.

A deriva­tion rule is a valid ba­sic step in a proof. Per­haps the sim­plest ex­am­ple is modus po­nens . Mo­dus po­nens tells you that if you have de­duced some­how that \(A\implies B\) and also \(A\), then you can de­duce from both facts \(B\).

Those three com­po­nents con­sti­tute the ba­sis of a log­i­cal sys­tem for rea­son­ing: a crite­ria for con­struct­ing well formed sen­tences, a set of ax­ioms which we know to be true noteThe set of ax­ioms does not have to be finite! In par­tic­u­lar, we can spec­ify a crite­ria to iden­tify in­finitely many ax­ioms. This is called an ax­iom schema and some deriva­tion rules which al­low us to de­duce new con­clu­sions.

For a rather silly ex­am­ple, let’s con­sider the sys­tem \(A\) with the fol­low­ing com­po­nents:

  • Well formed sen­tences: the set of words in the Web­ster dic­tio­nary.

  • Ax­ioms: The word ‘break’.

  • De­duc­tion rules: If a word \(w\) is a the­o­rem of \(A\), then so it is a word which only differs in one let­ter of \(w\), ei­ther by adding, sub­stract­ing or re­plac­ing it any­where in \(w\).

Thus the word ‘trend’ is a the­o­rem of \(A\), as shown by the 4 steps long proof ‘break—bread—tread—trend’. The first sen­tence in the proof is an ax­iom. Every sen­tence af­ter is de­duced from our de­duc­tion rule us­ing the pre­vi­ous sen­tence as a premise. <div><div>


Given how we defined a log­i­cal sys­tem, there is a cer­tain de­gree of free­dom we have, as we can choose the un­der­ly­ing sets of well formed sen­tences, the ax­ioms and the de­duc­tion rules. How­ever, as shown in the ex­am­ple above, if we want to make mean­ingful de­duc­tions we should be care­ful in what com­po­nents do we choose for our log­i­cal sys­tem.

The crite­ria we use to choose these com­po­nents is in­ter­twined with the model we are try­ing to rea­son about. Models can be any­thing that cap­tures an as­pect of the world, or just a math­e­mat­i­cal con­cept. To put it sim­ply, they are the thing we are try­ing to rea­son about.

Depend­ing on what kind of en­tities the model has, we will lean to­wards a par­tic­u­lar choice of sen­tences. For ex­am­ple, if our model con­sists in just facts which are sub­ject to some known log­i­cal re­la­tions we could use the lan­guage of propo­si­tional logic. But if our model con­tains differ­ent ob­jects which are re­lated in differ­ent ways, we will pre­fer to use the lan­guage of first or­der logic.

Each model has an in­ter­pre­ta­tion as­so­ci­ated, which re­lates the terms in the sen­tences to as­pects of the model. For all prac­ti­cal pur­poses, a model and its in­ter­pre­ta­tion are the same once we have fixed the lan­guage, and thus we will use the terms as syn­onyms.

Then, de­pend­ing on how the model evolves and what things can we in­fer from what facts we should choose some de­duc­tion rules. To­gether, the sen­tences and the de­duc­tion rules spec­ify a class of pos­si­ble mod­els. If the de­duc­tion rules cor­rectly cap­ture the prop­er­ties of the class of mod­els we had in mind, in the sense that from true premises we re­ally de­rive true con­se­quences, then we will say that they are sound.

In­tu­itively, we say that the choice of lan­guage and de­duc­tion rules de­cides the “shape” that our mod­els will have. Every model with such a shape will be part of the uni­verse of mod­els speci­fied by those com­po­nents of a log­i­cal sys­tem.

With this imagery of a class of mod­els, we can think of ax­ioms as an at­tempt to re­duce the class of mod­els and pin down the con­crete model we are in­ter­ested in, by choos­ing as ax­ioms sen­tences whose in­ter­pre­ta­tion is true in only the model we are in­ter­ested in and in no other model note­sadly, there are oc­ca­sions in which this is just not pos­si­ble.

A prop­erty tightly re­lated to sound­ness is com­plete­ness. A sys­tem is com­plete if ev­ery sen­tence which is true un­der all in­ter­pre­ta­tions which satisfy our ax­ioms has a proof.

It is im­por­tant to re­al­ize that log­i­cal sys­tems can be talk­ing about many mod­els at once. In par­tic­u­lar, if two in­ter­pre­ta­tions satisfy the ax­ioms but dis­agree on the truth of a par­tic­u­lar sen­tence, then that sen­tence will be un­de­cid­able in our model.

This sug­gest a tech­nique for prov­ing in­de­pen­dence of log­i­cal state­ments. To show that a cer­tain sen­tence is in­de­pen­dent from some ax­ioms, con­struct two mod­els which satisfy those ax­ioms but con­tra­dict each other on the value of the state­ment. A nice ex­am­ple of this is how Lobachevsky showed that Eu­clides Fifth pos­tu­late is in­de­pen­dent from the other four pos­tu­lates.

add more ex­am­ples everywhere

In­tro­duc­ing PA

For our pur­poses, we will stick to a par­tic­u­lar log­i­cal sys­tem called Peano ar­ith­metic. This par­tic­u­lar choice of ax­ioms and de­duc­tion rules is in­ter­est­ing be­cause it re­flects a lot of our in­tu­itions about how num­bers work, which in turn can be used to talk about many phe­nom­ena in the real world. In par­tic­u­lar, they can talk about them­selves.

Be­fore we move on to this “talk­ing about them­selves” busi­ness I am go­ing to in­tro­duce more no­ta­tion. We will re­fer to Peano Arith­metic as \(PA\). If a sen­tence \(\phi\) fol­lows from the ax­ioms of \(PA\) and its de­duc­tion rules noteie, there is a proof of \(\phi\) us­ing only ax­ioms and de­duc­tion rules from \(PA\), we will say that \(PA\vdash \phi\), read as ”\(PA\) proves \(\phi\)”.

Self refer­ence and the prov­abil­ity predicate

Let’s try to get an in­tu­ition of what I mean when I say that num­bers can talk about them­selves.

The first key in­tu­ition is that we can re­fer to ar­bi­trary se­quences of char­ac­ters us­ing num­bers. For ex­am­ple, I can de­clare that from now on­wards the num­ber \(1\) is go­ing to re­fer to the se­quence of let­ters “I love Peano Arith­metic”. Fur­ther­more, us­ing a clever rule to re­late num­bers and sen­tences I can make sure that ev­ery pos­si­ble finite sen­tence gets as­signed a num­ber noteThis is called en­cod­ing.

A sim­ple en­cod­ing goes by the name of Gödel en­cod­ing, and con­sists of as­sign­ing to ev­ery sym­bol we are go­ing to al­low to be used in sen­tences a num­ber. For ex­am­ple, \(=\) could be \(1\), and \(a\) could be \(0\), and so on and so forth. Then we could en­code a sen­tence con­sist­ing of \(n\) sym­bols as the num­ber \(2^{a_1}3^{a_2}5^{a_3}\cdots p(n)^{a_n}\). That is, we take the num­ber which is the product of the \(n\)th first primes with ex­po­nents equal to the as­signed num­bers.noteThis is pos­si­ble be­cause ev­ery num­ber can be de­com­posed as a unique product of primes.

The pro­cess can be re­peated to en­code se­quences of sen­tences as sin­gle num­bers. Thus, we can en­code whole proofs as sin­gle num­bers. We could also en­code se­quences of se­quences of sen­tences, but that is go­ing too far for our pur­poses.

whole ex­am­ple of Gödel en­cod­ing, define care­fully no­ta­tion of godelquotes

So the point is, we can ex­change sen­tences by num­bers and vice versa. Can we also talk about de­duc­tion us­ing num­bers?

It turns out we can! With the en­cod­ing we have cho­sen it is cum­ber­some to show, but we can write a pred­i­cate \(Axiom(x)\) noteA pred­i­cate is a well formed sen­tence in which we have left one or more “holes” in the form of vari­ables which we can sub­sti­tute for literal num­bers or quan­tify over. For ex­am­ple, we can have the pred­i­cate \(IsEqualTo42(x)\) of the form \(x = 42\). Then \(PA\vdash IsEqualTo42(42)\) and \(PA\vdash \exists x IsEqualTo42(x)\), but \(PA\not\vdash IsEqualTo42(7)\) in the lan­guage of Peano Arith­metic such that \(PA\vdash Axiom(\textbf{n})\) if and only if \(n\) is a num­ber which en­codes an ax­iom of \(PA\).

Fur­ther­more, de­duc­tion rules which re­quire \(n\) premises can be rep­re­sented by \(n+1\) pred­i­cates \(Rule(p_1, p_2,..., p_n, r)\) which is prov­able in \(PA\) if and only if \(p_1, ...., p_n\) are num­bers en­cod­ing valid premises for the rule and \(r\) is the cor­re­spond­ing de­duced fact.

A bit more of work and we can put to­gether a pred­i­cate \(Proof(x,y)\), which is prov­able in \(PA\) if and only if \(x\) en­codes the valid proof of the sen­tence \(y\). Isn’t that neat!?

Since we are not too in­ter­ested in the proof it­self, and more in the fact that there is a proof at all, we are go­ing to con­struct the prov­abil­ity pred­i­cate \(\exists x. Proof(x,y)\), which we will call \(\square_{PA}(y)\).note\(\exists\) is a back­wards E and means “there ex­ists”. So then, \(\square_{PA}(x)\) liter­ally means “There is a proof in \(PA\) of \(x\)”.

Thus we can say that if the num­ber \(\ulcorner 1+1=2 \urcorner\) cor­re­sponds to the sen­tence \(1+1=2\)noteThis no­ta­tion re­mains in effect for the rest of the ar­ti­cle., then \(PA\vdash \square_{PA}(\ulcorner 1+1=2 \urcorner)\). That is, \(PA\) proves that there is a proof in \(PA\) of \(1+1=2\).

Now, the pred­i­cates we have seen so far are quite in­tu­itive and nicely be­haved, in the sense that their de­ducibil­ity from \(PA\) matches quite well what we would ex­pect from our in­tu­ition. How­ever, adding the ex­is­ten­tial quan­tifier in front of \(Proof(x,y)\) we get some nasty side effects.

Warn­ing! Really bad anal­ogy in­com­ing!

The thing is that \(PA\) “hal­lu­ci­nates” num­bers which it is not sure whether they ex­ist or not. Those are no mere nat­u­ral num­bers, but in­finite num­bers fur­ther in the hori­zon of math. While \(PA\) can­not prove their ex­is­tence, it nei­ther can­not prove its non ex­is­tence. So \(PA\) be­comes wary of as­sert­ing that proofs do not ex­ist for a cer­tain false sen­tence. After all, one of those non stan­dard num­bers may en­code the proof of the false sen­tence! Who is to prove oth­er­wise?

Can we patch this some­how? Maybe by adding more ax­ioms or de­duc­tion rules to \(PA\) so that it can prove that those num­bers do not ex­ists? The an­swer is yes but not re­ally. While it is doable in prin­ci­ple, the re­sult­ing the­ory be­comes too difficult to man­age, and we can no longer use it for effec­tive de­duc­tion noteTech­ni­cally, \(PA\) loses its semide­cid­abil­ity that way..

We will now pro­ceed to prove two tech­ni­cal re­sults which bet­ter for­mal­ize this idea: Löb’s the­o­rem and Gödel’s Se­cond In­com­plete­ness The­o­rem.

Löb’s theorem

Löb’s re­sult fol­lows from the in­tu­itive prop­er­ties of de­duc­tion that the prov­abil­ity pred­i­cate ac­tu­ally man­ages to cap­ture. This points to the fact that it is not our defi­ni­tion what is wrong, but rather to a fun­da­men­tal im­pos­si­bil­ity in logic.

The in­tu­itive, good prop­er­ties of \(\square_{PA}\) are known as the Hilbert-Ber­nais deriv­abil­ity con­di­tions, and are as fol­lows:

  1. If \(PA\vdash A\), then \(PA\vdash \square_{PA}(\ulcorner A\urcorner)\).

  2. \(PA\vdash \square_{PA}(\ulcorner A\rightarrow B\urcorner) \rightarrow [\square_{PA}(\ulcorner A \urcorner)\rightarrow \square_{PA}(\ulcorner B \urcorner)]\)

  3. \(PA\vdash \square_{PA}(\ulcorner A\urcorner) \rightarrow \square_{PA} \square_{PA} (\ulcorner A\urcorner)\).

Let’s go over each of them in turn.

1) says rea­son­ably that if \(PA\) proves a sen­tence \(A\), then it also proves that there is a proof of \(A\).

2) af­firms that if you can prove that \(A\) im­plies \(B\) then the ex­is­tence of a proof of \(A\) im­plies the ex­is­tence of a proof of \(B\). This is quite in­tu­itive, as we can con­cate­nate a proof of \(A\) with a proof of \(A\rightarrow B\) and de­duce \(B\) from an ap­pli­ca­tion of modus po­nens.

3) is a tech­ni­cal re­sult which states that the for­mal­iza­tion of 1) is prov­able when we are deal­ing with sen­tences of the form \(\square_{PA}(\ulcorner A \urcorner)\).

One more in­gre­di­ent is needed to de­rive Löb’s: the di­ag­o­nal lemma, which states that for all pred­i­cates \(\phi(x)\) there is a for­mula \(\psi\) such that \(PA\vdash \psi \leftrightarrow \phi(\ulcorner \psi \urcorner)\).

The de­tails of the proof can be found in Proof of Löb’s the­o­rem.

The de­tails are not es­sen­tial to the main idea, but it can be illus­tra­tive to work through the for­mal proof. Plus the in­tu­ition be­hind re­lated to the non-stan­dard num­bers we talked about be­fore.

What is re­ally in­ter­est­ing is that now we are in po­si­tion to enun­ci­ate and un­der­stand Löb’s the­o­rem!

Löb’s theorem

If \(PA\vdash \square_{PA}(\ulcorner A\urcorner) \rightarrow A\), then \(PA\vdash A\).

(or equiv­a­lently, if \(PA\not\vdash A\) then \(PA\not\vdash \square_{PA}(\ulcorner A\urcorner) \rightarrow A\)).

Talk about an un­in­tu­itive re­sult! Let’s take a mo­ment to pon­der about its mean­ing.

In­tu­itively, we should be able to de­rive from the ex­is­tence of a proof of \(A\) that \(A\) is true. After all, proofs are guaran­tees that some­thing re­ally fol­lows from the ax­ioms, so if we got those right and our deriva­tion rules are cor­rect then \(A\) should be true. How­ever, \(PA\) does not trust that be­ing able to find the ex­is­tence of a proof is enough to make \(A\) true!

In­deed, he needs to be able to see by him­self the proof. In other words, there must be a \(n\) such that \(PA\vdash Proof(\textbf n, \ulcorner A\urcorner)\). Then he will trust that it is in­deed the case that \(A\).

I will re­peat that again be­cause it looks like a tongue twister. Sup­pose some­body came and as­sured \(PA\) that from the ax­ioms of \(PA\) it fol­lows that there ex­ists a num­ber \(n\) satis­fy­ing \(Proof(\textbf n,\ulcorner A\urcorner)\). Then \(PA\) would say, show me the proof or I am go­ing to as­sume that that \(n\) is ac­tu­ally a non stan­dard num­ber and you a frig­gin’ liar.

If some­body just comes say­ing he has a proof, but pro­duces none, \(PA\) be­comes sus­pi­cious. After all, the proof in ques­tion could be a non-stan­dard proof en­coded by one of the dreaded non stan­dard num­bers! Who is go­ing to trust that!

Gödel II

And fi­nally we ar­rive to Gödel’s Se­cond In­com­plete­ness The­o­rem, per­haps the most widely mi­s­un­der­stood the­o­rem of all math­e­mat­ics.

We first need to in­tro­duce the no­tion of con­sis­tency. Sim­ply enough, a log­i­cal sys­tem is con­sis­tent if it does not prove a con­tra­dic­tion, where a con­tra­dic­tion is some­thing im­pos­si­ble. For ex­am­ple, it can­not ever be the case that \(P\wedge \neg P\), so \(P\wedge \neg P\) is a con­tra­dic­tion no mat­ter what \(P\) is. We use the sym­bol \(\bot\) to rep­re­sent a con­tra­dic­tion.

The state­ment of GII is as fol­lows:

Gödel Se­cond In­com­plete­ness The­o­rem (con­crete form) If \(PA\) is con­sis­tent, then \(PA\not \vdash \neg \square_{PA}(\bot)\)

No­tice that GII fol­lows quite di­rectly from Löb’s the­o­rem. Ac­tu­ally, it is the case that Löb’s the­o­rem also fol­lows from GII, so both re­sults are equiv­a­lent.

This re­sult can be in­ter­preted as fol­lows: you can­not make a sys­tem as com­plex as \(PA\) in which you can talk about de­duc­tion with com­plete cer­tainty, and thus about con­sis­tency. In par­tic­u­lar, such a sys­tem can­not prove that he him­self is con­sis­tent.

The re­sult is startling, but in the light of our pre­vi­ous ex­po­si­tion is clear what is go­ing on! There is always the shadow of non stan­dard num­bers men­ac­ing our de­duc­tions.


And that con­cludes our in­tro­duc­tion to for­mal logic!

To re­call some im­por­tant things we have learned:

  • Log­i­cal sys­tems cap­ture the in­tu­ition be­hind de­duc­tive rea­son­ing. They are com­posed of ax­ioms and de­duc­tive rules that are chained to com­pose proofs.

  • Sim­ple log­i­cal sys­tems that are used to talk about num­bers, such as \(PA\), can be in­ter­preted as talk­ing about many things through en­cod­ings, and in par­tic­u­lar they can talk about them­selves.

  • There are ex­pres­sions in logic that cap­ture the con­cepts in­volved in de­duc­tions. How­ever, the most im­por­tant of them, the prov­abil­ity pred­i­cate \(\square_{PA}\), fails to satisfy some in­tu­itive prop­er­ties due to the in­abil­ity of \(PA\) to prove that non stan­dard num­bers do not ex­ists.

  • Löb’s the­o­rem says that if \(PA\) can­not prove \(A\), then it can nei­ther prove that from \(\square_{PA}(\ulcorner A\urcorner\) fol­lows \(A\).

  • \(PA\) can­not prove its own con­sis­tency, in the sense that it can­not prove that the stan­dard pred­i­cate is never satis­fied by a con­tra­dic­tion.

If you want to get deeper in the rab­bit hole, read about model the­ory and se­man­tics or modal logic.


  • Mathematics

    Math­e­mat­ics is the study of num­bers and other ideal ob­jects that can be de­scribed by ax­ioms.