Proof of Rice's theorem

Re­call the for­mal state­ment of Rice’s the­o­rem:

We will use the no­ta­tion \([n]\) for the \(n\)th Tur­ing ma­chine un­der some fixed num­ber­ing sys­tem. Each such ma­chine in­duces a func­tion, which we will also write as \([n]\) where this is un­am­bigu­ous due to con­text; then it makes sense to write \([n](m)\) for the value that ma­chine \([n]\) out­puts when it is run on in­put \(m\).

Let \(A\) be a non-empty, proper noteThat is, it is not the en­tire set. sub­set of \(\{ \mathrm{Graph}(n) : n \in \mathbb{N} \}\), where \(\mathrm{Graph}(n)\) is the graph of the par­tial func­tion com­puted by \([n]\), the \(n\)th Tur­ing ma­chine. Then there is no Tur­ing ma­chine \([r]\) such that:

  • \([r](i)\) is \(1\) if \(\mathrm{Graph}(i) \in A\)

  • \([r](i)\) is \(0\) if \(\mathrm{Graph}(i) \not \in A\).

We give a proof that is (very nearly) con­struc­tive: one which (if we could be both­ered to work it all through) gives us an ex­plicit ex­am­ple noteWell, very nearly; see the next note. of a Tur­ing ma­chine whose “am-I-in-\(A\)” na­ture can­not be de­ter­mined by a Tur­ing ma­chine. noteIt’s only “very nearly” con­struc­tive. It would be ac­tu­ally con­struc­tive if we knew in ad­vance a spe­cific ex­am­ple of a pro­gram whose func­tion is in \(A\), and a pro­gram whose func­tion is in \(B\). The proof here as­sumes the ex­is­tence of a pro­gram of each type, but iron­i­cally the the­o­rem it­self guaran­tees that there is no fully-gen­eral way to find such pro­grams.

We will pre­sent an in­ter­me­di­ate lemma which does all the heavy lift­ing; this makes the ac­tual rea­son­ing rather un­clear but very suc­cinct, so we will also in­clude an ex­ten­sive worked ex­am­ple of what this lemma does for us.

Fixed point theorem

The in­ter­me­di­ate lemma is a cer­tain fixed-point the­o­rem.

Let \(h: \mathbb{N} \to \mathbb{N}\) be to­tal com­putable: that is, it halts on ev­ery in­put. Then there is \(n \in \mathbb{N}\) such that \(\mathrm{Graph}(n) = \mathrm{Graph}(h(n))\). noteAnd, more­over, we can ac­tu­ally find such an \(n\).

That is, the “un­der­ly­ing func­tion” of \(n\) - the par­tial func­tion com­puted by \([n]\) - has the same out­put, at ev­ery point, as the func­tion com­puted by \([h(n)]\). If we view \(h\) as a way of ma­nipu­lat­ing a pro­gram (as speci­fied by its de­scrip­tion num­ber), then this fixed-point the­o­rem states that we can find a pro­gram whose un­der­ly­ing func­tion is not changed at all by \(h\).

The proof of this lemma is quite sim­ple once the magic steps have been dis­cov­ered, but it is dev­il­ishly difficult to in­tuit, be­cause it in­volves two rather strange and con­fus­ing re­cur­sions and some self-refer­ence.

Re­call the \(s_{mn}\) the­o­rem, which states that there is a to­tal com­putable func­tion \(S\) of two vari­ables \(m, n\) such that for ev­ery \(e \in \mathbb{N}\), we have \([e](m, n) = [S(e,m)](n)\): that is, there is a to­tal com­putable way \(S\) of cur­ry­ing com­putable func­tions. (Strictly speak­ing, our Tur­ing ma­chines only take one ar­gu­ment. There­fore we should use a com­putable pairing scheme such as Can­tor’s pairing func­tion, so that ac­tu­ally \([e](m,n)\) should be in­ter­preted as \([e](\mathrm{pair}(m, n))\).)

Then the func­tion which takes the pair \((e, x)\) and out­puts the value of \([ h(S(e,e)) ](x)\) is com­putable, so it has a de­scrip­tion num­ber \(a\), say. noteThis is the first strange part: we are treat­ing \(e\) both as a de­scrip­tion num­ber, and as an in­put to \([e]\), when we con­sider \(S(e,e)\).

Now we claim that \(S(a, a)\) is the \(n\) we seek. noteThis is the sec­ond strange part, for the same rea­son as \(S(e,e)\) was the first; but this one is even worse, be­cause the defi­ni­tion of \(a\) already in­volves a weird re­cur­sion and we’ve just added an­other one on top. In­deed, for any \(x\), \([n](x) = [S(a,a)](x)\) by defi­ni­tion of \(n\); this is \([a](a, x)\) by the \(s_{mn}\) the­o­rem; this is \([h(S(a,a))](x)\) by defi­ni­tion of \([a]\); and that is \([h(n)](x)\) by defi­ni­tion of \(n\).

There­fore \([n](x) = [h(n)](x)\), so we have found our fixed point.

I’m not clever enough to check this prop­erly, but some­one should

Sup­pose our de­scrip­tion num­ber­ing scheme is just “ex­pand \(n\) as a num­ber in base \(128\), and in­ter­pret the re­sult as an ASCII <div><div>note:This is a stan­dard, agreed-upon method of turn­ing a num­ber be­tween \(0\) and \(128\) into a char­ac­ter.%% string; then in­ter­pret that string as Python code”.

Then our func­tion \(h\), what­ever it may be, can be viewed just as trans­form­ing Python code.

Sup­pose \(h\) does noth­ing more than in­sert the fol­low­ing line of code as the sec­ond line of its in­put:

x = 0

So, for in­stance, it takes the string

x = 1
print(x)

and returns

x = 1
x = 0
print(x)

thereby chang­ing the func­tion com­puted from “re­turn the con­stant \(1\)” to “re­turn the con­stant \(0\)”, in this case. Note that many other func­tions will not change at all: for ex­am­ple, those which don’t con­tain a vari­able \(x\) in the first place will be un­changed, be­cause all the mod­ifi­ca­tion does is add in an ini­tial­i­sa­tion of a vari­able which will never sub­se­quently be used.

The fixed-point the­o­rem guaran­tees that there is in­deed a Python pro­gram which will not change at all un­der this mod­ifi­ca­tion (though in this case it’s very ob­vi­ous). In fact the the­o­rem con­structs such a pro­gram; can we work out what it is?

First of all, \(S(m, n)\) can be im­ple­mented as fol­lows. We will take our Python code to be writ­ten so that their in­put is given in the vari­able r1, so \([e](5)\) is sim­ply the Python code rep­re­sented by \(e\) but where the code-vari­able r1 is ini­tial­ised to \(5\) first; that is, it can be found by prepend­ing the line r1 = 5 to the code rep­re­sented by \(e\). Then we will as­sume that Python comes with a func­tion eval (cor­re­spond­ing to \(S\)) which takes as its in­put a string noteThe string is stand­ing in place of \(m\), but we have just skipped the in­ter­me­di­ate step of “un­pack the in­te­ger into a string” and gone straight to as­sum­ing it is a string. and an­other ar­gu­ment with which eval ini­tial­ises the vari­able x be­fore run­ning the string as a Python pro­gram in a sep­a­rate in­stance of Python:

eval(“print(r1)”, 5)  # does noth­ing other than print the num­ber 5
eval(“print(y)”, 5)  # throws an er­ror be­cause `y` is not defined when it comes to print­ing it
eval(“print(6)”, 5)  # prints 6, ig­nor­ing the fact that the vari­able `r1` is equal to `5` in the sub-in­stance

Re­mem­ber, our proof of the fixed point the­o­rem says that the pro­gram we want has code \(S(a, a)\), where \(a\) takes a pair \((e, x)\) as in­put, and out­puts \([h(S(e,e))](x)\). What is \(a\) speci­fi­cally here? Well, on the one hand we’re view­ing it as a string of code (be­cause it comes as the first ar­gu­ment to \(S\)), and on the other we’re view­ing it as an in­te­ger (be­cause it also comes as the sec­ond ar­gu­ment to \(S\)).

As code, a is the fol­low­ing string, where h is to be re­placed by what­ever we’ve already de­cided \(h\) is:

eval(“r1 = e; h(eval(r1, str_as_int(r1)))”, x)

We are as­sum­ing the ex­is­tence of a func­tion str_as_int which takes an ASCII string and re­turns the in­te­ger whose places in base 128 are the ASCII for each char­ac­ter of the string in turn.

For ex­am­ple, we have \(h\) in­sert­ing the line x = 0 as the sec­ond line, so our a is:

eval(“r1 = e; x = 0; eval(r1, str_as_int(r1))”, x)

As a num­ber, a is just the ASCII for this, in­ter­preted in base 128 (i.e. a cer­tain num­ber which in this case hap­pens to have 106 digits, which is why we don’t give it here).

The claim of the fixed-point the­o­rem, then, is that the fol­low­ing pro­gram is un­changed by \(h\):

eval(“eval(\”r1 = e; x = 0; eval(r1, str_as_int(r1))\”, x)”, str_to_int(“eval(\”r1 = e; x = 0; eval(r1, str_as_int(r1))\”, x)”))

You may recog­nise this as a quin­ing con­struc­tion. %%

De­duc­ing Rice’s the­o­rem from the fixed point theorem

Fi­nally, Rice’s the­o­rem fol­lows quickly: sup­pose we could de­cide in gen­eral whether \(\mathrm{Graph}(n) \in A\) or not, and la­bel by \(\iota\) the com­putable func­tion which de­cides this (that is, whose value is \(1\) if \(\mathrm{Graph}(n) \in A\), and \(0\) oth­er­wise).

Since \(A\) is nonempty and proper, there are nat­u­ral num­bers \(a\) and \(b\) such that \(\mathrm{Graph}(a) \in A\) but \(\mathrm{Graph}(b) \not \in A\). Define the com­putable func­tion \(g\) which takes \(n\) and out­puts \(a\) if \(\iota(n) = 0\), and \(b\) oth­er­wise. (That is, it flips its in­put: if its in­put had the prop­erty of \(A\), the func­tion \(g\) out­puts \(b\) whose graph is not in \(A\), and vice versa. In­for­mally, it is the pro­gram-trans­former that reads in a pro­gram, de­ter­mines whether the pro­gram com­putes a func­tion in \(A\) or not, and trans­forms the pro­gram into a spe­cific canon­i­cal ex­am­ple of some­thing which has the op­po­site \(A\)-ness sta­tus.)

By the fixed-point the­o­rem, we can find \(n\) such that \(\mathrm{Graph}(n) = \mathrm{Graph}(g(n))\).

But now we can ask whether \(\mathrm{Graph}(n)\) is in \(A\) (and there­fore whether \(\mathrm{Graph}(g(n))\) is in \(A\)).

  • If it is in \(A\), then \(g(n) = b\) and so \(\mathrm{Graph}(g(n)) = \mathrm{Graph}(b)\) which is not in \(A\).

  • If it is not in \(A\), then \(g(n) = a\) and so \(\mathrm{Graph}(g(n)) = \mathrm{Graph}(a)\) is in \(A\).

We have ob­tained con­tra­dic­tions in both cases (namely that \(\mathrm{Graph}(g(n))\) is both in \(A\) and not in \(A\)), so it must be the case that \(\iota\) does not ex­ist af­ter all.

Parents:

  • Rice's Theorem

    Rice’s The­o­rem tells us that if we want to de­ter­mine pretty much any­thing about the be­havi­our of an ar­bi­trary com­puter pro­gram, we can’t in gen­eral do bet­ter than just run­ning it.

    • Turing machine

      A Tur­ing Ma­chine is a sim­ple math­e­mat­i­cal model of com­pu­ta­tion that is pow­er­ful enough to de­scribe any com­pu­ta­tion a com­puter can do.