Lambda calculus

What’s the least we can put in a pro­gram­ming lan­guage and still have it be able to do any­thing other lan­guages can notei.e. tur­ing com­plete? It turns out we can get away with in­clud­ing only a sin­gle built in fea­ture: func­tion defi­ni­tion, writ­ten us­ing $$\lambda$$. The lan­guage that has only this one fea­ture is called “lambda calcu­lus” or ”$$\lambda$$ calcu­lus.” We also need a way to ap­ply our func­tions to in­puts; we write $$f\ x$$ to in­di­cate the value of the func­tion $$f$$ on $$x$$, per­haps more fa­mil­iarly writ­ten $$f(x)$$.

A $$\lambda$$ ex­pres­sion is some­thing of the form $$\lambda x.f(x)$$, which means “take in an in­put $$x$$, and re­turn $$f(x)$$. For ex­am­ple, $$\lambda x.x+1$$ takes in a num­ber and re­turns one more than it. We only di­rectly have func­tions with one in­put, but we can build func­tions with mul­ti­ple in­puts out of this. For ex­am­ple, ap­ply­ing $$\lambda x.\lambda y.x+y$$ to $$3$$, and ap­ply­ing the re­sult to $$4$$ gives $$3+4=7$$ . We can think of this as a two-in­put func­tion, and ab­bre­vi­ate it as $$\lambda xy.x+y$$; “$$\lambda xy$$” is sim­ply a short­hand for ”$$\lambda x.\lambda y$$.” We don’t lose any power by re­strict­ing our­selves to unary func­tions, be­cause we can think of func­tions with mul­ti­ple in­puts as tak­ing only one in­put at a time, and re­turn­ing an­other func­tion at each in­ter­me­di­ate step.

We said the only fea­ture we have is func­tion defi­ni­tion, so what’s this about “$$+$$” and “$$1$$”? Th­ese sym­bols don’t ex­ist in $$\lambda$$ calcu­lus, so we’ll have to define num­bers and op­er­a­tions on them us­ing $$\lambda$$ noteThe stan­dard way of do­ing this is called the Church en­cod­ing.. But $$\lambda$$ ex­pres­sions are use­ful even when we have other fea­tures as well; pro­gram­ming lan­guages such as Python al­low func­tion defi­ni­tion us­ing $$\lambda$$. It’s also use­ful when try­ing to un­der­stand what $$\lambda$$ means to see ex­am­ples with func­tions you’re more fa­mil­iar with.

We’re not go­ing to put con­straints on what in­puts $$\lambda$$ ex­pres­sions can take. If we al­lowed ex­pres­sions like $$\lambda x.x+1$$, it would get con­fused when we try to give it some­thing other than a num­ber; this is why we can’t have $$+$$ and $$1$$. So what sorts of ob­jects will we give our func­tions? The only ob­jects have we: other func­tions, of course!

For­mal definition

Let’s for­mally define $$\lambda$$ calcu­lus. First, the lan­guage of $$\lambda$$ calcu­lus contains

• Paren­the­ses ( and )

• Sym­bols $$\lambda$$ and .

• Vari­ables $$v_1,v_2,\dots$$ (in prac­tice we’ll use other let­ters too).

What counts as a $$\lambda$$ ex­pres­sion? We define them re­cur­sively:

• For any vari­able $$x$$, $$x$$ is a $$\lambda$$ ex­pres­sion.

• For any vari­able $$x$$ and $$\lambda$$ ex­pres­sion $$M$$, $$(\lambda x.M)$$ is a $$\lambda$$ ex­pres­sion.

• If $$M$$ and $$N$$ are $$\lambda$$ ex­pres­sions, so is $$(M\ N)$$.

We also want to define free and bound vari­ables, and also do this re­cur­sively:

• The vari­able $$x$$ is free in the ex­pres­sion $$x$$.

• Every in­stance of $$x$$ is bound in the ex­pres­sion $$(\lambda x.M)$$. If an in­stance of $$x$$ is free in $$M$$, we say the ini­tial $$\lambda x$$ in $$(\lambda x.M)$$ binds it. We say that $$M$$ is the scope of that $$\lambda x$$.

• An in­stance of $$x$$ in $$(M\ N)$$ is free if it is free in $$M$$ or $$N$$, and bound oth­er­wise.

We can use the rules to build ex­pres­sions like $$((\lambda x.(\lambda y.(x\ y))))\ x)$$, in which the first in­stance of $$x$$ and the only in­stance of $$y$$ are bound, and the sec­ond in­stance of $$x$$ is free. We have an op­pres­sive amount of paren­the­ses; we’ll es­tab­lish in­for­mal con­ven­tions to re­duce them in the next sec­tion.

Paren­the­sis conventions

If we want to be ex­tremely care­ful, we should write the add-two-num­bers func­tion as $$(\lambda x.(\lambda y.(x+y)))$$, us­ing paren­the­ses as in the for­mal defi­ni­tion. But we of­ten don’t need them, and use the fol­low­ing rules:

• We don’t need out­er­most paren­the­ses. We can write $$\lambda x.(\lambda y.(x+y))$$ in­stead of $$(\lambda x.(\lambda y.(x+y)))$$.

• Ap­pli­ca­tion is, by de­fault, left as­so­ci­a­tive.$$f\ x\ y$$ means “the func­tion $$f$$ with in­puts $$x$$ and $$y$$,” i.e. $$(f\ x)\ y$$, not $$f\ (x\ y)$$.

• The scope of any $$\lambda$$ is as large is pos­si­ble.$$\lambda x.\lambda y.x+y$$ means $$\lambda x.(\lambda y.(x+y))$$, not $$(\lambda x.\lambda y.x)+y$$ or $$\lambda x.(\lambda y.x)+y$$ or any­thing like that.

• We can ab­bre­vi­ate se­quences of $$\lambda$$s. As men­tioned be­fore, we can write $$\lambda xy.x+y$$ for $$\lambda x.\lambda y.x+y$$.

Of course, we don’t have to omit paren­the­ses when­ever we can. Th­ese rules gov­ern what it means when we don’t have all the paren­the­ses strictly nec­es­sary, but we can in­clude them if we want to be ex­tra clear.

Re­duc­ing lambda expressions

So far, $$\lambda$$ ex­pres­sions don’t have any mean­ing; we’ve said what we’re al­lowed to write down but not how the re­late to each other. What we usu­ally do with a $$\lambda$$ ex­pres­sion is re­duce it, and there are three rules that we can use: $$\beta$$-re­duc­tion, $$\alpha$$-con­ver­sion, and $$\eta$$-con­ver­sion.

$$\beta$$-re­duc­tion is the fancy name for “stick in the in­puts to a func­tion.” For ex­am­ple, to eval­u­ate $$(\lambda x.\lambda y.x+y)\ 6\ 3$$, we first no­tice that $$6$$ is the in­put to a $$\lambda$$ ex­pres­sion that starts $$\lambda x$$, and sub­sti­tute $$6$$ for $$x$$ in the ex­pres­sion, giv­ing us $$(\lambda y.6+y)\ 3$$. Now $$3$$ is sub­sti­tuted for $$y$$, giv­ing us $$6+3=9$$.

For­mally, for a vari­able $$x$$ and $$\lambda$$ ex­pres­sions $$M$$ and $$N$$, $$\beta$$-re­duc­tion con­verts $$((\lambda x.M)\ N)$$ to $$M[N/x]$$, i.e.$$M$$ with ev­ery free in­stance of $$x$$ re­placed with $$N$$. The ex­pres­sions $$((\lambda x.M)\ N)$$ and $$M[N/x]$$ are equiv­a­lent.

todo: is there a more stan­dard no­ta­tion for this?

$$\alpha$$-con­ver­sion says we can re­name vari­ables; $$\lambda x.f\ x$$ is the same as $$\lambda y.f\ y$$. For­mally, if $$M$$ is a $$\lambda$$ ex­pres­sion con­tain­ing $$\lambda x$$, and no in­stance of $$x$$ bound by that $$\lambda x$$ is within the scope of a $$\lambda y$$, then we can re­place ev­ery $$x$$ in the scope of the $$\lambda x$$ with a $$y$$ to get an equiv­a­lent ex­pres­sion. We need the sec­ond crite­rion be­cause oth­er­wise we could re­place the $$x$$s in $$\lambda x.\lambda y.x$$ with $$y$$s to get $$\lambda y.\lambda y.y$$, and then re­place the outer $$y$$ with $$x$$ to get $$\lambda x.\lambda y.x$$, which is not equiv­a­lent (the sec­ond sub­sti­tu­tion here is valid; the first one is the prob­lem).

Fi­nally, $$\eta$$-con­ver­sion says that if two ex­pres­sions give the same re­sult on any in­put, then they’re equiv­a­lent. We should have $$\lambda x.f\ x$$ and $$f$$ be the same, since $$\beta$$-re­duc­tion con­verts $$(\lambda x.f\ x)\ x$$ to $$f\ x$$, so they’re the same on any in­put. For­mally, if $$x$$ isn’t free in $$M$$, then $$(\lambda x.(M\ x))$$ is equiv­a­lent to $$M$$.

Currying

Let’s look again at the func­tion that adds two num­bers, $$\lambda x.\lambda y.x+y$$. There are two differ­ent ways we can think of this: first, as a func­tion that takes in two num­bers, and re­turns their sum. If we’re talk­ing about nat­u­ral num­bers, this is a func­tion $$\mathbb N^2\to\mathbb N$$.

We could also think of the func­tion as tak­ing one in­put at a time. It says “take in in­put $$x$$, re­turn the ex­pres­sion $$\lambda y.x+y$$.” For ex­am­ple, on in­put $$6$$ we have $$(\lambda x.\lambda y.x+y)\ 6=\lambda y.6+y$$. Now we have a func­tion that takes in a num­ber, and gives us a func­tion that takes in a num­ber and re­turns a num­ber. We can write this as $$\mathbb N\to(\mathbb N\to\mathbb N)$$.

This equiv­alence be­tween func­tions on two (or more) in­puts, and func­tions on one in­put that re­turn a new func­tion, is known as Cur­ry­ing and is im­por­tant in $$\lambda$$ calcu­lus, where the only ob­jects are func­tions. We have to get used to the idea that a func­tion can take other func­tions as ar­gu­ments and give func­tions as re­turn val­ues.

Defin­ing variables

$$\lambda$$ calcu­lus doesn’t have a built in way to define a vari­able. Sup­pose we have the func­tion $$d=\lambda f.\lambda x.f\ (f\ x)$$ which ap­plies its first in­put to its sec­ond in­put twice. Cur­ry­ing, $$d$$ takes in a func­tion and re­turns that func­tion com­posed with it­self. Say we want to know what $$d\ d$$ is. We could just write $$d$$ twice, to get $$(\lambda f.\lambda x.f\ (f\ x))\ (\lambda f.\lambda x.f\ (f\ x))$$. $$\beta$$-re­duc­tion even­tu­ally re­duces this to $$\lambda f.\lambda x.f\ (f\ (f\ (f\ x)))$$, the func­tion that ap­plies its first in­put to its sec­ond in­put four times (as you might have ex­pected, ap­ply­ing a func­tion twice, twice, is the same as ap­ply­ing it four times).

We can also get $$d\ d$$ while only writ­ing out $$d$$ once. Take the $$\lambda$$ ex­pres­sion $$(\lambda x.x\ x)\ d$$ or, writ­ing out $$d$$, $$(\lambda x.x\ x)\ (\lambda f.\lambda x.f\ (f\ x))$$. A sin­gle step of $$\beta$$-re­duc­tion turns this into the pre­vi­ous ver­sion, where we wrote $$d$$ out twice. The ex­pres­sion we started with was shorter, and ex­pressed bet­ter the idea that we wanted to use the same value of $$d$$ twice. In gen­eral, if we want to say some­thing like $$x = a; f\ x$$ (i.e. define the vari­able $$x$$, and then use it in some ex­pres­sion), we can write $$(\lambda x.f\ x)\ a$$, first build­ing the ex­pres­sion we want to use the value of $$x$$ in, and then sub­sti­tut­ing in the value us­ing func­tion ap­pli­ca­tion. This will be very use­ful in the next sec­tion when we make re­cur­sive func­tions.

Loops

It might seem like $$\lambda$$ ex­pres­sions are very finite, and $$\beta$$-re­duc­tion will always finish af­ter a rea­son­able amount of time. But for $$\lambda$$ calcu­lus to be as pow­er­ful as nor­mal pro­gram­ming lan­guages, there must be $$\lambda$$ ex­pres­sions that don’t re­duce in a finite amount of time. The way to re­solve this is to figure out how to build while loops in $$\lambda$$ calcu­lus; if you can make a while loop, you can write a pro­gram that runs for­ever pretty eas­ily.

Since all we have is func­tions, we’ll use re­cur­sion to make loopy pro­grams. For ex­am­ple, con­sider this fac­to­rial func­tion in Python:

fac­to­rial = lambda n: 1 if n==0 else n*fac­to­rial(n-1)


How can we write this in lambda calcu­lus? Let’s as­sume we have num­bers, ar­ith­metic, booleans,

todo: pos­si­bly change these links to church en­cod­ing pages once they exist
and a func­tion $$Z$$ that re­turns True on $$0$$ and False on any­thing else. Let’s also a as­sume we have an “if” func­tion $$I$$ that takes in three in­puts: $$I = (\lambda p.\lambda x.\lambda y.$$if $$p$$, then $$x$$, else $$y)$$. So $$I\ True\ x\ y=x$$ and $$I\ False\ x\ y=y$$.

Start­ing to trans­late from Python to $$\lambda$$ calcu­lus, we have $$F = \lambda n.I\ (Z\ n)\ 1\ (n\times(F\ (n-1)))$$, which says “if $$n$$ is $$0$$, then $$1$$, else $$n\times F(n-1)$$. But we can’t define $$F$$ like this; we have to use the method de­scribed in the pre­vi­ous sec­tion. In or­der for each re­cur­sive call of the fac­to­rial func­tion to know how to call fur­ther re­cur­sive calls, we need to pass in $$F$$ to it­self. Con­sider the expression

$$F=(\lambda x.x\ x)\ (\lambda r.\lambda n.I\ (Z\ n)\ 1\ (n\times(r\ r\ (n-1))))$$

Let $$g=\lambda r.\lambda n.I\ (Z\ n)\ 1\ (n\times(r\ r\ (n-1)))$$. Then $$F=(\lambda x.x\ x)\ g=g\ g$$ is $$g$$ ap­plied to it­self. Plug­ging in $$g$$ for $$r$$, $$g\ g=\lambda n.I\ (Z\ n)\ 1\ (n\times(g\ g\ (n-1)))$$. But since $$g\ g=F$$, this is just $$\lambda n.I\ (Z\ n)\ 1\ (n\times(F\ (n-1)))$$, ex­actly what we wanted $$F$$ to be.

Let’s gen­er­al­ize this. Say we want to define a re­cur­sive for­mula $$f=\lambda x.h\ f\ x$$, where $$h$$ is any ex­pres­sion (in the fac­to­rial ex­am­ple $$h=\lambda f.\lambda n.I\ (Z\ n)\ 1\ (n\times(f\ (n-1)))$$). We define

\be­gin{al­ign} g&=\lambda r.h\ (r\ r)\newl­ine &=\lambda r.\lambda x.h\ (r\ r)\ x \end{al­ign}

and use this to build

\be­gin{al­ign} f&=(\lambda x.x\ x)\ g\newl­ine &=g\ g\newl­ine &=(\lambda r.\lambda x.h\ (r\ r)\ x)\ g\newl­ine &=\lambda x.h\ (g\ g)\ x\newl­ine &=\lambda x.h\ f\ x. \end{al­ign}

We can ex­press this pro­cess tak­ing any $$h$$ to the re­cur­sive func­tion $$f$$ as a $$\lambda$$ ex­pres­sion:

\be­gin{al­ign} Y&=\lambda h.(\lambda x.x\ x)\ (\lambda r.h\ (r\ r))\newl­ine &=\lambda h.(\lambda r.h\ (r\ r))\ (\lambda r.h\ (r\ r)). \end{al­ign}

For any $$h$$, which takes as ar­gu­ments $$f$$ and $$x$$ and re­turns some ex­pres­sion us­ing them, $$Y\ h$$ re­cur­sively calls $$h$$. Speci­fi­cally, let­ting $$f=Y\ h$$, we have $$f=\lambda x.h\ f\ x$$.

For fun, let’s make a $$\lambda$$ ex­pres­sion that en­ters an in­finite loop by re­duc­ing to it­self. If we make $$h$$ the func­tion that ap­plies $$f$$ to $$x$$, re­cur­sively call­ing it should ac­com­plish this, since it’s analo­gous to the Python func­tion f = lambda x: f(x), which just loops for­ever. So let $$h=\lambda f.\lambda x.f x=\lambda f.f$$. Now \be­gin{al­ign} Y\ h&=(\lambda x.x\ x)\ (\lambda r.(\lambda f.f)\ (r\ r))\newl­ine &=(\lambda x.x\ x)\ (\lambda r.r\ r)\newl­ine &=(\lambda r.r\ r)\ (\lambda r.r\ r). \end{al­ign}

Con­tinued $$\beta$$-re­duc­tion doesn’t get us any­where. This ex­pres­sion is also in­ter­est­ing be­cause it’s a sim­ple quine, a pro­gram that out­puts its own code, and this is a sort of tem­plate you can use to make quines in other lan­guages.

This strat­egy of build­ing re­cur­sive func­tions re­sults in some very long in­ter­me­di­ate com­pu­ta­tions. If $$F$$ is the fac­to­rial func­tion, we com­pute $$F(100)$$ as $$100\times F(99)=100\times 99\times F(98)$$, etc. By the time the re­cur­sive calls fi­nally stop, we have the very long ex­pres­sion $$100\times99\times\dots\times2\times1$$, which takes a lot of space to write down or store in a com­puter. Many pro­gram­ming lan­guages will give you an er­ror if you try to do this, com­plain­ing that you ex­ceeded the max re­cur­sion depth.

The fix is some­thing called tail re­cur­sion, which roughly says “re­turn only a func­tion eval­u­a­tion, so you don’t need to re­mem­ber to do some­thing with it when it finishes.” The gen­eral way do ac­com­plish this is to pass along the in­ter­me­di­ate com­pu­ta­tion to the re­cur­sive calls, in­stead of ap­ply­ing it to the re­sult of the re­cur­sive calls. We can make a tail re­cur­sive ver­sion of our Python fac­to­rial like so notePython will still give you a “max­i­mum re­cur­sion depth ex­ceeded” er­ror with this, but the strat­egy works in lambda calcu­lus and lazy lan­guages like Lisp and Haskell, and Python is eas­ier to demon­strate with.:

f = lambda n, k: k if n==0 else f(n-1, k*n)


We pass along $$k$$ as an in­ter­me­di­ate value. It has to start with $$k=1$$; we can do this by defin­ing a new func­tion g = lambda n: f(n,1) or us­ing Python’s de­fault value fea­ture to have $$k$$ be $$1$$ be de­fault. Now if we want $$100$$ fac­to­rial, we get $$f(100,1)=f(99,1*100)=f(99,100)=f(98,9900)=\dots$$. We com­pute the prod­ucts as we go in­stead of all at the end, and only ever have one call of $$f$$ ac­tive at a time.

Writ­ing a tail re­cur­sive ver­sion of fac­to­rial in $$\lambda$$ calcu­lus is left as an ex­er­cise for the reader.

Y combinator

You might be won­der­ing why we called the re­cur­sive-func­tion-maker $$Y$$, so let’s look at that again. We said ear­lier that $$Y\ h=\lambda x.h\ (Y\ h)\ x$$. By $$\eta$$-con­ver­sion, $$Y\ h=h\ (Y\ h)$$. That means $$Y\ h$$ is a fixed point of $$h$$! For any func­tion $$h$$, $$Y$$ finds a value $$f$$ such that $$h\ f=f$$. This kind of ob­ject is called a fixed point com­bi­na­tor; this par­tic­u­lar one is called the Y com­bi­na­tor, con­ven­tion­ally de­noted $$Y$$.

Why is a fixed point fin­der the right way to make re­cur­sive func­tions? Con­sider again the ex­am­ple of fac­to­rial. If a func­tion $$f$$ says “do fac­to­rial for $$k$$ re­cur­sive steps,” then $$h\ f$$ says “do fac­to­rial for $$k+1$$ re­cur­sive steps;” $$h$$ was de­signed to be the func­tion which, call­ing re­cur­sively, gives fac­to­rial. Nam­ing the iden­tity func­tion $$i=\lambda n.n$$, $$i$$ does fac­to­rial for $$0$$ steps, so $$h\ i$$ does it for $$1$$ step, $$h\ (h\ i)$$ for $$2$$, and so on. To com­pletely com­pute fac­to­rial, we need a func­tion $$F$$ that says “do fac­to­rial for in­finity re­cur­sive steps.” But do­ing one more than in­finity is the same as just do­ing in­finity, so it must be true that $$h\ F=F$$; i.e. that $$F$$ is a fixed point of $$h$$. There may be other fixed points of $$h$$, but the Y com­bi­na­tor finds the fixed point that cor­re­sponds to re­cur­sively call­ing it.

With $$Y$$ and with­out $$Y$$

What do we get, and what do we lose, if we al­low the use of $$Y$$?

$$Y$$ takes a lambda-term which may have cor­re­sponded to a prim­i­tive re­cur­sive func­tion, and out­puts a lambda term which need not cor­re­spond to a prim­i­tive re­cur­sive func­tion. That is be­cause it al­lows us to perform an un­bounded search: it ex­presses fully-gen­eral loop­ing strate­gies. (As dis­cussed ear­lier, with­out a fixed-point fin­der of some kind, we can only cre­ate lambda-terms cor­re­spond­ing to pro­grams which prov­ably halt.)

On the other side of the coin, $$Y$$ may pro­duce non-well-typed terms, while terms with­out a fixed-point fin­der are guaran­teed to be well-typed. That means it’s hard or even im­pos­si­ble to perform type-check­ing (one of the most ba­sic forms of static anal­y­sis) on terms which in­voke $$Y$$.

Strongly re­lat­edly, in a rather spe­cific sense through the Curry-Howard cor­re­spon­dence, the use of $$Y$$ cor­re­sponds to ac­cept­ing non-con­struc­tive­ness in one’s proofs. Without a fixed-point fin­der, we can always build a con­struc­tive proof cor­re­spond­ing to a given lambda-term; but with $$Y$$ in­volved, we can no longer keep things con­struc­tive.

Parents:

• Mathematics

Math­e­mat­ics is the study of num­bers and other ideal ob­jects that can be de­scribed by ax­ioms.

• I think it’s con­fus­ing to in­tro­duce multi-ar­gu­ment func­tions be­fore talk­ing about cur­ry­ing. This makes it seem as though multi-ar­gu­ment func­tions are an in­trin­sic part of the lambda calcu­lus, rather than just func­tions that re­turn other func­tions.

• I think this sen­tence would be eas­ier to read with­out the paren­the­sis around the sec­ond sen­tence.