Recall that we say a set $A$ is smaller than or equal to a set $B$ if there is an injection from $A$ to $B$. The set $A$ has the same size as $B$ if there is a bijection between $A$ and $B$.

The Cantor-Schröder-Bernstein theorem states that if $A$ is smaller than or equal to $B$, and $B$ is smaller than or equal to $A$, then $A$ and $B$ have the same size. This tells us that the arithmetic of cardinals is well-behaved in that it behaves like a total order. It is similar to the arithmetic of the natural numbers in that it can never be the case that simultaneously $a < b$ and $b < a$.

Proofs

There are several proofs, some concrete and some less so.

Concrete proof

A clear explanation of the intuition of this proof has been written by Richard Evan Schwartz; see page 61 of the linked PDF, or search on the word “dog” (which appears in the first page of the explanation). The intuition is Math0 suitable.

Let $f: A \to B$ and $g: B \to A$ be injections; we will define a bijective function $h: A \to B$.

Because $f$ is injective, if $f$ ever hits $b$ (that is, if there is $a \in A$ such that $f(a) = b$) then it is possible to define $f^{-1}(b)$ to be the unique $a \in A$ such that $f(a) = b$; similarly with $g$. The slogan is “if $f^{-1}(a)$ exists, then it is well-defined: there is no leeway about which element we might choose to be $f^{-1}(a)$”.

Fix some $a \in A$, and consider the sequence $$\dots, f^{-1}(g^{-1}(a)), g^{-1}(a), a, f(a), g(f(a)), \dots$$ Now, this sequence might not extend infinitely to the left; it may not even get past $a$ to the left (if $g^{-1}(a)$ doesn’t exist, for instance). noteOn the other hand, perhaps the sequence does extend infinitely to the left. Also, the sequence might duplicate elements: it might be the case that $gfgf(a) = a$, for instance. noteAnd maybe there is a repeat somewhere to the left, too.

Similarly, we can fix some $b \in B$, and consider $$\dots g^{-1} f^{-1}(b), f^{-1}(b), b, g(b), f(g(b)), \dots$$

Every element of $A$, and every element of $B$, appears in one of these chains. Moreover, if $a \in A$ appears in two different chains, then in fact the two chains are the same noteThough maybe we’re looking at a different place on the same chain. If we compare the $g^{-1} f^{-1}(b)$-chain with the $b$-chain, we see they’re the same chain viewed two different ways: one viewing is two places offset from the other viewing., because each element of the chain specifies and is specified by the previous element in the chain (if it exists) and each element of the chain specifies and is specified by the next element of the chain.

So every element of $A$ and every element of $B$ is in exactly one chain. Now, it turns out that there are exactly four distinct “types” of chain.

It could extend infinitely in both directions without repeats. In this case, we define $h(a) = f(a)$ for each element of $A$ in the chain. (Basically assigning to each element of $A$ the element of $B$ which is next in the chain.)
It could extend infinitely off to the right, but it has a hard barrier at the left, and has no repeats: the chain stops at an element of $A$. In this case, again we define $h(a) = f(a)$ for each element of $A$ in the chain. (Again, basically assigning to each element of $A$ the element of $B$ which is next in the chain.)
It could extend infinitely off to the right, but it has a hard barrier at the left, and has no repeats: the chain stops at an element of $B$. In this case, we define $h(a) = g^{-1}(a)$ for each element of $A$ in the chain. (Basically assigning to each element of $A$ the element of $B$ which is previous in the chain.)
It could have repeats. Then it must actually be a cycle of even length (unrolled into an infinite line), because each element of the chain only depends on the one before it, and because we can’t have two successive elements of $A$ (since the elements are alternating between $A$ and $B$). In this case, we define $h(a) = f(a)$ for each element of $A$ in the chain. (Basically assigning to each element of $A$ the element of $B$ which is next in the chain.)

Have we actually made a bijection? Certainly our function is well-defined: every element of $A$ appears in exactly one chain, and we’ve specified where every element of $A$ in any chain goes, so we’ve specified where every element of $A$ goes.

Our function is surjective, because every element of $B$ is in a chain; if $b \in B$ has an element $a$ of $A$ before it in its chain, then we specified that $h$ takes $a$ to $b$, while if $b \in B$ is at the leftmost end of its chain, we specified that $h$ takes $g(b)$ (that is, the following element in the chain) to $b$.

Our function is injective. Since the chains don’t overlap, and the first three cases of “what a chain might look like” have no repeats at all, the only possible way an element of $B$ can be hit twice by $h$ is if that element lies in one of the cyclical chains. But then to elements of $A$ in that chain, $h$ assigns the following element of $B$; so $b \in B$ is hit only by the preceding element of $A$, which is the same in all cases because the chain is a cycle. noteThe picture in the linked intuitive document makes this much clearer.

Proof from the Knaster-Tarski theorem

This proof is very quick, but almost completely opaque. It relies on the Knaster-Tarski fixed point theorem, which states that if $X$ is a complete poset and $f: X \to X$ is order-preserving, then $f$ has a fixed point (i.e.$x$ such that $f(x) = x$).

Let $f: A \to B$ and $g: B \to A$ be injective.

We are looking for a partition $P \cup Q$ of $A$, and a partition $R \cup S$ of $B$, such that $f$ is injective from $P$ to $R$, and $g$ is injective from $S$ to $Q$. (Then we can just define our bijection $A \to B$ by “do $f$ on $P$, and do $g^{-1}$ on $Q$”.)

Now, the function $P \mapsto A \setminus g(B \setminus f(P))$ is order-preserving from the power set $\mathcal{P}(A)$ to $\mathcal{P}(A)$ (ordered by inclusion), because there is an even number of complements.

But $\mathcal{P}(A)$ is complete as a poset (proof), so by Knaster-Tarski there is a set $P$ such that $P = A \setminus g(B \setminus f(P))$.

This yields our partition as required. spell this out