Associativity: Intuition

Associative functions can be interpreted as families of functions that reduce lists down to a single output by combining adjacent elements in any order. Alternatively, associativity can be seen as a generalization of “listyness”: it captures and generalizes the “it doesn’t matter whether you combined \(a\) with \(b\) first or \(b\) with \(c\) first; the result is \([a, b, c]\) regardless” aspect of lists.

There are many different ways for a function to be associative, so it is difficult to provide a single litmus test for looking at a function and telling whether it associates (aside from just checking the associative axiom directly). However, a few heuristics can be used to make a good guess.

Associative operators as natural functions on lists

The generalized associative law says that an associative function \(f : X \times X \to X\) gives rise to a method for combining any non-empty list of \(X\) elements into a single output, where the order in which adjacent elements are combined doesn’t affect the result. We can flip that result around, and interpret associative operators as the pairwise versions of a certain class of “natural” functions for combining the elements of a list.

On this interpretation, we start by noting that some methods for reducing a list down to a single element can be broken down into pairwise combinations of adjacent elements, while others can’t. For example, when we’re trying to compute \(3 + 4 + 5 + 6,\) we can pick any two adjacent elements and start by combining those using the binary version of \(+\). But when we’re trying to compute adjacent_ones(0, 0, 1, 1, 0) to check whether the list has any two adjacent ones, we’re going to run into trouble if we only look for adjacent ones in the pairs (0, 0), (0, 1), and (1, 0).
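
To make the contrast concrete, here is a small Python sketch (the names adjacent_ones and both_ones are illustrative, not from any library): addition can be regrouped freely, while folding a pairwise “both ones?” check over the list fails to reproduce adjacent_ones.

```python
from functools import reduce

def adjacent_ones(xs):
    """The list-level function: 1 if the list contains two adjacent ones, else 0."""
    return int(any(a == 1 and b == 1 for a, b in zip(xs, xs[1:])))

def both_ones(a, b):
    """A candidate pairwise version: 1 only if both arguments are 1."""
    return int(a == 1 and b == 1)

xs = [0, 0, 1, 1, 0]

# Addition can be computed by combining adjacent elements in any order:
assert (3 + 4) + (5 + 6) == ((3 + 4) + 5) + 6 == 3 + (4 + (5 + 6)) == 18

# But folding the pairwise check over the list loses track of where the ones were:
print(adjacent_ones(xs))      # 1: the list does contain adjacent ones
print(reduce(both_ones, xs))  # 0: the pairwise reduction gives the wrong answer
```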

Lists that can be reduced by pairwise combination of adjacent elements have a nice locality property: the result can be computed by looking only at adjacent elements, without worrying about the global structure of the list. Locality is a common idiom in physics and mathematics, so we might ask what sort of functions on lists have this locality property. The answer is “any function that reduces the list by pairwise combination of adjacent elements, in such a way that the order of those combinations doesn’t matter.” If we formalize that notion, we get the result that any function on lists with this locality property corresponds (in a one-to-one fashion) to an associative operation. Thus, we can view associativity as the mathematical formalization of this nice “locality” property on lists.
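
As a rough illustration of this locality property, the following Python sketch (reduce_random_adjacent is an illustrative helper, not a standard function) combines adjacent elements in a random order. For an associative operator every order of combination agrees; for a non-associative one it does not.

```python
import random

def reduce_random_adjacent(f, xs):
    """Reduce a non-empty list to one value by repeatedly combining
    a randomly chosen pair of adjacent elements with f."""
    xs = list(xs)
    while len(xs) > 1:
        i = random.randrange(len(xs) - 1)
        xs[i:i + 2] = [f(xs[i], xs[i + 1])]
    return xs[0]

nums = [3, 4, 5, 6]

# For an associative f (like +), every order of adjacent combination agrees:
print({reduce_random_adjacent(lambda a, b: a + b, nums) for _ in range(100)})  # {18}

# For a non-associative f (like -), the order of combination matters:
print({reduce_random_adjacent(lambda a, b: a - b, nums) for _ in range(100)})  # several values
```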

Empirically, this locality property turns out to be quite useful in math, physics, and computer programming, as evidenced by the ubiquity of associative operators. See, for example, Group theory, or the pages on semigroups and monoids.

Associativity as a generalization of “list”

The above interpretation gives primacy to lists, and interprets associative operators in terms of natural functions on lists. We can invert that argument by treating associativity as a generalization of what it means for something to act “list-like.”

A list \([a, b, c, d, \ldots]\) is a collection of elements that have been combined by some “combiner” function, where the order of the elements matters, but the order in which they were combined does not. For example, if we combine \(a\) with \(b\) (forming \([a, b]\)) and then combine that with \(c\), we get the same list as if we combine \(b\) and \(c\) into \([b, c]\) first, and then combine \(a\) with that.

The very fact that we can unambiguously say “the list \([a, b, c]\)” without worrying about the order in which the elements were combined means that lists are built out of an associative “combination” operator. On this interpretation, associativity captures part of the essence of listyness, and associative operators in general generalize this notion. For example, associative operators are allowed to be a little forgetful about which exact elements you combined (e.g., \(3 + 4 = 2 + 5\)) so long as they retain the “it doesn’t matter what order you combine the things in” property. In other words, we can view associativity as “part of what it means to be list-like.”
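
A small Python illustration of this “forgetfulness,” using list concatenation as the combiner that remembers everything and addition as one that forgets:

```python
# Concatenation is the associative combiner that forgets nothing:
# the result remembers exactly which elements went in, and in what order.
assert (["a"] + ["b"]) + ["c"] == ["a"] + (["b"] + ["c"]) == ["a", "b", "c"]

# Addition is an associative combiner that is "a little forgetful":
# it keeps the "order of combination doesn't matter" property,
# but it no longer remembers which elements were combined.
assert [3] + [4] != [2] + [5]   # the lists differ...
assert 3 + 4 == 2 + 5           # ...but the forgetful combiner identifies them.
```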

(One particularly important property of lists — namely, that they can be empty — is not captured by associativity alone. Associative operators on sets that have an element that acts like an “empty list” are called “monoids.” For more on the idea of generalizing the notion of “list”, refer to the page on monoids.)
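
As a quick illustration (not a full treatment of monoids), here is how an identity element plays the role of the empty list in Python:

```python
from functools import reduce

# List concatenation with [] and addition with 0 are both monoids:
# an associative operator together with an element that acts like the empty list.
assert [1, 2] + [] == [] + [1, 2] == [1, 2]
assert 7 + 0 == 0 + 7 == 7

# Having an identity is what lets a fold handle the empty list gracefully:
print(reduce(lambda a, b: a + b, [], 0))   # 0, the "empty sum"
```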

Associative mechanisms

The above two interpretations give an intuition for what it means that a function is associative. This still leaves open the question of how a function can be associative. Imagine \(f : X \times X \to X\) as a physical mechanism of wheels and gears. Someone says “\(f\) is associative.” What does that mean, in terms of the function’s physical mechanisms? What should we expect to see when we pop the function open, given the knowledge that it “is associative”?

Recall that associativity says that the two methods for combining two instantiations of the function yield the same output:

(Diagram: associative paths, showing the two ways of hooking up two instantiations of \(f\).)

Thus, the ultimate physical test of associativity is hooking up two instantiations of \(f\) as in the left diagram, and then checking whether dragging the mechanisms of the lower-right instantiation above the mechanisms of the upper-left instantiation (thereby reconfiguring the system according to the diagram on the right) causes the behavior of the overall system to change. What happens when the right-hand-side instantiation is given access to the middle input first versus second? Does that affect the behavior at all? If not, \(f\) is associative.
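
This “ultimate test” can also be phrased as a brute-force check over all triples of inputs. Here is a minimal Python sketch (is_associative is an illustrative helper, assuming a finite collection of elements to test on):

```python
from itertools import product

def is_associative(f, elements):
    """Brute-force version of the ultimate test: check that the two ways of
    hooking up two copies of f agree on every triple of inputs."""
    return all(f(f(a, b), c) == f(a, f(b, c))
               for a, b, c in product(elements, repeat=3))

bits = [0, 1]
print(is_associative(lambda a, b: (a + b) % 2, bits))   # True: addition mod 2 is associative
print(is_associative(lambda a, b: a - b, range(5)))     # False: subtraction is not
```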

This is not always an easy property to check by looking at the mechanisms of \(f\) alone, and sometimes functions that appear non-associative (at first glance) turn out to be associative by apparent coincidence. In other words, there are many different ways for a function to be associative, so it is difficult to give a single simple criterion for determining associativity by looking at the internals of the function. However, we can use a few heuristics that help one distinguish associative functions from non-associative ones.

Heuristic: Can two copies of the function operate in parallel?

\(f\) is associative if, when two copies of \(f\) are used to reduce three inputs to one output, changing whether the right-hand copy gets access to the middle input first or second does not affect the output. One heuristic for checking whether this is the case is to check whether both copies of \(f\) can make use of the middle input at the same time, without getting in each other’s way. If so, \(f\) is likely associative.

For example, consider an implementation of \(+\) that gets piles of poker chips as input (where a pile of \(n\) chips represents the number \(n\)) and computes the output by simply sweeping all the poker chips from its input belts onto its output belt. To make a function that adds three piles of chips together, you could set up two two-pile adders in the configuration of the diagram on the left, but you could also have two two-belt sweepers operating on three belts in parallel, such that they both sweep the middle belt. This parallelization wouldn’t change the output, and thus \(+\) is associative.
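
Here is a rough Python model of the chip-sweeping adder (piles modeled as lists of chips; the name sweep is ours), showing that either grouping of two-pile sweeps agrees with sweeping all three belts at once:

```python
def sweep(*piles):
    """Model of the chip-sweeping adder: push every chip from the input piles
    onto the output belt. Piles are modeled as lists of chips."""
    out = []
    for pile in piles:
        out.extend(pile)
    return out

a, b, c = ["chip"] * 3, ["chip"] * 4, ["chip"] * 5

# Two two-pile sweepers in sequence (either grouping)...
left_first  = sweep(sweep(a, b), c)
right_first = sweep(a, sweep(b, c))
# ...and a single pass sweeping all three belts "in parallel":
all_at_once = sweep(a, b, c)

assert len(left_first) == len(right_first) == len(all_at_once) == 12
```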

By contrast, consider a mechanism that takes wooden blocks as input, glues them together, and nails silver-colored caps on either end of the glued block. For example, if you put in a red block on the left and a blue block on the right, you get a silver-red-blue-silver block in the output. You could set up two copies of these like the diagram on the left, but if you tried to parallelize them, you’d get into trouble: each mechanism would be trying to nail one of its caps into the place where the other mechanism was attempting to apply glue. And indeed, this mechanism is non-associative.
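
Modeling the gluing-and-capping mechanism as a string operation (an illustrative sketch, not the only way to formalize it) makes the failure of associativity easy to see:

```python
def glue_and_cap(left, right):
    """Model of the gluing mechanism: join two blocks and nail a silver cap
    onto each end of the result. Blocks are modeled as strings of color names."""
    return "silver|" + left + "|" + right + "|silver"

one_way   = glue_and_cap(glue_and_cap("red", "blue"), "green")
other_way = glue_and_cap("red", glue_and_cap("blue", "green"))

print(one_way)               # silver|silver|red|blue|silver|green|silver
print(other_way)             # silver|red|silver|blue|green|silver|silver
print(one_way == other_way)  # False: the mechanism is not associative
```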

This heuristic is imperfect. Some mechanisms that seem difficult to parallelize are still associative. For example, consider the multiplier mechanism, which takes two poker piles as input and puts a copy of the left pile onto the output belt for every chip in the right pile. It would be difficult to parallelize two copies of this function: one would be trying to count the chips in the middle pile while the other was attempting to copy the chips in the middle pile, and the result might not be pretty. However, multiplication is associative, because \(x\) copies of a pile containing (\(y\) copies of \(z\)) chips have the same number of chips as (\(x\) copies of \(y\)) copies of a pile of \(z\) chips.
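
A rough Python model of the multiplier (piles modeled as chip counts; the name copy_pile is ours) confirms that both groupings produce the same number of chips:

```python
def copy_pile(left, right):
    """Model of the multiplier: put one copy of the left pile on the output
    for every chip in the right pile. Piles are modeled as chip counts."""
    return sum(left for _ in range(right))

x, y, z = 3, 4, 5
print(copy_pile(copy_pile(x, y), z))   # 60
print(copy_pile(x, copy_pile(y, z)))   # 60
```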

Heuristic: Does the output interpretation match both input interpretations?

Another (vaguer) heuristic is to ask whether the output of the function should actually be treated as the same sort of thing as the inputs to the function. For example, recall the adjacent_ones function from above, which checks a list for adjacent ones, and returns 1 if it finds any and 0 otherwise. The inputs to adjacent_ones are 0s and 1s, and the output is 0 or 1, but the output interpretation doesn’t quite match the input interpretation: Intuitively, the output is actually intended to mean “yes, there were adjacent ones” or “no, there weren’t adjacent ones”, and so applying adjacent_ones to the output of adjacent_ones is possible but ill-advised. If there is a mismatch between the output interpretation and at least one of the input interpretations, then the function probably isn’t associative.

For example, imagine a person who is playing a game that works as follows. The board has three positions: red, green, and blue. The player’s objective is to complete as many clockwise red-green-blue cycles as possible, without ever backtracking in the counter-clockwise direction.

(Diagram: the three-position cycle board.)

Each turn, the game offers them one of the three spaces, and they get to choose whether to travel to that space or stay where they are. Clearly, their preferences depend on where they currently are: if they’re on “red”, “green” is a good move and “blue” is a bad one; but if they’re on “blue” then choosing “green” is ill-advised. We can consider a binary function \(f\) which takes their current position on the left and the proposed position on the right, and returns the position that the player prefers. For example, \(f(red, blue) = red,\) \(f(red, green) = green,\) \(f(blue, blue) = blue,\) and \(f(blue, green) = blue.\) In this case, the interpretation of the left input is a “player position,” the interpretation of the right input is an “offered move,” and the interpretation of the output is the resulting “player position.” The output interpretation mismatches one of the input interpretations, which suggests that \(f\) probably isn’t associative, and indeed it is not: \(f(f(red, green), blue) = blue,\) whereas \(f(red, f(green, blue)) = red.\) The former expression can be interpreted as “where the player would be if they started at red, and were then offered green, and were then offered blue.” The latter expression doesn’t have a great interpretation, because it feeds the output of \(f(green, blue)\) (a player position) in as an “offered move.”
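
Here is a minimal Python model of this preference function (the encoding of positions as strings and the name prefer are ours), reproducing the mismatch above:

```python
# Positions are "red", "green", "blue"; clockwise order is red -> green -> blue -> red.
clockwise_next = {"red": "green", "green": "blue", "blue": "red"}

def prefer(position, offer):
    """Return the position the player prefers: accept the offer only if it is
    the next clockwise step from the current position, otherwise stay put."""
    return offer if offer == clockwise_next[position] else position

print(prefer(prefer("red", "green"), "blue"))   # blue
print(prefer("red", prefer("green", "blue")))   # red
```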

If the interpretation of the output (in this case, “player position”) mismatches the interpretations of at least one of the inputs, then the function likely isn’t associative. However, this heuristic is also imperfect: The most obvious interpretations of the inputs and outputs to the subtraction function are “they’re all just numbers,” and subtraction still fails to associate.

Further discussion

There are many different ways for a function to be associative, so it is difficult to give a simple litmus test. The ultimate test is always to imagine using two copies of \(f\) to combine three inputs into one, and check whether the result changes depending on whether the left-hand copy of \(f\) gets to run first (in which case it gets to access the second input belt at its source) or second (in which case its right-hand input is the right-hand copy’s output). For examples of functions that pass or fail this ultimate test, refer to the examples page.