Associativity: Intuition

Associative functions can be interpreted as families of functions that reduce lists down to a single output by combining adjacent elements in any order. Alternatively, associativity can be seen as a generalization of "listyness": it captures the "it doesn't matter whether you combined a with b first or b with c first; the result is the list [a, b, c] regardless" aspect of lists.

There are many different ways for a function to be associative, so it is difficult to provide a single litmus test for looking at a function and telling whether it associates (aside from just checking the associativity axiom directly). However, a few heuristics can be used to make a good guess.
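Checking the axiom directly can be sketched in Python as a brute-force test over a small finite domain (this verifies associativity only on that domain, not in general; the function name is mine):

```python
from itertools import product

def is_associative(f, domain):
    # Brute-force check of f(f(a, b), c) == f(a, f(b, c)) over a finite domain.
    return all(f(f(a, b), c) == f(a, f(b, c))
               for a, b, c in product(domain, repeat=3))

print(is_associative(max, range(5)))                 # True
print(is_associative(lambda a, b: a - b, range(5)))  # False
```

For infinite domains such a check can only sample; it is a quick falsifier, not a proof.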

Associative operators as natural functions on lists

The generalized associative law says that an associative function \(f : X \times X \to X\) gives rise to a method for combining any non-empty list of \(X\) elements into a single output, where the order in which adjacent elements are combined doesn't affect the result. We can flip that result around, and interpret associative operators as the pairwise versions of a certain class of "natural" functions for combining the elements of a list.
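As an illustrative sketch (the helper name is mine), the following Python enumerates every order of combining adjacent elements and confirms that an associative operator yields a single result no matter the order, while a non-associative one does not:

```python
def all_reductions(f, xs):
    # Every result obtainable by repeatedly combining adjacent elements of xs.
    if len(xs) == 1:
        return {xs[0]}
    results = set()
    for i in range(len(xs) - 1):
        # Combine the adjacent pair at position i, then recurse.
        merged = xs[:i] + [f(xs[i], xs[i + 1])] + xs[i + 2:]
        results |= all_reductions(f, merged)
    return results

print(all_reductions(lambda a, b: a + b, [3, 4, 5, 6]))  # {18}
print(all_reductions(lambda a, b: a - b, [3, 4, 5, 6]))  # {0, 10, -12, -2}
```

Addition collapses to one answer regardless of combination order; subtraction, which is not associative, produces four different answers.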

On this interpretation, we start by noting that some methods for reducing a list down to a single element can be broken down into pairwise combinations of adjacent elements, while others can't. For example, when we're trying to compute \(3 + 4 + 5 + 6,\) we can pick any two adjacent elements and start by combining those using the binary version of \(+\). But when we're trying to compute adjacent_ones(0, 0, 1, 1, 0) to check whether the list has any two adjacent ones, we're going to run into trouble if we only look for adjacent ones in the pairs (0, 0), (0, 1), and (1, 0).
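To see concretely that adjacent_ones resists pairwise decomposition, one can (as a sketch) enumerate all 16 binary functions on {0, 1} and check that no left-to-right pairwise reduction reproduces it on a handful of test lists:

```python
from itertools import product

def adjacent_ones(lst):
    # 1 if the list contains two adjacent ones, else 0.
    return int(any(a == 1 and b == 1 for a, b in zip(lst, lst[1:])))

def left_fold(g, lst):
    # Reduce lst left-to-right with the binary function g.
    acc = lst[0]
    for x in lst[1:]:
        acc = g(acc, x)
    return acc

tests = [[1, 1], [1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1, 1, 0]]

# Enumerate all 16 binary functions on {0, 1} via their truth tables.
matches = [t for t in product([0, 1], repeat=4)
           if all(left_fold(lambda a, b, t=t: t[2 * a + b], lst)
                  == adjacent_ones(lst) for lst in tests)]
print(matches)  # [] -- no binary combiner reproduces adjacent_ones
```

Since any order-insensitive pairwise reduction would in particular agree with the left-to-right one, this rules out every candidate combiner.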

The lists that can be reduced by pairwise combination of adjacent pairs have a nice locality property: the result can be computed by looking only at adjacent elements (without worrying about the global structure). Locality is a common idiom in physics and mathematics, so we might start by asking what sort of functions on lists have this locality property. The answer is: any function that reduces a list by pairwise combinations of adjacent elements, in such a way that the order of combination doesn't matter. If we formalize that notion, we get the result that any function on lists with this locality property corresponds (in a one-to-one fashion) to an associative operator. Thus, we can view associativity as the mathematical formalization of this nice "locality" property on lists.

Empirically, this locality property turns out to be quite useful in math, physics, and computer programming, as evidenced by how common associative operators are. See, for example, Group theory, or the pages on semigroups and monoids.

Associativity as a generalization of "list"

The above interpretation gives primacy to lists, and interprets associative operators in terms of natural functions on lists. We can invert that argument by treating associativity as a generalization of what it means for something to act "list-like."

A list \([a, b, c, d, \ldots]\) is a sequence of elements that have been combined by some "combiner" function, where the order of the elements matters, but the order in which they were combined does not. For example, if we combine \(a\) with \(b\) (forming \([a, b]\)) and then combine that with \(c\), we get the same list as if we combine \(b\) and \(c\) into \([b, c]\) first, and then combine \(a\) with that.

The very fact that we can unambiguously say "the list \([a, b, c]\)" without worrying about the order in which the elements were combined means that lists are built out of an associative "combination" operator. On this interpretation, associativity captures part of the essence of listyness, and associative operators in general generalize this notion. For example, associative operators are allowed to be a little forgetful about exactly which elements you combined (e.g., 3 + 4 = 2 + 5), so long as they retain the "it doesn't matter what order you combine the things in" property. In other words, we can view associativity as "part of what it means to be list-like."
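Both halves of this claim are easy to check in Python, where `+` concatenates lists (a small sketch):

```python
a, b, c = ['a'], ['b'], ['c']

# List combination: the grouping doesn't matter, and nothing is forgotten.
assert (a + b) + c == a + (b + c) == ['a', 'b', 'c']

# Numeric addition keeps the order-insensitivity but forgets the parts:
assert 3 + 4 == 2 + 5 == 7
```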

(One particularly important property of lists, namely that they can be empty, is not captured by associativity alone. Associative operators on sets that have an element that acts like an "empty list" are called "monoids." For more on the idea of generalizing the notion of "list", refer to the page on monoids.)

Associative mechanisms

The above two interpretations give an intuition for what it means that a function is associative. This still leaves open the question of how a function can be associative. Imagine \(f : X \times X \to X\) as a physical mechanism of wheels and gears. Someone says "\(f\) is associative." What does that mean, in terms of the function's physical mechanisms? What should we expect to see when we pop the function open, given the knowledge that it "is associative"?

Recall that associativity says that the two methods for combining two instantiations of the function yield the same output:

Associative paths

Thus, the ultimate physical test of associativity is hooking up two instantiations of \(f\) as in the left diagram, and then checking whether dragging the mechanisms of the lower-right instantiation above the mechanisms of the upper-left instantiation (thereby reconfiguring the system according to the diagram on the right) causes the behavior of the overall system to change. What happens when the right-hand instantiation is given access to the middle input first versus second? Does that affect the behavior at all? If not, \(f\) is associative.

This is not always an easy property to check by looking at the mechanisms of \(f\) alone, and sometimes functions that appear non-associative (at first glance) turn out to be associative by apparent coincidence. In other words, there are many different ways for a function to be associative, so it is difficult to give a single simple criterion for determining associativity by looking at the internals of the function. However, we can use a few heuristics that help one distinguish associative functions from non-associative ones.

Heuristic: Can two copies of the function operate in parallel?

\(f\) is associative if, when using two copies of \(f\) to reduce three inputs to one output, changing whether the right-hand copy gets access to the middle tape first vs. second does not affect the output. One heuristic for checking whether this is the case is to check whether both copies of \(f\) can make use of the middle input at the same time, without getting in each other's way. If so, \(f\) is likely associative.

For example, consider an implementation of \(+\) that gets piles of poker chips as input (where a pile of \(n\) chips represents the number \(n\)) and computes the output by simply sweeping all the poker chips from its input belts onto its output belt. To make a function that adds three piles of chips together, you could set up two two-pile adders in the configuration of the diagram on the left, but you could also have two two-tape sweepers operating on three tapes in parallel, such that they both sweep the middle tape. This parallelization wouldn't change the output, and thus \(+\) is associative.

By contrast, consider a mechanism that takes wooden blocks as input, glues them together, and nails silver-colored caps onto either end of the glued block. For example, if you put in a red block on the left and a blue block on the right, you get a silver-red-blue-silver block as output. You could set up two copies of this mechanism as in the diagram on the left, but if you tried to parallelize them, you'd get into trouble: each mechanism would be trying to nail one of its caps into the place where the other mechanism was attempting to apply glue. And indeed, this mechanism is non-associative.

This heuristic is imperfect. Some mechanisms that seem difficult to parallelize are still associative. For example, consider the multiplier mechanism, which takes two poker piles as input and puts a copy of the left pile onto the output tape for every chip in the right pile. It would be difficult to parallelize two copies of this function: one would be trying to count the chips in the middle pile while the other was attempting to copy the chips in the middle pile, and the result might not be pretty. However, multiplication is associative, because a pile made of \(x\) copies of a (\(y\) copies of \(z\))-chip pile has the same number of chips as a pile made of (\(x\) copies of \(y\)) copies of a \(z\)-chip pile.
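A quick sketch of the pile-copying multiplier (representing a pile of \(n\) chips as a list of length \(n\); the function name is mine) confirms the chip-counting argument:

```python
def times(left, right):
    # Put one copy of the left pile on the output for every chip in the right pile.
    out = []
    for _chip in right:
        out.extend(left)
    return out

x, y, z = ['o'] * 3, ['o'] * 4, ['o'] * 5

# Both groupings yield a pile of 3 * 4 * 5 = 60 chips.
assert len(times(times(x, y), z)) == len(times(x, times(y, z))) == 60
```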

Heuristic: Does the output interpretation match both input interpretations?

Another (vaguer) heuristic is to ask whether the output of the function should actually be treated as the same sort of thing as the inputs to the function. For example, recall the adjacent_ones function from above, which checks a list for adjacent ones, and returns 1 if it finds some and 0 otherwise. The inputs to adjacent_ones are 0 and 1, and the output is 0 or 1, but the output interpretation doesn't quite match the input interpretation: intuitively, the output is actually intended to mean "yes, there were adjacent ones" or "no, there weren't adjacent ones", and so applying adjacent_ones to the output of adjacent_ones is possible but ill-advised. If there is a mismatch between the output interpretation and at least one of the input interpretations, then the function probably isn't associative.

For example, imagine a person who is playing a game that works as follows. The board has three positions: red, green, and blue. The player's objective is to complete as many clockwise red-green-blue cycles as possible, without ever backtracking in the counter-clockwise direction.

3-cycle board

Each turn, the game offers them one of the three spaces, and they get to choose whether to travel to that square or stay where they are. Clearly, their preferences depend on where they currently are: if they're on "red", "green" is a good move and "blue" is a bad one; but if they're on "blue" then choosing "green" is ill-advised. We can consider a binary function \(f\) which takes their current position on the left and the proposed position on the right, and returns the position that the player prefers. For example, \(f(red, blue) = red,\) \(f(red, green) = green,\) \(f(blue, blue) = blue,\) \(f(blue, green) = blue.\) In this case, the interpretation of the left input is a "player position," the interpretation of the right input is an "offered move", and the interpretation of the output is the resulting "player position." The output interpretation mismatches one of the input interpretations, which implies that \(f\) probably isn't associative, and indeed it is not: \(f(f(red, green), blue) = blue,\) whereas \(f(red, f(green, blue)) = red.\) The former expression can be interpreted as "where the player would be if they started at red, and were then offered green, and were then offered blue." The latter expression doesn't have a great interpretation, because it feeds the output of \(f(green, blue)\) (a player position) in as an "offered move."
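The preference function can be sketched directly (a hypothetical encoding, with positions as strings), and the two groupings checked:

```python
def f(pos, offered):
    # Accept the offered move only if it is one step clockwise from pos.
    clockwise = {'red': 'green', 'green': 'blue', 'blue': 'red'}
    return offered if clockwise[pos] == offered else pos

# The examples from the text:
assert f('red', 'blue') == 'red' and f('red', 'green') == 'green'
assert f('blue', 'blue') == 'blue' and f('blue', 'green') == 'blue'

# The two groupings disagree, so f is not associative:
assert f(f('red', 'green'), 'blue') == 'blue'
assert f('red', f('green', 'blue')) == 'red'
```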

If the interpretation of the output (in this case, "player position") mismatches the interpretation of at least one of the inputs, then the function likely isn't associative. However, this heuristic is also imperfect: the most obvious interpretations of the inputs and outputs of the subtraction function are "they're all just numbers," and subtraction still fails to associate.

Further discussion

There are many different ways for a function to be associative, so it is difficult to give a simple litmus test. The ultimate test is always to imagine using two copies of \(f\) to combine three inputs into one output, and check whether the result changes depending on whether the left-hand copy of \(f\) gets to run first (in which case it gets to access the second input belt at the source) or second (in which case its right-hand input is the right-hand copy's output). For examples of functions that pass or fail this ultimate test, refer to the examples page.