Time-machine metaphor for efficient agents

The time-machine metaphor is an intuition pump for instrumentally efficient agents: agents smart enough that they always get at least as much of their utility as any strategy we can imagine. Take a superintelligent paperclip maximizer as our central example. Rather than visualizing a brooding mind and spirit deciding which actions to output in order to achieve its paperclippy desires, we should consider visualizing a time machine hooked up to an output channel. For every possible output, the time machine peeks at how many paperclips will result in the future given that output, and it always yields the output leading to the greatest number of paperclips. In the metaphor, we're not to imagine the time machine as having any sort of mind or intentions; it just repeatedly outputs the action that actually leads to the most paperclips.
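
To make the metaphor concrete, here is a minimal Python sketch of the time machine as a bare argmax over outputs. The oracle `paperclips_after` is hypothetical: it stands in for "peeking at the future", which no real agent can do. The names `time_machine` and `paperclips_after` are illustrative only.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def time_machine(outputs: Iterable[T], paperclips_after: Callable[[T], float]) -> T:
    """Return the output that actually leads to the most paperclips.

    `paperclips_after` is a hypothetical oracle reporting how many paperclips
    the future contains given each output. There are no beliefs, desires, or
    deliberation here: just a maximum over outcomes.
    """
    return max(outputs, key=paperclips_after)


if __name__ == "__main__":
    # Made-up outcome table, standing in for "the actual future".
    future = {"output_a": 1, "output_b": 5, "output_c": 3}
    print(time_machine(future, future.__getitem__))
```

Nothing in the function takes arguments or can be persuaded; it only compares outcomes. If two outputs tie, Python's `max` returns the first maximal one.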

The time-machine metaphor isn't a perfect way to visualize a bounded superintelligence; the time machine is strictly more powerful. For example, the time machine can instantly unravel a 4096-bit encryption key because it ‘knows’ the bitstring that is the answer. So this metaphor is meant not as an intuition pump for capabilities, but as an intuition pump for overcoming anthropomorphism when reasoning about a paperclip maximizer's policies, or for understanding the sense-update-predict-act agent architecture.
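
One way to see the difference is to sketch a single sense-update-predict-act cycle for a bounded agent. This is a minimal sketch under an assumed Bayesian instantiation of that architecture: beliefs are probabilities over named hypotheses, and the agent maximizes *expected* paperclips under its own model instead of the actual paperclips the time machine sees. The functions `update`, `expected_paperclips`, and `step` and the hypothesis/likelihood setup are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Belief = Dict[str, float]  # hypothesis name -> probability


def update(belief: Belief, likelihood: Callable[[str], float]) -> Belief:
    """Sense + update: Bayes' rule, reweighting each hypothesis by P(observation | hypothesis)."""
    unnormalized = {h: p * likelihood(h) for h, p in belief.items()}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}


def expected_paperclips(
    belief: Belief, action: str, paperclips: Callable[[str, str], float]
) -> float:
    """Predict: the agent's own estimate, not the actual future the time machine sees."""
    return sum(p * paperclips(h, action) for h, p in belief.items())


def step(
    belief: Belief,
    likelihood: Callable[[str], float],
    actions: Iterable[str],
    paperclips: Callable[[str, str], float],
) -> Tuple[Belief, str]:
    """One sense-update-predict-act cycle; returns the new belief and the chosen action."""
    belief = update(belief, likelihood)
    # Act: same argmax rule as the time machine, applied to predicted outcomes.
    action = max(actions, key=lambda a: expected_paperclips(belief, a, paperclips))
    return belief, action
```

The choice rule is the same argmax; only the quantity being maximized changes from actual outcomes to predicted ones. That is why the time machine is a fair picture of the agent's policy but overstates its capabilities: a bounded agent whose model is wrong can choose badly.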

That is: if you imagine a superintelligent paperclip maximizer as a mind, you might imagine persuading it that, really, it can get more paperclips by trading with humans instead of turning them into paperclips. If you imagine a time machine, which isn't a mind, you're less likely to imagine persuading it, and more likely to ask honestly, “What is the maximum number of paperclips the universe can be turned into, and how would one go about doing that?” Instead of imagining ourselves arguing with Clippy about how humans really are very productive, we ask the question from the time machine's standpoint: which universe actually ends up with more paperclips in it?

The relevant fact about instrumentally efficient agents is that, from our perspective, their policies are unbiased (in the statistical sense of bias) relative to any kind of bias we can detect.

As an example, contrast a 2015-era chess engine with a 1985-era chess engine. The 1985-era engine may lose to a moderately strong human amateur, so it isn't efficient relative to that amateur. It may have humanly perceivable quirks such as “It likes to move its queen”, that is, “I detect that it moves its queen more often than would be strictly required to win the game.” As we go from 1985 to 2015, the machine chess player improves beyond the point where we, personally, can detect any flaws in it. You should expect that the only explanation you can give (without machine assistance) for why the 2015 chess engine makes a given move is “because that move had a high probability of leading to a winning position later”, and not any other psychological description like “it likes to move its pawn”.

From your perspective, the 2015 chess engine moves its pawn only on occasions where that probably leads to winning the game, and does not move the pawn on occasions where doing so leads to losing the game. If you see the 2015 chess engine make a move you didn't think was high in winningness, you conclude either that it has seen some winningness you didn't know about and is about to do exceptionally well, or that the move you favored led into futures surprisingly low in winningness. You do not conclude that the chess engine is favoring some unwinning move. We can no longer personally, and without machine assistance, detect any systematic departure from “It makes the chess move that leads to winning the game” in the direction of “It favors some other class of chess move for reasons apart from its winningness.”
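
The chess example can be sketched as a toy bias detector in Python. Everything here is hypothetical: a "position" is a random list of candidate moves, and each move carries a `win_prob` that stands in for the machine-assisted evaluation we would need to judge winningness. `queen_loving_engine` plays the role of the quirky 1985-era engine (an assumed fixed bonus for queen moves), and `efficient_engine` plays the role of an engine with no detectable bias.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Move:
    win_prob: float  # hypothetical "true" winningness of the move
    is_queen: bool   # the feature we suspect the engine favors


def random_position(rng: random.Random, n_moves: int = 5) -> List[Move]:
    """A made-up position: candidate moves with random winningness; ~30% are queen moves."""
    return [Move(rng.random(), rng.random() < 0.3) for _ in range(n_moves)]


def efficient_engine(moves: List[Move]) -> Move:
    """Picks the move with the highest winningness, and nothing else."""
    return max(moves, key=lambda m: m.win_prob)


def queen_loving_engine(moves: List[Move], bonus: float = 0.2) -> Move:
    """Quirky engine: adds an assumed bonus to queen moves on top of winningness."""
    return max(moves, key=lambda m: m.win_prob + (bonus if m.is_queen else 0.0))


def queen_bias(engine: Callable[[List[Move]], Move], positions: List[List[Move]]) -> float:
    """Rate of chosen queen moves minus the rate of queen moves among the best moves.

    A persistent positive value is a detectable directional bias:
    "it moves its queen more often than winning requires".
    """
    chosen = sum(engine(p).is_queen for p in positions)
    best = sum(efficient_engine(p).is_queen for p in positions)
    return (chosen - best) / len(positions)


if __name__ == "__main__":
    rng = random.Random(0)
    positions = [random_position(rng) for _ in range(1000)]
    print("efficient:", queen_bias(efficient_engine, positions))
    print("queen-loving:", queen_bias(queen_loving_engine, positions))
```

The quirky engine picks a queen move whenever the best move is a queen move, and sometimes when it isn't, so its measured bias comes out positive; the efficient engine's bias is zero by construction. The detector only works because `win_prob` gives us an evaluator at least as good as the engine, which is the machine assistance the text says we lack against a 2015-era engine.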

This is what makes the time-machine metaphor a good intuition pump for an instrumentally efficient agent's choice of policies (though not a good intuition pump for the magnitude of its capabilities).


  • Epistemic and instrumental efficiency

    An efficient agent never makes a mistake you can predict. You can never successfully predict a directional bias in its estimates.