Methodology of unbounded analysis


“Unbounded analysis” refers to determining the behavior of a computer program that would, to actually run, require an unphysically large amount of computing power, or sometimes hypercomputation. If we know how to solve a problem using unlimited computing power, but not with real-world computers, then we have an “unbounded solution” but no “bounded solution”.

As a central example, consider computer chess. The first paper ever written on computer chess, by Claude Shannon in 1950, gave an unbounded solution for playing perfect chess by exhaustively searching the tree of all legal chess moves. (Since a chess game draws if no pawn is moved and no piece is captured for 50 moves, this is a finite search tree.) Shannon then passed to considering more bounded ways of playing imperfect chess, such as evaluating the worth of a midgame chess position by counting the balance of pieces, and searching a smaller tree up to midgame states. It wasn’t until 47 years later, in 1997, that Deep Blue beat Garry Kasparov for the world championship, and there were multiple new basic insights along the way, such as alpha-beta pruning.
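Shannon’s unbounded solution is structurally very simple: recurse through every legal continuation and back the game values up the tree. The full chess tree is far too large to actually run, so the sketch below substitutes a toy game (single-heap Nim with illustrative rules: remove 1–3 stones, taking the last stone wins) for chess; the shape of the exhaustive search is the same.

```python
# A minimal sketch of an unbounded game solution: exhaustive minimax over
# the entire (finite) game tree. A toy game stands in for chess, since
# chess's tree is finite but astronomically large.

def legal_moves(heap):
    """A move removes 1, 2, or 3 stones (illustrative toy rules)."""
    return [m for m in (1, 2, 3) if m <= heap]

def game_value(heap):
    """+1 if the player to move can force a win, -1 otherwise.

    This is the unbounded part: it visits every reachable position,
    trivially cheap here and hopeless for real chess.
    """
    if heap == 0:
        return -1  # the previous player took the last stone and won
    # Pick the move that leaves the opponent in the worst position;
    # the opponent's value is negated from the mover's perspective.
    return max(-game_value(heap - m) for m in legal_moves(heap))

def best_move(heap):
    """The perfect-play move from a nonempty heap."""
    return max(legal_moves(heap), key=lambda m: -game_value(heap - m))
```

Alpha-beta pruning, transposition tables, and midgame evaluation functions are all ways of approximating `game_value` without visiting the whole tree; finding them took much of the intervening 47 years.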

In 1836, there was a sensation called the Mechanical Turk, allegedly a chess-playing automaton. Edgar Allan Poe, who was also an amateur magician, wrote an essay arguing that the Turk must contain a human operator hidden in the apparatus (which it did). Besides analyzing the Turk’s outward appearance to locate the hidden compartment, Poe carefully argued as to why no arrangement of wheels and gears could ever play chess in the first place, explicitly comparing the Turk to “the calculating machine of Mr. Babbage”:

Arithmetical or algebraical calculations are, from their very nature, fixed and determinate. Certain data being given, certain results necessarily and inevitably follow… But the case is widely different with the Chess-Player. With him there is no determinate progression. No one move in chess necessarily follows upon any one other. From no particular disposition of the men at one period of a game can we predicate their disposition at a different period… Now even granting that the movements of the Automaton Chess-Player were in themselves determinate, they would be necessarily interrupted and disarranged by the indeterminate will of his antagonist. There is then no analogy whatever between the operations of the Chess-Player, and those of the calculating machine of Mr. Babbage… It is quite certain that the operations of the Automaton are regulated by mind, and by nothing else. Indeed this matter is susceptible of a mathematical demonstration, a priori.

(In other words: In an algebraical problem, each step follows from the previous step with necessity, and therefore can be represented by the determinate motions of wheels and gears as in Charles Babbage’s proposed computing engine. In chess, the player’s move and opponent’s move don’t follow with necessity from the board position, and therefore can’t be represented by deterministic gears.)

This is an amazingly sophisticated remark, considering the era. It even puts a finger on the part of chess that is computationally difficult, the combinatorial explosion of possible moves. And it is still entirely wrong.

Even if you know an unbounded solution to chess, you might still be 47 years away from a bounded solution. But if you can’t state a program that solves the problem in principle, you are in some sense confused about the nature of the cognitive work needed to solve the problem. If you can’t even solve a problem given infinite computing power, you definitely can’t solve it using bounded computing power. (Imagine Poe trying to write a chess-playing program before he’d had the insight about search trees.)

We don’t presently know how to write a Python program that would be a nice AI if we ran it on an unphysically large computer. Trying directly to cross this conceptual gap by carving off pieces of the problem and trying to devise unbounded solutions to them is “the methodology of unbounded analysis in AI alignment theory”.

Since “bounded agent” has come to mean in general an agent that is realistic, the term “unbounded agent” also sometimes refers to agents that:

  • Perfectly know their environments.

  • Can fully simulate their environments (the agent is larger than its environment and/or the environment is very simple).

  • Operate in turn-based, discrete time (the environment waits for the agent to compute its move).

  • Are Cartesian, i.e., perfectly separated from the environment except for sensory inputs and motor outputs.

These are two importantly distinct kinds of unboundedness, and we’ll refer to the above list of properties as “unrealism” to distinguish them here. Sufficiently unrealistic setups are also called “toy models” or “toy problems”.

Unrealistic setups have disadvantages, most notably that the results, observations, and solutions may fail to generalize to realistic applications. Correspondingly, the classic pitfall of unbounded analysis is that it’s impossible to run the code, which means that certain types of conceptual and empirical errors are more likely to go uncaught (see below).

There are nonetheless several forces pushing technical work in value alignment theory toward unbounded analyses and unrealistic setups, at least for now:

  1. Attacking confusion in the simplest settings. If we don’t know how to solve a problem given unlimited computing power, this means we’re confused about the nature of the work to be performed. It is sometimes worthwhile to tackle this conceptual confusion in a setup that tries to expose the confusing part of the problem as simply as possible. Trying to bound the proposed solution or make it realistic can introduce a lot of complications into this discussion, arguably unnecessary ones. (Deep Blue was far more complicated than Shannon’s ideal chess program, and it wouldn’t be doing Edgar Allan Poe any favors to show him Deep Blue’s code and hide away Shannon’s ideal outline.)

  2. Unambiguous consequences and communication. Introducing ‘realistic’ complications can make it difficult to engage in cooperative discourse about which ideas have which consequences. (AIXI was a watershed moment for alignment theory because AIXI was formally specified and there was no way to say “Oh, I didn’t mean that” when somebody pointed out that AIXI would try to seize control of its own reward channel.)

  3. More advanced agents might be less idiosyncratic. Increasing the cognitive power of an agent may sometimes move its behavior closer to ideals and further from specific complications of an algorithm. (From the perspective of a human onlooker, early chess algorithms seemed to have weird idiosyncrasies tied to their specific algorithms. Modern chess programs can, from an intuitive human perspective, be seen as just making good chess moves.)

  4. Runnable toy models often aren’t a good fit for advanced-agent scenarios. Current AI algorithms are often not good natural fits for demonstrating certain phenomena that seem like they might predictably occur in sufficiently advanced AIs. Usually it’s a lot of work to carve off a piece of such a phenomenon and pose it as an unbounded problem. But that can still be significantly easier to fit into an unbounded setting than into a toy model. (All examples here are too complicated for one sentence, but see the subsection below and the discussion of Utility indifference and Tiling agents theory.)

Even to the extent that these are good reasons, standard pitfalls of unbounded analyses and unrealistic setups still apply, and some of our collective and individual precautions against them are discussed below.

For historical reasons, computer scientists are sometimes suspicious of unbounded or unrealistic analyses, and may wonder aloud if they reflect unwillingness or incapacity to do a particular kind of work associated with real-world code. For discussion of this point see Why MIRI uses unbounded analysis.

Attacking confusion in the simplest settings.

Imagine somebody claiming that an ideal chess program ought to evaluate the ideal goodness of each move, and giving their philosophical analysis in terms of a chess agent which knows the perfect goodness of each move, without ever giving an effective specification of what determines the ideal goodness, but using the term \(\gamma\) to symbolize it so that the paper looks very formal. We can imagine an early programmer sitting down to write a chess-playing program, crafting the parts that take in user-specified moves and the part that displays the chess screen, using random numbers for move goodness until they get around to writing the “move-goodness determining module”, and then finally finding themselves unable to complete the program; at which point they finally recognize that all the talk of “the ideal goodness \(\gamma\) of a chess move” wasn’t an effective specification.
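As a purely hypothetical illustration, the half-finished program in this parable might look like the sketch below: every routine is straightforwardly programmable except the one the philosophical analysis was supposed to supply. (All names here are invented for the example, not a real API.)

```python
import random

# The parts of the imagined chess program that *can* be written, plus the
# one that can't.

def read_user_move(board):
    """Take in a user-specified move: straightforwardly programmable."""
    ...

def display_board(board):
    """Draw the chess board on screen: straightforwardly programmable."""
    ...

def move_goodness(board, move):
    """The 'ideal goodness' gamma of a move.

    Placeholder: random numbers, until we get around to writing the
    move-goodness-determining module -- which is exactly where the
    program can never be completed, because 'ideal goodness' was never
    effectively specified.
    """
    return random.random()

def choose_move(board, moves):
    """Pick the move with the highest 'goodness'."""
    return max(moves, key=lambda m: move_goodness(board, m))
```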

Part of the standard virtue ethics of computer science includes an injunction to write code in order to force this kind of grad student to realize that they don’t know how to effectively specify something, even if they symbolized it using a Greek letter. But at least this kind of ineffectiveness seems to be something that some people can learn to detect without actually running the code—consider that, in our example above, the philosopher-programmer realized that they didn’t know how to compute \(\gamma\) at the point where they couldn’t complete that part of the program, not at a point where the program ran and failed. Adding on all the code to take in user moves and display a chess board on the screen only added to the amount of time required to come to this realization; once they know what being unable to write code feels like, they might be able to come to the same realization much faster by standing in front of a whiteboard and failing to write pseudocode.

At this point, it sometimes makes sense to step back and try to say exactly what you don’t know how to solve—try to crisply state what it is that you want an unbounded solution for. Sometimes you can’t even do that much, and then you may actually have to spend some time thinking ‘philosophically’: the sort of stage where you talk to yourself about some mysterious ideal quantity of move-goodness and try to pin down what its properties might be. It’s important not to operate in this stage under the delusion that your move-goodness \(\gamma\) is a well-specified concept; it’s a symbol to represent something that you’re confused about. By asking for an unbounded solution, or even an effectively-specified representation of the unbounded problem, that is, pseudocode which could be run given nothing more than an unphysically large computer (but no otherwise-unspecified Oracle of Move Goodness that outputs \(\gamma\)), we’re trying to invoke the “Can I write code yet?” test on whatever philosophical reasoning we’re doing.

Can trying to write running code help at this stage? Yes, depending on how easy it is to write small programs that naturally represent the structure of the problem you’re confused about how to solve. Edgar Allan Poe might have been willing to concede that he could conceive of deterministic gears that would determine whether a proposed chess move was legal or illegal, and it’s possible that if he’d actually tried to build that automaton and then tried to layer gears on top of that to pick out particular legal moves by whatever means, he might have started to think that maybe chess was computable after all—maybe even have hit upon a representation of search, among other possible things that gears could do, and so realized how in principle the problem could be solved. But this adds the delay and cost of building the automaton and trying out variations of it, plus complications from trying to stay within the allowed number of gears; and it’s not obvious that there can’t possibly be any faster way to hit upon the idea of game-tree search, say, by trying to write pseudocode or formulas on a whiteboard, thinking only about the core structure of the game. If we ask how it is that Shannon had an easier time coming up with the unbounded solution (understanding the nature of the work to be performed) than Poe, the most obvious cause would be the intervening work by Church and Turing (among others) on the nature of computation.

And then in other cases it’s not obvious at all how you could well-represent a problem using current AI algorithms, but with enough work you can figure out how to represent the problem in an unbounded setting.

The pitfall of simplifying away the key, confusing part of the problem.

The tiling agents problem, in the rocket alignment metaphor, is the metaphorical equivalent of trying to fire a cannonball around a perfectly spherical Earth with no atmosphere—to obtain an idea of how any kind of “stable orbit” can work at all. Nonmetaphorically, the given problem is to exhibit the simplest nontrivial case of stable self-modification—an agent that, given its current reasoning, wants to create an agent with similar properties as a successor, i.e., preserving its current goals.

In a perfect world, we’d be able to, with no risk, fire up running code for AI algorithms that reason freely about self-modification and have justified beliefs about how alternative versions of their own code will behave and the outer consequences of that behavior (the way you might imagine what would happen in the real world if you took a particular drug affecting your cognition). But this is way, way beyond current AI algorithms to represent in any sort of natural or realistic way.

A bad “unbounded solution” would be to suppose agents that could exactly simulate their successors, determine exactly what their successors would think, extrapolate the exact environmental consequences, and evaluate those. If you suppose an exact simulation ability, you don’t need to consider how the agent would reason using generalizations, abstractions, or probabilities… but this trivial solution doesn’t feel like it sheds any light or gets us any closer to understanding reflective stability; that is, it feels like the key part of the problem has been simplified away, and solving what remains was too easy and didn’t help.
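For concreteness, this trivial kind of “solution” can be written down in a few lines. (The environment states, policies, and goal predicate below are toy stand-ins invented for illustration.)

```python
# The 'bad unbounded solution': Agent 1 approves building Agent 2 by
# exactly simulating what Agent 2 would do in every possible state and
# checking that the goal is satisfied. This only works because the
# approver contains a complete copy of the successor, so it cannot model
# building an equal-or-stronger successor.

def environment_outcomes(policy, states):
    """Exactly extrapolate the successor's action in every state."""
    return {s: policy(s) for s in states}

def approve_successor(policy, states, goal_satisfied):
    """Approve iff the exact simulation satisfies the goal everywhere.

    No generalization, abstraction, or probabilistic reasoning appears
    anywhere -- the confusing part of reflective stability is gone.
    """
    return all(goal_satisfied(s, a)
               for s, a in environment_outcomes(policy, states).items())
```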

Faced with an “unbounded solution” you don’t like, the next step is to say crisply exactly what is wrong with it, in the form of a new desideratum for your solution. In this case, our reply would be that for Agent 1 to exactly simulate Agent 2, Agent 1 must be larger than Agent 2, and since we want to model stable self-modification, we can’t introduce a requirement that Agent 2 be strictly weaker than Agent 1. More generally, we apply the insight of Vinge’s Principle to this observation and arrive at the desiderata of Vingean uncertainty and Vingean reflection, which we also demand that an unbounded solution exhibit.

This illustrates one of the potential general failure modes of using an unbounded setup to shed conceptual light on a confusing problem—namely, when you simplify away the key confusing issue you wanted to resolve, and end up with a trivial-seeming solution that sheds no light on the original problem. A chief sign of this is when your paper is too easy to write. The next action is to say exactly what you simplified away, put it in the form of a new desideratum, and try to say exactly why your best current attempts can’t meet that desideratum.

So we’ve now further added: we want the agent to generalize over the possible exact actions and behaviors of its successor, rather than needing to know its successor’s exact actions in order to approve building it.

With this new desideratum in hand, there’s now another obvious unbounded model: consider deterministic agents operating in environments with known rules that reason about possible designs and the environment using first-order logic. The agent then uses an unbounded proof search, which no current AI algorithm could tackle in reasonable time (though a human engineer would be able to do it with a bunch of painstaking work), to arrive at justified logical beliefs about the effect of its successor on its environment. This is certainly still extremely unrealistic; but has this again simplified away all the interesting parts of the problem? In this case we can definitely reply, “No, it does expose something confusing,” since we don’t in fact know how to build a tiling agent under this setup. It may not capture all the confusing or interesting parts of the problem, but it seems to expose at least one confusion. Even if, as seems quite possible, it’s introduced some new problem that’s an artifact of the logical setup and wouldn’t apply to agents doing probabilistic reasoning, there’s then a relatively crisp challenge of, “Okay, show me how probabilistic reasoning resolves the problem, then.”

It’s not obvious that there’s anything further to be gained by trying to create a toy model of the problem, or a toy model of the best current unsatisfactory partial solution, that could run as code with some further cheating and demo-rigging, but this is being tried anyway. The tiling agents problem did spend roughly nine years exclusively on paper before that, and the best current unsatisfactory solution was arrived at with whiteboards.

The pitfall of residual terms.

Besides “simplifying away the confusing part of the problem”, another way that unbounded thinking can “bounce off” a confusing problem is by creating a residual term that encapsulates the confusion. Currently, there are good unbounded specifications for Cartesian non-self-modifying expected reward maximizers: if we allow the agent to use unlimited computing power, don’t allow the environment to have unlimited computing power, don’t ask the agent to modify itself, separate the agent from its environment by an impermeable barrier through which only sensory information and motor outputs can pass, and then ask the agent to maximize a sensory reward signal, there’s a simple Python program which is expected to behave superintelligently given sufficient computing power. If we then introduce permeability into the Cartesian boundary and allow for the possibility that the agent can take drugs or drop an anvil on its own head, nobody has an unbounded solution to that problem any more.
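The unbounded specification gestured at here is essentially AIXI-style expectimax. The sketch below is a heavily simplified stand-in: the Solomonoff prior over all computable environments is replaced by a small explicit list of hypotheses so the code actually runs, and belief updating on observations is omitted because the toy environments are deterministic functions of the action history.

```python
# Finite-horizon expectimax for a Cartesian reward maximizer: choose the
# action sequence with the greatest expected total reward under a prior
# over environment models. Each environment hypothesis is a function from
# the tuple of actions taken so far to an (observation, reward) pair.

def expectimax(prior, history, horizon):
    """Expected total reward of the best plan from here, under the prior."""
    if horizon == 0:
        return 0.0
    values = []
    for action in (0, 1):  # a two-action motor channel, for simplicity
        next_history = history + (action,)
        # Expected immediate reward, averaged over environment hypotheses.
        immediate = sum(p * env(next_history)[1] for p, env in prior)
        values.append(immediate + expectimax(prior, next_history, horizon - 1))
    return max(values)

def best_action(prior, history, horizon):
    """The first action of the best plan."""
    return max((0, 1), key=lambda a: (
        sum(p * env(history + (a,))[1] for p, env in prior)
        + expectimax(prior, history + (a,), horizon - 1)))
```

Replacing the two-element prior with a Solomonoff mixture over all computable environments recovers the unbounded version; nothing in the code then remains runnable on a physical computer, which is the point.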

So one way of bouncing off that problem is to say, “Oh, well, my agent calculates the effect of its motor actions on the environment and the expected effect on sensory information and the reward signal, plus a residual term \(\gamma\) which stands for the expected utility of all effects of the agent’s actions that change the agent’s processing or destroy its hardware.” How is \(\gamma\) to be computed? This is left unsaid.

In this case you haven’t omitted the confusing part of the problem, but you’ve packed it into a residual term you can’t give an effective specification for calculating. So you no longer have an unbounded solution—you can’t write down the Python program that runs given unlimited computing power—and you’ve probably failed to shed any important light on the confusing part of the problem. Again, one of the warning signs here is that the paper is very easy to write, and reading it does not make the key problem feel less like a hard opaque ball.

Introducing realistic complications can make it hard to build collective discourse about which ideas have which consequences.

One of the watershed moments in the history of AI alignment theory was Marcus Hutter’s proposal for AIXI, not just because it was the first time anyone had put together a complete specification of an unbounded agent (in a Cartesian setting), but also because it was the first time that non-value-alignment could be pointed out in a completely pinned-down way.

(work in progress)


  • AIXI

    How to build an (evil) superintelligent AI using unlimited computing power and one page of Python code.

  • Solomonoff induction

    A simple way to superintelligently predict sequences of data, given unlimited computing power.

  • Hypercomputer

    Some formalisms demand computers larger than the limit of all finite computers.

  • Unphysically large finite computer

    The imaginary box required to run programs that require impossibly large, but finite, amounts of computing power.

  • Cartesian agent

    Agents separated from their environments by impermeable barriers through which only sensory information can enter and motor output can exit.

  • Mechanical Turk (example)

    The 19th-century chess-playing automaton known as the Mechanical Turk actually had a human operator inside. People at the time had interesting thoughts about the possibility of mechanical chess.

  • No-Free-Lunch theorems are often irrelevant

    There’s often a theorem proving that some problem has no optimal answer across every possible world. But this may not matter, since the real world is a special case. (E.g., a low-entropy universe.)


  • Advanced safety

    An agent is really safe when it has the capacity to do anything, but chooses to do what the programmer wants.