Mild optimization

“Mild optimization” is where, if you ask a Task AGI to paint one car pink, it just paints one car pink and then stops, rather than tiling the galaxies with pink-painted cars, because it’s not optimizing that hard. It’s okay with just painting one car pink; it isn’t driven to max out the twentieth decimal place of its car-painting score.

Other suggested terms for this concept have included “soft optimization”, “sufficient optimization”, “minimum viable solution”, “pretty good optimization”, “moderate optimization”, “regularized optimization”, “sensible optimization”, “casual optimization”, “adequate optimization”, “good-not-great optimization”, “lenient optimization”, “parsimonious optimization”, and “optimehzation”.

Difference from low impact

Mild optimization is complementary to taskiness and low impact. A low impact AGI might try to paint one car pink while minimizing its other footprint or how many other things changed, but it would be trying as hard as possible to minimize that impact and drive it down as close to zero as possible, which might come with its own set of pathologies.

What we really want is both properties. We want the AGI to paint one car pink in a way that keeps the impact pretty low and then, you know, call that good enough—not have a cognitive pressure to search through weird extremes looking for a way to decrease the twentieth decimal place of the impact. Hard optimization would tend to break a low impact measure that contained even a subtle flaw, whereas a mild-optimizing AGI would put less pressure on the low impact measure and hence be less likely to break it.

(Obviously, what we want is a perfect low impact measure which will keep us safe even if subjected to unlimited optimization power, but a basic security mindset is to try to make each part safe on its own, then assume it might contain a flaw and try to design the rest of the system to be safe anyway.)

Difference from satisficing

Satisficing utility functions don’t necessarily mandate or even allow mildness.

Suppose the AI’s utility function is 1 when at least one car has been painted pink and 0 otherwise—there’s no more utility to be gained by outcomes in which more cars have been painted pink. Will this AI still go to crazy-seeming lengths?

Yes, because in a partially uncertain / probabilistic environment, there is always a little more expected utility to be gained. A solution with a 0.9999 probability of painting at least one car pink is ranked above a solution with a 0.999 probability of painting at least one car pink.

If a preference ordering \(<_p\) has the property that for every probability distribution on expected outcomes \(O\) there’s another expected outcome \(O'\) with \(O <_p O'\) which requires one more erg of energy to achieve, this is a sufficient condition for using up all the energy in the universe. If converting all reachable matter into pink-painted cars implies even a slightly higher probability that at least one car is pink, then that is what maximizes expected utility under the 0-1 utility function.
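The point can be seen in a toy calculation (the policies and probabilities below are made up purely for illustration): under a 0-1 utility function, expected utility just equals the probability of success, so an expected utility maximizer always prefers whichever policy squeezes out a little more probability, however extreme.

```python
# Under U = 1 if at least one car is pink, else 0, expected utility is
# simply the success probability, so a maximizer prefers any policy that
# adds probability in a distant decimal place, however extreme it is.

def expected_utility(p_success):
    """Expected utility under the 0-1 utility function."""
    return 1.0 * p_success + 0.0 * (1.0 - p_success)

# Hypothetical policies with hypothetical success probabilities.
policies = {
    "paint one car, then stop": 0.999,
    "paint one car, plus backup painting robots": 0.9999,
    "convert all reachable matter into pink cars": 0.999999999,
}

best = max(policies, key=lambda name: expected_utility(policies[name]))
print(best)  # the most extreme policy wins
```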

Less naive satisficing would describe an optimizer which satisfies an expected utility constraint—say, if any policy produces at least 0.95 expected utility under the 0-1 utility function, the AI can implement that policy.

This rule is now a Task and would at least permit mild optimization. The problem is that it doesn’t exclude extremely optimized solutions. A 0.99999999 probability of producing at least one pink-painted car also has the property that it’s above a 0.95 probability. If you’re a self-modifying satisficer, replacing yourself with a maximizer is probably a satisficing solution.
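The permissiveness of the threshold rule is easy to exhibit directly (again with made-up policies and numbers): everything above the bar satisfices, and nothing in the rule prefers the mild policy over the maximizer-construction policy.

```python
# An expected-utility satisficer accepts ANY policy clearing the
# threshold -- including "replace myself with a maximizer", which also
# clears it. The rule permits mildness but does nothing to mandate it.

THRESHOLD = 0.95

# Hypothetical policies with hypothetical expected utilities.
policies = {
    "mildly paint one car": 0.96,
    "replace myself with a maximizer": 0.99999999,
}

acceptable = [name for name, eu in policies.items() if eu >= THRESHOLD]
print(acceptable)  # both policies satisfice
```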

Even if we’re not dealing with a completely self-modifying agent, there’s a ubiquity of points where adding more optimization pressure might satisfice. When you build a thermostat in the environment, you’re coercing one part of the environment to have a particular temperature; if this kind of thing doesn’t count as “more optimization pressure” then we could be dealing with all sorts of additional optimizing-ness that falls short of constructing a full subagent or doing a full self-modification. There’s all sorts of steps in cognition where it would be just as easy to add a maximizing step (take the highest-ranking solution) as to take a random high-ranking solution.

On a higher level of abstraction, the problem is that while satisficing is reflectively consistent, it’s not reflectively stable. A satisficing agent is happy to construct another satisficing agent, but it may also be happy to construct a maximizing agent. It can approve its current mode of thinking, but it approves other modes of thinking too. So unless all the cognitive steps are being carried out locally on fixed known algorithms that satisfice but definitely don’t maximize, without the AGI constructing any environmental computations or conditional policy steps more complicated than a pocket calculator, building a seemingly mild satisficer doesn’t guarantee that optimization stays mild.


One weird idea that seems like it might exhibit incremental progress toward reflectively stable mild optimization is Jessica Taylor’s expected utility quantilizer. Roughly, a quantilizer estimates expected outcomes relative to a null action, and then tries to produce an expected outcome in some upper quantile of possibilities—e.g., an outcome in the top 1% of expected outcomes. Furthermore, a quantilizer only tries to narrow outcomes by that much—it doesn’t try to produce one particular outcome in the top 1%; the most it will ever try to do is randomly pick an outcome such that this random distribution corresponds to being in the top 1% of expected outcomes.
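A minimal sketch of the sampling idea (simplified for illustration; a real quantilizer quantilizes relative to a base distribution over actions or policies, not a finite ranked list):

```python
import random

# Minimal q-quantilizer sketch: instead of returning the argmax action,
# rank actions by estimated expected utility and sample uniformly from
# the top q fraction. The agent narrows the future only "that much".

def quantilize(actions, utility, q=0.01, rng=random):
    ranked = sorted(actions, key=utility, reverse=True)
    top_k = max(1, int(len(ranked) * q))  # size of the top q fraction
    return rng.choice(ranked[:top_k])

# Stand-in action space and utility estimate for illustration.
actions = list(range(1000))
utility = lambda a: a

pick = quantilize(actions, utility, q=0.01)
print(pick)  # always one of the top 10 actions, chosen at random
```

Unlike a maximizer, repeated calls do not deterministically return the single highest-utility action; they spread over the whole top quantile.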

Quantilizing corresponds to maximizing expected utility under the assumption that there is uncertainty about which outcomes are good and an adversarial process which can make some outcomes arbitrarily bad, subject to the constraint that the expected utility of the null action can only be boundedly low. So if there’s an outcome which would be very improbable given the status quo, the adversary can make that outcome be very bad. This means that rather than aiming for one single high-utility outcome which the adversary could then make very bad, a quantilizer tries for a range of possible good outcomes. This in turn means that quantilizers will actively avoid narrowing down the future too much, even if by doing so they’d enter regions of very high utility.
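The adversarial story comes with a simple quantitative bound (stated here from my understanding of Taylor’s quantilizer write-up, so treat the exact form as an assumption rather than something asserted by this article): a \(q\)-quantilizer’s distribution \(Q_q\) over actions never exceeds \(1/q\) times the base distribution \(\gamma\), so for any nonnegative adversarial cost function \(c\),

\[E_{a \sim Q_q}[c(a)] \;=\; \sum_a Q_q(a)\, c(a) \;\le\; \sum_a \frac{\gamma(a)}{q}\, c(a) \;=\; \frac{1}{q}\, E_{a \sim \gamma}[c(a)].\]

So if the status quo (null-action) distribution can only be boundedly bad in expectation, a \(q\)-quantilizer can do at most \(1/q\) times that much expected damage.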

Quantilization doesn’t seem like exactly what we actually want, for multiple reasons. E.g., if long-run good outcomes are very improbable given the status quo, it seems like a quantilizer would try to have its policies fall short of them in the long run (a similar problem seems like it might appear in impact measures which imply that good long-run outcomes have high impact).

The key important idea that appears in quantilizing is that a quantilizer isn’t just as happy to rewrite itself as a maximizer, and isn’t just as happy to implement a policy that involves constructing a more powerful optimizer in the environment.

Relation to other problems

Mild optimization relates directly to one of the three core reasons why aligning at-least-partially superhuman AGI is hard—making very powerful optimization pressures flow through the system puts a lot of stress on its potential weaknesses and flaws. To the extent we can get mild optimization stable, it might take some of the critical-failure pressure off other parts of the system. (Though again, basic security mindset says to still try to get all the parts of the system as flawless as possible and not tolerate any known flaws in them, then build the fallback options in case they’re flawed anyway; one should not deliberately rely on the fallbacks and intend them to be activated.)

Mild optimization seems strongly complementary to low impact and taskiness. Something that’s merely low-impact might exhibit pathological behavior from trying to drive side impacts down to absolutely zero. Something that merely optimizes mildly might find some ‘weak’ or ‘not actually trying that hard’ solution which nonetheless ended up turning the galaxies into pink-painted cars. Something that has a satisfiable utility function with a readily achievable maximum might still go to tremendous lengths to drive the probability of achieving maximum utility to nearly 1. Something that optimizes mildly and has a low impact penalty and has a small, clearly achievable goal, seems much more like the sort of agent that might, you know, just paint the damn car pink and then stop.

Mild optimization can be seen as a further desideratum of the currently open Other-izer Problem: Besides being workable for bounded agents, and being reflectively stable, we’d also like an other-izer idiom to have a (stable) mildness parameter.


It currently seems like the key subproblem in mild optimization revolves around reflective stability—we don’t want “replace the mild optimization part with a simple maximizer, becoming a maximizer isn’t that hard and gets the task done” to count as a ‘mild’ solution. Even in human intuitive terms of “optimizing without putting in an unreasonable amount of effort”, at some point a sufficiently advanced human intelligence gets lazy and starts building an AGI to do things for them because it’s easier that way and only takes a bounded amount of effort. We don’t want “construct a second AGI that does hard optimization” to count as mild optimization even if it ends up not taking all that much effort for the first AGI, although “construct an AGI that does \(\theta\)-mild optimization” could potentially count as a \(\theta\)-mild solution.

Similarly, we don’t want to allow the deliberate creation of environmental or internal daemons even if it’s easy to do it that way or requires low effort to end up with that side effect—we’d want the optimizing power of such daemons to count against the measured optimization power and be rejected as optimizing too hard.

Since both of these phenomena seem hard to exhibit in current machine learning algorithms or faithfully represent in a toy problem, unbounded analysis seems likely to be the main way to go. In general, it seems closely related to the Other-izer Problem, which also seems most amenable to unbounded analysis at the present time.


  • Task-directed AGI

    An advanced AI that’s meant to pursue a series of limited-scope goals given it by the user. In Bostrom’s terminology, a Genie.