Other-izing (wanted: new optimization idiom)

The open “other-izer” problem is to find something besides maximizing, satisficing, meliorizing, and several other existing but unsatisfactory idioms, which is actually suitable as an optimization idiom for bounded agents and is reflectively stable.

In standard theory we tend to assume that agents are expected utility maximizers that always choose the available option with highest expected utility. But this isn’t a realistic idiom: a bounded agent with limited computing power can’t compute the expected utility of every possible action.
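As a minimal sketch (all names hypothetical), the textbook maximizer idiom is just an argmax over the whole action space, and iterating over that space is exactly what a bounded agent cannot afford:

```python
def maximize(actions, expected_utility):
    """The textbook idiom: evaluate the expected utility of every
    available action and return the best one. The infeasible part
    for a bounded agent is the exhaustive iteration itself."""
    return max(actions, key=expected_utility)

# Fine for a toy action space of three options...
best = maximize(["a", "b", "c"], {"a": 0.2, "b": 0.9, "c": 0.5}.get)

# ...but a realistic policy space is combinatorial (e.g. all length-n
# action sequences), so this loop could never be run to completion.
```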

An expected utility satisficer, which e.g. might approve any policy so long as the expected utility is at least 0.95, would be much more realistic. But it also doesn’t seem suitable for an actual AGI, since, e.g., if policy X produces at least expected utility 0.98, then it would also satisfice to randomize between mostly policy X and a small chance of policy Y that had expected utility 0; this seems to give away a needlessly large amount of utility. We’d probably be fairly disturbed if an otherwise aligned AGI was actually doing that.
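The give-away can be checked with the hypothetical numbers from the example above: both the pure policy X and a mixture that wastes 3% of its probability mass on the worthless policy Y clear the 0.95 threshold:

```python
THRESHOLD = 0.95  # the satisficing threshold from the example

# Hypothetical expected utilities from the text: X is good, Y is worthless.
EU = {"X": 0.98, "Y": 0.0}

def satisfices(expected_utility):
    """A satisficer approves any policy whose expected utility
    is at least the threshold."""
    return expected_utility >= THRESHOLD

# The pure policy X satisfices:
assert satisfices(EU["X"])

# But so does randomizing: play X with probability 0.97, Y with 0.03.
mixture_eu = 0.97 * EU["X"] + 0.03 * EU["Y"]  # 0.97 * 0.98 = 0.9506
assert satisfices(mixture_eu)
# The mixture also clears 0.95, even though it needlessly gives away
# about 0.03 of expected utility relative to pure X.
```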

Satisficing is also reflectively consistent but not reflectively stable—while tiling agents theory can give formulations of satisficers that will approve the construction of similar satisficers, a satisficer could also tile to a maximizer. If your decision criterion is to approve policies which achieve expected utility at least \(\theta,\) and you expect that an expected utility maximizing version of yourself would achieve expected utility at least \(\theta,\) then you’ll approve self-modifying to be an expected utility maximizer. This is another reason to prefer a formulation of optimization besides satisficing: if the AI is strongly self-modifying, there’s no guarantee that the ‘satisficing’ property would stick around and keep our analysis applicable; and even if it is not strongly self-modifying, it might still create non-satisficing chunks of cognitive mechanism inside itself or in the environment.
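The self-modification step can be written out directly (a sketch; the utility estimate is a hypothetical placeholder): the satisficer’s own criterion approves the switch whenever it expects the maximizer version of itself to clear \(\theta\):

```python
THETA = 0.95  # the satisficer's approval threshold

def approves(policy_expected_utility):
    """Decision criterion: approve any policy whose expected
    utility is at least theta."""
    return policy_expected_utility >= THETA

# Hypothetical belief: a maximizer version of myself would achieve
# expected utility at least 0.99, which clears theta...
eu_if_i_become_a_maximizer = 0.99

# ...so "self-modify into a maximizer" is itself an approved policy,
# and the satisficing property need not persist through self-modification.
assert approves(eu_if_i_become_a_maximizer)
```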

A meliorizer has a current policy and only replaces it with policies of increased expected utility. Again, while it’s possible to demonstrate that a meliorizer can approve self-modifying to another meliorizer, and hence that this idiom is reflectively consistent, it doesn’t seem reflectively stable—becoming a maximizer or something else might have higher expected utility than staying a meliorizer.
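A minimal sketch of the meliorizing rule (all names and numbers hypothetical): keep a current policy and swap only for strict improvements in expected utility. Nothing in the rule prevents the improving swap from being “become a maximizer,” which is where reflective stability fails:

```python
def meliorize(current, candidate, expected_utility):
    """Keep the current policy unless the candidate is a strict
    improvement in expected utility."""
    if expected_utility(candidate) > expected_utility(current):
        return candidate
    return current

# Hypothetical expected utilities for three candidate policies.
EU = {"status quo": 0.6, "better plan": 0.7, "become a maximizer": 0.99}

policy = "status quo"
policy = meliorize(policy, "better plan", EU.get)         # accepted: 0.7 > 0.6
policy = meliorize(policy, "become a maximizer", EU.get)  # also accepted: 0.99 > 0.7
```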

The “other-izer” open problem is to find something better than maximization, satisficing, and meliorization that actually makes sense as an idiom of optimization for a resource-bounded agent, that we’d think would be an okay thing for e.g. a Task AGI to do, and that is at least reflectively consistent, preferably reflectively stable.

See also “Mild optimization” for a further desideratum, namely an adjustable parameter of optimization strength, that would be nice to have in an other-izer.


  • Reflective stability

    Wanting to think the way you currently think, building other agents and self-modifications that think the same way.