Other-izing (wanted: new optimization idiom)

The open “other-izer” problem is to find something besides maximizing, satisficing, meliorizing, and several other existing but unsatisfactory idioms that is actually suitable as an optimization idiom for bounded agents and is reflectively stable.

In standard theory we tend to assume that agents are expected utility maximizers which always choose the available option with the highest expected utility. But this isn’t a realistic idiom, because a bounded agent with limited computing power can’t compute the expected utility of every possible action.
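As a toy illustration (the names and numbers here are made up for this sketch, not taken from the text), pure maximization literally means taking an argmax over the entire option set, which is exactly the step a bounded agent can’t afford when the option space is large:

```python
# Toy sketch of a pure expected-utility maximizer: it must evaluate every
# available action.  With three options this is trivial; a real agent's
# option space is astronomically larger.

def maximize(actions, expected_utility):
    """Return the action with the highest expected utility."""
    return max(actions, key=expected_utility)

actions = ["policy_A", "policy_B", "policy_C"]
utilities = {"policy_A": 0.72, "policy_B": 0.98, "policy_C": 0.10}

best = maximize(actions, utilities.get)  # -> "policy_B"
```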

An expected utility satisficer, which might e.g. approve any policy so long as its expected utility is at least 0.95, would be much more realistic. But it also doesn’t seem suitable for an actual AGI, since, e.g., if policy X produces expected utility of at least 0.98, then it would also satisfice to randomize between mostly policy X and a small chance of policy Y with expected utility 0; this seems to give away a needlessly large amount of utility. We’d probably be fairly disturbed if an otherwise aligned AGI were actually doing that.
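A minimal sketch of that failure mode, using the numbers from the paragraph above and otherwise illustrative names: mixing 97% policy X with 3% policy Y has expected utility 0.9506, which still clears the 0.95 threshold even though it throws away utility for no reason.

```python
# Toy expected-utility satisficer with threshold 0.95.  The wasteful mixture
# of policy X (EU 0.98) and policy Y (EU 0) is approved just as readily as
# pure policy X.

THRESHOLD = 0.95

def satisfices(expected_utility: float) -> bool:
    """Approve any policy whose expected utility is at least the threshold."""
    return expected_utility >= THRESHOLD

eu_x, eu_y = 0.98, 0.0
p_x = 0.97                                  # weight on policy X in the mixture
mixed_eu = p_x * eu_x + (1 - p_x) * eu_y    # = 0.9506

print(satisfices(eu_x))       # True -- pure policy X is approved
print(satisfices(mixed_eu))   # True -- so is the wasteful mixture
```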

Satisficing is also reflectively consistent but not reflectively stable: while tiling agents theory can give formulations of satisficers that will approve the construction of similar satisficers, a satisficer could also tile to a maximizer. If your decision criterion is to approve policies which achieve expected utility at least \(\theta,\) and you expect that an expected utility maximizing version of yourself would achieve expected utility at least \(\theta,\) then you’ll approve self-modifying into an expected utility maximizer. This is another reason to prefer a formulation of optimization besides satisficing: if the AI is strongly self-modifying, there’s no guarantee that the ‘satisficing’ property will stick around and keep our analysis applicable; and even if it isn’t strongly self-modifying, it might still create non-satisficing chunks of cognitive machinery inside itself or in the environment.
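The tiling step can be sketched in a few lines (with hypothetical names and numbers): the satisficing criterion evaluates “self-modify into a maximizer” exactly like any other policy, and approves it whenever that option’s expected utility clears \(\theta.\)

```python
# Toy sketch of the tiling argument: a satisficer's approval criterion does
# nothing to preserve satisficing itself.

theta = 0.95

def approve(policy_expected_utility: float) -> bool:
    """Satisficing criterion: approve anything with EU >= theta."""
    return policy_expected_utility >= theta

# The agent's estimate of how well a maximizer successor would do.
eu_of_becoming_a_maximizer = 0.99

print(approve(eu_of_becoming_a_maximizer))  # True: self-modification approved
```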

A meliorizer has a current policy and only replaces it with policies of higher expected utility. Again, while it’s possible to demonstrate that a meliorizer can approve self-modifying into another meliorizer, and hence that this idiom is reflectively consistent, it doesn’t seem like it would be reflectively stable: becoming a maximizer or something else might have higher expected utility than staying a meliorizer.
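A rough sketch of the meliorizing criterion, again with illustrative names and numbers; the final comment notes why the same criterion can endorse self-modification away from meliorizing.

```python
# Toy meliorizer: keeps a current policy and swaps it out only for a policy
# with strictly higher expected utility.

class Meliorizer:
    def __init__(self, current_policy: str, current_eu: float):
        self.current_policy = current_policy
        self.current_eu = current_eu

    def consider(self, new_policy: str, new_eu: float) -> None:
        """Adopt the new policy only if it improves on the current one."""
        if new_eu > self.current_eu:
            self.current_policy, self.current_eu = new_policy, new_eu

agent = Meliorizer("do_nothing", 0.0)
agent.consider("policy_X", 0.98)   # adopted: 0.98 > 0.0
agent.consider("policy_Z", 0.50)   # rejected: 0.50 < 0.98

# Note: "self-modify into a maximizer" would also be adopted if the agent
# expected it to yield higher expected utility than its current policy.
```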

The “other-izer” open problem is to find something better than maximization, satisficing, and meliorization: an idiom that actually makes sense as a way for a resource-bounded agent to optimize, that we’d think was an okay thing for, e.g., a Task AGI to do, and that is at least reflectively consistent and preferably reflectively stable.

See also “Mild optimization” for a further desideratum that would be nice to have in an other-izer, namely an adjustable parameter of optimization strength.

Parents:

  • Reflective stability

    Wanting to think the way you currently think, building other agents and self-modifications that think the same way.