A ‘preference framework’ refers to the algorithm — possibly one that updates, or changes in other ways — that determines which outcomes the agent treats as terminal goals. ‘Preference framework’ is a term more general than ‘utility function’, one which includes structurally complicated generalizations of utility functions.
As a central example, the utility indifference proposal has the agent switching between utility functions \(U_X\) and \(U_Y\) depending on whether a switch is pressed. We can call this meta-system a ‘preference framework’ to avoid presuming in advance that it embodies a VNM-coherent utility function.
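To make the structure concrete, here is a minimal sketch of such a meta-system in Python. All names here (`u_x`, `u_y`, the outcome fields) are hypothetical illustrations, not part of any actual proposal; the point is only that the agent’s preferences are given by a rule that selects between utility functions, rather than by either utility function alone.

```python
def u_x(outcome):
    """Hypothetical 'normal operation' utility: e.g., reward units produced."""
    return float(outcome["units_produced"])

def u_y(outcome):
    """Hypothetical 'after switch' utility: e.g., reward suspending safely."""
    return 1.0 if outcome["suspended"] else 0.0

def meta_utility(outcome):
    """The preference framework: which utility function applies depends on
    a feature of the outcome itself (whether the switch was pressed)."""
    return u_y(outcome) if outcome["switch_pressed"] else u_x(outcome)

# The agent ranks outcomes by meta_utility, not by u_x or u_y alone.
outcomes = [
    {"switch_pressed": False, "units_produced": 3, "suspended": False},
    {"switch_pressed": True,  "units_produced": 0, "suspended": True},
]
best = max(outcomes, key=meta_utility)
```

Whether a rule like `meta_utility` behaves equivalently to some single VNM-coherent utility function is exactly the question the more general term ‘preference framework’ is meant to leave open.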
It is thus an even more general term, which doesn’t presume that the agent operates by maximizing a single fixed utility function over preferred outcomes.
- Moral uncertainty
A meta-utility function in which the utility function, as usually considered, takes on different values in different possible worlds, potentially distinguishable by evidence.
- Meta-utility function
Preference frameworks built out of simple utility functions, but where, e.g., the ‘correct’ utility function for a possible world depends on whether a button is pressed.
- Attainable optimum
The ‘attainable optimum’ of an agent’s preferences is the best that agent can actually do given its finite intelligence and resources (as opposed to the global maximum of those preferences).
- Value alignment problem
You want to build an advanced AI with the right values… but how?