Moral uncertainty

“Moral uncertainty” in the context of AI refers to an agent with an “uncertain utility function”. That is, we can view the agent as pursuing a utility function that takes on different values in different subsets of possible worlds.

For example, an agent might have a meta-utility function saying that eating cake has a utility of €8 in worlds where Lee Harvey Oswald shot John F. Kennedy, and that eating cake has a utility of €10 in worlds where Oswald did not shoot him. This agent will be motivated to inquire into political history to find out which utility function is probably the ‘correct’ one (relative to this meta-utility function), though it will never be absolutely sure.
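The update described above can be sketched as ordinary Bayesian reasoning over which world the agent is in. The likelihood numbers below are hypothetical, chosen only to illustrate how the expected utility of an act shifts as evidence arrives:

```python
# Prior probability that Oswald shot Kennedy (hypothetical 50/50 prior).
p_oswald = 0.5

# Utilities of eating cake under the meta-utility function, per the example.
U_CAKE_IF_OSWALD = 8.0
U_CAKE_OTHERWISE = 10.0

def expected_cake_utility(p):
    """Expected utility of eating cake, given probability p that Oswald shot JFK."""
    return p * U_CAKE_IF_OSWALD + (1 - p) * U_CAKE_OTHERWISE

def update(p, lik_if_oswald, lik_otherwise):
    """Bayes update of p on a piece of historical evidence."""
    joint = p * lik_if_oswald
    return joint / (joint + (1 - p) * lik_otherwise)

print(expected_cake_utility(p_oswald))   # 9.0 under the 50/50 prior

# Suppose the agent finds evidence it judges 9x more likely
# if Oswald was in fact the shooter:
p_oswald = update(p_oswald, lik_if_oswald=0.9, lik_otherwise=0.1)
print(round(p_oswald, 2))                          # 0.9
print(round(expected_cake_utility(p_oswald), 2))   # 8.2
```

The agent never reaches probability 1, matching the point that it “will never be absolutely sure”, but each observation moves its effective utilities toward those of the ‘correct’ function.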

Moral uncertainty must be resolvable by some conceivable observation in order to function as uncertainty. Suppose for example that an agent’s probability distribution \(\Delta U\) over the ‘true’ utility function \(U\) asserts a dependency on a fair quantum coin that was flipped inside a sealed box that was then destroyed by explosives: the utility function is \(U_1\) over outcomes in the worlds where the coin came up heads, and \(U_2\) in the worlds where the coin came up tails. If the agent thinks it has no way of ever figuring out what happened inside the box, it will thereafter behave as if it had a single, constant, certain utility function equal to \(0.5 \cdot U_1 + 0.5 \cdot U_2.\)
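This equivalence can be checked directly. The sketch below uses hypothetical outcomes and utility numbers; it shows that ranking outcomes by expected utility over an unresolvable 50/50 mixture of \(U_1\) and \(U_2\) is identical to ranking them by the single mixture function \(0.5 \cdot U_1 + 0.5 \cdot U_2\):

```python
# Hypothetical outcome set and utility tables for the two coin-flip worlds.
outcomes = ["eat_cake", "eat_pie", "eat_nothing"]

U1 = {"eat_cake": 8.0, "eat_pie": 6.0, "eat_nothing": 0.0}   # if coin came up heads
U2 = {"eat_cake": 10.0, "eat_pie": 2.0, "eat_nothing": 0.0}  # if coin came up tails

def expected_utility(outcome):
    # Expectation over the unresolvable 50/50 uncertainty about the coin.
    return 0.5 * U1[outcome] + 0.5 * U2[outcome]

# A single, certain utility function: the pointwise mixture of U1 and U2.
mixture = {o: 0.5 * U1[o] + 0.5 * U2[o] for o in outcomes}

# Both evaluations agree on every outcome, so the agent's choices are
# indistinguishable from those of an agent with the certain function `mixture`.
for o in outcomes:
    assert expected_utility(o) == mixture[o]

best_uncertain = max(outcomes, key=expected_utility)
best_mixture = max(outcomes, key=mixture.get)
print(best_uncertain == best_mixture)  # True
```

Because no observation can ever shift the 50/50 weights, the two agents behave identically in every decision problem, which is why the uncertainty no longer functions as uncertainty.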


  • Ideal target

    The ‘ideal target’ of a meta-utility function is the value the ground-level utility function would take on if the agent updated on all possible evidence; that is, the ‘true’ utilities under moral uncertainty.