Diamond maximizer

A difficult, far-reaching open problem in AI alignment is to specify an unbounded formula for an agent that would, if run on an unphysically large finite computer, create as much diamond material as possible. The goal of making diamonds was chosen because it is physically crisp what counts as a ‘diamond’. Supposing a crisp goal plus hypercomputation avoids some problems in value alignment while still invoking many others, which makes this an interesting intermediate problem.

Importance

The diamond maximizer problem is to give an unbounded description of a computer program such that, if it were instantiated on a sufficiently powerful but physical computer, the result of running the program would be the creation of an immense amount of diamond—around as much diamond as is physically possible for an agent to create.

The fact that this problem remains extremely hard shows that the difficulty of value alignment is not due solely to the Complexity of value. As a thought experiment, it helps to separate the difficulties that come from value complexity from those that arise even for simple goals.

It also illustrates the difficulty of value alignment by making a point that is easier to see: we can’t yet figure out how to create lots of diamond even with unlimited computing power, never mind how to create value with bounded computing power.

Problems avoided

If we can crisply define exactly what a ‘diamond’ is, then in theory we should be able to avoid issues of Edge Instantiation and Unforeseen Maximums, as well as the problem of conveying complex values to the agent.

The amount of diamond is defined as the number of carbon atoms that are covalently bonded, by electrons, to exactly four other carbon atoms. A carbon atom is any nucleus containing six protons and any number of neutrons, bound by the strong force. The utility of a universal history is the total amount of Minkowskian interval spent by all carbon atoms being bound to exactly four other carbon atoms. More precise definitions of ‘bound’, or the amount of measure in a quantum system that is being bound, are left to the reader—any crisp definition will do, so long as we are confident that it has no unforeseen maximum at things we don’t intuitively see as diamonds.
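
As a rough illustration of how crisp this definition is meant to be, the counting rule can be sketched on a toy model. Everything in the sketch is an assumption made for illustration and is not part of the original definition: atoms are represented only by a proton count, covalent bonds are handed to us as pairs of atom ids, and a history is a list of snapshots separated by a fixed time step `dt`, which stands in for the Minkowskian interval. The names `diamond_atoms` and `utility` are hypothetical.

```python
from collections import defaultdict

CARBON_PROTONS = 6  # a carbon nucleus has six protons and any number of neutrons


def diamond_atoms(protons, bonds):
    """Return ids of carbon atoms bonded to exactly four other carbon atoms.

    protons: dict mapping atom id -> proton count (toy stand-in for a nucleus)
    bonds:   iterable of (id_a, id_b) pairs, each assumed to be a covalent bond
    """
    carbon_neighbors = defaultdict(set)
    for a, b in bonds:
        if a == b:
            continue  # an atom cannot be bonded to itself
        if protons[a] == CARBON_PROTONS and protons[b] == CARBON_PROTONS:
            carbon_neighbors[a].add(b)
            carbon_neighbors[b].add(a)
    return {atom for atom, nbrs in carbon_neighbors.items() if len(nbrs) == 4}


def utility(history, dt=1.0):
    """Toy utility of a history: time spent in the diamond state, summed over atoms.

    history: list of (protons, bonds) snapshots.
    dt:      assumed fixed time step per snapshot, a discrete simplification
             of the Minkowskian interval used in the text.
    """
    return sum(len(diamond_atoms(p, b)) * dt for p, b in history)


# One central carbon (id 0) bonded to four other carbons (ids 1-4).
protons = {0: 6, 1: 6, 2: 6, 3: 6, 4: 6}
full = [(0, 1), (0, 2), (0, 3), (0, 4)]
broken = [(0, 1), (0, 2), (0, 3)]  # one bond removed in the second snapshot

print(diamond_atoms(protons, full))                      # {0}
print(utility([(protons, full), (protons, broken)]))     # 1.0
```

Only atom 0 counts in the first snapshot, since each outer carbon has a single carbon neighbor. The sketch takes the bond list as given, which is exactly the part a real agent does not get for free: it would have to locate ‘carbon atoms’ and ‘bonds’ inside whatever model of physics it ends up using.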

Problems challenged

Since this diamond maximizer would hypothetically be implemented on a very large but physical computer, it would confront reflective stability, the anvil problem, and the problems of making subagents.

To the extent the diamond maximizer needs to worry about other agents in the environment that can model it well, or needs to cooperate with other diamond maximizers, it must resolve Newcomblike problems using some logical decision theory. This would also require it to confront logical uncertainty despite possessing immense amounts of computing power.

To the extent the diamond maximizer must work well in a rich real universe that might operate according to any number of possible physical laws, it faces the problems of naturalized induction and ontology identification. See the article on ontology identification for the argument that, even for the goal of ‘make diamonds’, the problem of goal identification remains difficult.

Unreflective diamond maximizer

As a further-simplified but still unsolved problem, an unreflective diamond maximizer is a diamond maximizer implemented on a Cartesian hypercomputer in a causal universe that does not face any Newcomblike problems. This further avoids problems of reflectivity and logical uncertainty. In this case, it seems plausible that the primary remaining difficulty is the ontology identification problem. Thus the open problem of describing an unreflective diamond maximizer is a central illustration of the difficulty of ontology identification.
