Diamond maximizer

A difficult, far-reaching open problem in AI alignment is to specify an unbounded formula for an agent that would, if run on an unphysically large finite computer, create as much diamond material as possible. The goal of ‘diamonds’ was chosen to make it physically crisp what constitutes a ‘diamond’. A crisp goal plus hypercomputation avoids some problems in value alignment while still invoking many others, making this an interesting intermediate problem.

The diamond maximizer problem is to give an unbounded description of a computer program such that, if it were instantiated on a sufficiently powerful but physical computer, the result of running the program would be the creation of an immense amount of diamond—around as much diamond as is physically possible for an agent to create.

The fact that this problem is still extremely hard shows that the value alignment problem is not just due to the Complexity of value. As a thought experiment, it helps to distinguish value-complexity-laden difficulties from those that arise even for simple goals.

It also helps to illustrate the difficulty of value alignment by making visible the point that we can’t even figure out how to create lots of diamond using unlimited computing power, never mind creating value using bounded computing power.

Problems avoided

If we can crisply define exactly what a ‘diamond’ is, in theory it seems like we should be able to avoid issues of Edge Instantiation, Unforeseen Maximums, and trying to convey complex values into the agent.

The amount of diamond is defined as the number of carbon atoms that are covalently bonded, by electrons, to exactly four other carbon atoms. A carbon atom is any nucleus containing six protons and any number of neutrons, bound by the strong force. The utility of a universal history is the total amount of Minkowskian interval spent by all carbon atoms being bound to exactly four other carbon atoms. More precise definitions of ‘bound’, or the amount of measure in a quantum system that is being bound, are left to the reader—any crisp definition will do, so long as we are confident that it has no unforeseen maximum at things we don’t intuitively see as diamonds.
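The atom-counting core of this definition is easy to state computationally, even though the hard parts (‘bound’, quantum measure, integrating over interval) are not. As a toy sketch under strong simplifying assumptions—a classical bond graph with element labels, and a hypothetical function name—one could count diamond carbon like this:

```python
# Toy sketch: count 'diamond carbon' in a classical bond graph.
# Ignores the harder parts of the definition (what counts as a
# covalent bond, quantum measure, and time integration over the
# Minkowskian interval); purely illustrative.

def count_diamond_carbon(elements, bonds):
    """elements: dict mapping atom_id -> element symbol (e.g. 'C')
    bonds: iterable of (atom_id, atom_id) undirected bond pairs
    Returns the number of carbon atoms bonded to exactly four
    other carbon atoms, per the definition in the text."""
    neighbors = {a: set() for a in elements}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return sum(
        1
        for atom, elem in elements.items()
        if elem == 'C'
        and sum(1 for n in neighbors[atom] if elements[n] == 'C') == 4
    )

# A central carbon bonded to four carbons counts; the four surface
# carbons (each bonded to only one other carbon) do not.
elements = {0: 'C', 1: 'C', 2: 'C', 3: 'C', 4: 'C'}
bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(count_diamond_carbon(elements, bonds))  # 1
```

Note that this only evaluates a single static configuration; the utility in the text is a time integral of this count over a universal history, which the sketch does not attempt.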

Problems challenged

Since this diamond maximizer would hypothetically be implemented on a very large but physical computer, it would confront reflective stability, the anvil problem, and the problems of making subagents.

To the extent the diamond maximizer might need to worry about other agents in the environment that can model it well, or might need to cooperate with other diamond maximizers, it must resolve Newcomblike problems using some logical decision theory. This would also require it to confront logical uncertainty despite possessing immense amounts of computing power.

To the extent the diamond maximizer must work well in a rich real universe that might operate according to any number of possible physical laws, it faces a problem of naturalized induction and ontology identification. See the article on ontology identification for the case that even for the goal of ‘make diamonds’, the problem of goal identification remains difficult.

Unreflective diamond maximizer

As a further-simplified but still unsolved problem, an unreflective diamond maximizer is a diamond maximizer implemented on a Cartesian hypercomputer in a causal universe that does not face any Newcomblike problems. This further avoids problems of reflectivity and logical uncertainty. In this case, it seems plausible that the primary difficulty remaining is just the ontology identification problem. Thus the open problem of describing an unreflective diamond maximizer is a central illustration of the difficulty of ontology identification.


  • Ontology identification problem

    How do we link an agent’s utility function to its model of the world, when we don’t know what that model will look like?