Ontology identification problem

Introduction: The ontology identification problem for unreflective diamond maximizers

A simplified but still very difficult open problem in AI alignment is to state an unbounded program implementing a diamond maximizer that will turn as much of the physical universe into diamond as possible. The goal of “making diamonds” was chosen to have a crisp-seeming definition for our universe (the amount of diamond is the number of carbon atoms covalently bound to four other carbon atoms). If we can crisply define exactly what a ‘diamond’ is, we can avert issues of trying to convey complex values into the agent. (The unreflective diamond maximizer putatively has unlimited computing power, runs on a Cartesian processor, and confronts no other agents similar to itself. This averts many other problems of reflectivity, decision theory, and value alignment.)

Even with a seemingly crisp goal of “make diamonds”, we might still run into two problems if we tried to write a hand-coded object-level utility function that identified the amount of diamond material:

  • Unknown substrate: We might not know the true, fundamental ontology of our own universe, hence not know what stuff diamonds are really made of. (What exactly is a carbon atom? If you say it’s a nucleus with six protons, what’s a proton? If you define a proton as being made of quarks, what if there are unknown other particles underlying quarks?)

  • It seems intuitively like there ought to be some way to identify carbon atoms to an AI in some way that doesn’t depend on talking about quarks. Doing this is part of the ontology identification problem.

  • Unknown representation: We might crisply know what diamonds are in our universe, but not know how to find diamonds inside the agent’s model of the environment.

  • Again, it seems intuitively like it ought to be possible to identify diamonds in the environment, even if we don’t know details of the agent’s exact internal representation. Doing this is part of the ontology identification problem.

To introduce the general issues in ontology identification, we’ll try to walk through the anticipated difficulties of constructing an unbounded agent that would maximize diamonds, by trying specific methods and suggesting anticipated difficulties of those methods.

Difficulty of making AIXI-tl maximize diamond

The classic unbounded agent—an agent using far more computing power than the size of its environment—is AIXI. Roughly speaking, AIXI considers all computable hypotheses for how its environment might be turning AIXI’s motor outputs into AIXI’s sensory inputs and rewards. We can think of AIXI’s hypothesis space as including all Turing machines that, sequentially given AIXI’s motor choices as inputs, will output a sequence of predicted sense items and rewards for AIXI. The finite variant AIXI-tl has a hypothesis space that includes all Turing machines that can be specified using fewer than \(l\) bits and run in less than time \(t\).
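As a concrete (and heavily simplified) illustration of the shape of this hypothesis space, here is a toy Bayesian mixture in Python. The “Turing machines” are hand-written predictor functions, the length prior is a made-up bit count, and the environment just echoes the motor bit; none of this is AIXI proper, only the weighting-and-updating step.

```python
# Toy sketch of an AIXI-style Bayesian mixture (illustrative only): each
# "hypothesis" is an ordinary Python function standing in for a Turing
# machine, predicting the next sense bit from the history and motor bit,
# with prior weight 2^-(assumed description length in bits).

def always_zero(history, motor):
    return 0

def echo_motor(history, motor):
    return motor

# (hypothesis, assumed description length l)
hypotheses = [(always_zero, 5), (echo_motor, 8)]

weights = [2.0 ** -l for _, l in hypotheses]
total = sum(weights)
weights = [w / total for w in weights]

history = []
for motor in [1, 0, 1, 1]:
    observed = motor  # the true environment just echoes the motor bit
    # Bayes-update: falsified deterministic hypotheses drop to zero.
    weights = [
        w if h(history, motor) == observed else 0.0
        for (h, _), w in zip(hypotheses, weights)
    ]
    total = sum(weights)
    weights = [w / total for w in weights]
    history.append((motor, observed))

# echo_motor ends up with all the posterior weight; always_zero was
# falsified on the very first step.
```

Note that nothing in the surviving hypothesis is labeled ‘environment’ or ‘diamond’; all we get out of it is predicted sense bits.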

One way of seeing the difficulty of ontology identification is to consider why it would be difficult to make an AIXI-tl variant that maximized ‘diamonds’ instead of ‘reward inputs’.

The central difficulty here is that there’s no way to find ‘diamonds’ inside the implicit representations of AIXI-tl’s sequence-predicting Turing machines. Given an arbitrary Turing machine that is successfully predicting AIXI-tl’s sense inputs, there is no general rule for how to go from the representation of that Turing machine to a statement about diamonds or carbon atoms. The highest-weighted Turing machines that have best predicted the sensory data so far presumably contain some sort of representation of the environment, but we have no idea how to get ‘the number of diamonds’ out of it.

If AIXI has a webcam, then the final outputs of the Turing machine are predictions about the stream of bits produced by the webcam, going down the wire into AIXI. We can understand the meaning of that Turing machine’s output predictions; those outputs are meant to match types with the webcam’s input. But we have no notion of anything else that Turing machine is representing. Even if somewhere in the Turing machine there happens to be an atomically detailed model of the world, we don’t know what representation it uses, or what format it has, or how to look inside it for the number of diamonds that will exist after AIXI’s next motor action.

This difficulty ultimately arises from AIXI being constructed around a Cartesian paradigm of sequence prediction, with AIXI’s sense inputs and motor outputs being treated as sequence elements, and the Turing machines in its hypothesis space having inputs and outputs matched to the sequence elements and otherwise being treated as black boxes. This means we can only get AIXI to maximize direct functions of its sensory input, not any facts about the outside environment.

(We can’t make AIXI maximize diamonds by making it want pictures of diamonds, because then it will just, e.g., build an environmental subagent that seizes control of AIXI’s webcam and shows it pictures of diamonds. If you ask AIXI to show itself sensory pictures of diamonds, you can get it to show its webcam lots of pictures of diamonds, but this is not the same thing as building an environmental diamond maximizer.)

Agent using classical atomic hypotheses

As an unrealistic example: Suppose someone was trying to define ‘diamonds’ for the AI’s utility function, and suppose they knew about atomic physics but not nuclear physics. Suppose they build an AI which, during its development phase, learns about atomic physics from the programmers, and thus builds a world-model that is based on atomic physics.

Again for purposes of unrealistic examples, suppose that the AI’s world-model is encoded in such fashion that when the AI imagines a molecular structure—represents a mental image of some molecules—carbon atoms are represented as a particular kind of basic element of the representation. Again, as an unrealistic example, imagine that there are little LISP tokens representing environmental objects, and that the environmental-object-type of carbon-objects is encoded by the integer 6. Imagine also that each atom, inside this representation, is followed by a list of the other atoms to which it’s covalently bound. Then when the AI is imagining a carbon atom participating in a diamond, inside the representation we would see an object of type 6, followed by a list containing exactly four other 6-objects.

Can we fix this representation for all hypotheses, and then write a utility function for the AI that counts the number of type-6 objects that are bound to exactly four other type-6 objects? And if we did so, would the result actually be a diamond maximizer?
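A minimal sketch of that counting utility function, assuming the toy representation above (integer type tags, each atom carrying a list of its covalently bound neighbors; the `Atom` class and field names are invented for illustration):

```python
# Toy version of the hand-coded utility function over the (unrealistic)
# LISP-token-style representation: carbon is type 6, and an atom's 'bonds'
# lists the atoms it is covalently bound to.

class Atom:
    def __init__(self, type_tag):
        self.type = type_tag   # 6 encodes carbon in this representation
        self.bonds = []        # atoms covalently bound to this one

def bond(a, b):
    a.bonds.append(b)
    b.bonds.append(a)

def diamondness(atoms):
    """Count type-6 objects bound to exactly four other type-6 objects."""
    return sum(
        1
        for a in atoms
        if a.type == 6
        and len(a.bonds) == 4
        and all(n.type == 6 for n in a.bonds)
    )

# A central carbon bound to four carbons: only the center counts, since
# each neighbor has just one bond in this fragment.
center = Atom(6)
neighbors = [Atom(6) for _ in range(4)]
for n in neighbors:
    bond(center, n)
assert diamondness([center] + neighbors) == 1
```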


We can imagine formulating a variant of AIXI-tl that, rather than all tl-bounded Turing machines, considers tl-bounded simulated atomic universes—that is, simulations of classical, pre-nuclear physics. Call this AIXI-atomic.

A first difficulty is that universes composed only of classical atoms are not good explanations of our own universe, even in terms of surface phenomena; e.g., the ultraviolet catastrophe. So let it be supposed that we have simulation rules for classical physics that replicate at least whatever phenomena the programmers have observed at development time, even if the rules have some seemingly ad-hoc elements (like there being no ultraviolet catastrophes).

A second difficulty is that a simulated universe of classical atoms does not identify where in the universe the AIXI-atomic agent resides, and AIXI-atomic’s sense inputs don’t have types commensurate with the types of atoms. We can elide this difficulty by imagining that AIXI-atomic simulates classical universes containing a single hypercomputer, and that AIXI-atomic knows a simple function from each simulated universe onto its own sensory data (e.g., it knows to look at the simulated universe, and translate simulated photons impinging on its webcam into predicted webcam data in the received format). This elides most of the problem of naturalized induction, by fixing the ontology of all hypotheses and standardizing their hypothetical bridging laws.

So the analogous AIXI-atomic agent that maximizes diamond:

  • Considers only hypotheses that directly represent universes as huge systems of classical atoms, so that the function ‘count atoms bound to four other carbon atoms’ can be directly run over any possible future the agent considers.

  • Assigns probabilistic priors over these possible atomic representations of the universe.

  • Somehow maps each atomic representation onto the agent’s sensory experiences and motor actions.

  • Updates its priors based on actual sensory experiences, the same as classical AIXI.

  • Can evaluate the ‘expected diamondness on the next turn’ of a single action by looking at all hypothetical universes where that action is performed, weighted by their current probability, and summing over the expectation of diamond-bound carbon atoms on their next clock tick.

  • Can evaluate the ‘future expected diamondness’ of an action, over some finite time horizon, by assuming that its future self will also Bayes-update and maximize expected diamondness over that time horizon.

  • On each turn, outputs the action with greatest expected diamondness over some finite time horizon.
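The single-step part of the loop above can be sketched as follows. Everything here is a stand-in: the hypotheses are toy `(probability, step function)` pairs rather than simulated atomic universes, and the diamond counts they return stand in for running the counting utility function over a simulated next tick.

```python
# Toy sketch of AIXI-atomic's one-step action choice: the expected
# diamondness of an action is the probability-weighted diamond count
# over hypotheses. Hypotheses are trivial stand-ins for simulated
# classical-atomic universes.

def expected_diamondness(action, hypotheses):
    """hypotheses: list of (probability, step_fn) pairs, where
    step_fn(action) returns the diamond count on the next tick."""
    return sum(p * step_fn(action) for p, step_fn in hypotheses)

def choose_action(actions, hypotheses):
    return max(actions, key=lambda a: expected_diamondness(a, hypotheses))

# Two toy hypotheses that disagree about which action makes diamond:
h1 = (0.7, lambda a: 10 if a == "synthesize" else 0)
h2 = (0.3, lambda a: 3 if a == "wait" else 1)

best = choose_action(["synthesize", "wait"], [h1, h2])
# E[synthesize] = 0.7*10 + 0.3*1 = 7.3; E[wait] = 0.7*0 + 0.3*3 = 0.9
assert best == "synthesize"
```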

Suppose our own real universe was amended to otherwise be exactly the same, but contain a single impermeable hypercomputer. Suppose we defined an agent like the one above, using simulations of 1900-era classical models of physics, and ran that agent on the hypercomputer. Should we expect the result to be an actual diamond maximizer—that most mass in the universe will be turned into carbon and arranged into diamonds?

Anticipated failure of AIXI-atomic in our own universe: trying to maximize diamond outside the simulation

Our own universe isn’t atomic, it’s nuclear and quantum-mechanical. This means that AIXI-atomic does not contain any hypotheses in its hypothesis space that directly represent the universe. (By ‘directly represent’, we mean a representation whose carbon atoms correspond to the carbon atoms in our own world; no hypothesis of AIXI-atomic’s represents our universe in this sense.)

Intuitively, we would think it was common sense for an agent that wanted diamonds to react to the experimental data identifying nuclear physics, by deciding that a carbon atom is ‘really’ a nucleus containing six protons, and atomic binding is ‘really’ covalent electron-sharing. We can imagine this agent common-sensically updating its model of the universe to a nuclear model, and redefining the ‘carbon atoms’ that its old utility function counted to mean ‘nuclei containing exactly six protons’. Then the new utility function could evaluate outcomes in the newly discovered nuclear-physics universe. We will call this the utility rebinding problem.

We don’t yet have a crisp formula that seems like it would yield commonsense behavior for utility rebinding. In fact we don’t yet have any candidate formulas for utility rebinding, period. Stating one is an open problem. See below.

For the ‘classical atomic AIXI’ agent we defined above, what happens instead is that the ‘simplest atomic hypothesis that fits the facts’ will be an enormous atom-based computer, simulating nuclear physics and quantum physics in order to control AIXI’s webcam, which is still believed to be composed of atoms in accordance with the prespecified bridging laws. From our perspective this hypothesis seems silly, but if you restrict the hypothesis space to only classical atomic universes, that’s what ends up being the computationally simplest hypothesis to explain the results of quantum experiments.

AIXI-atomic will then try to choose actions so as to maximize the amount of expected diamond inside the probable outside universes that could contain the giant atom-based simulator of quantum physics. It is not obvious what sort of behavior this would imply.

Metaphor for difficulty: AIXI-atomic cares only about fundamental carbon

One metaphorical way of looking at the problem is that AIXI-atomic was implicitly defined to care only about diamonds made out of ontologically fundamental carbon atoms, not diamonds made out of quarks. A probability function that assigns 0 probability to all universes made of quarks, and a utility function that outputs a constant on all universes made of quarks, yield functionally identical behavior. So it is an exact metaphor to say that AIXI-atomic only cares about universes with ontologically basic carbon atoms, given that AIXI-atomic only believes in universes with ontologically basic carbon atoms.
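The equivalence claimed here is easy to check numerically. In the sketch below (all numbers invented), an agent that assigns zero probability to quark-universes and an agent that believes in quark-universes but assigns them constant utility rank actions identically, because a term that doesn’t depend on the action drops out of the comparison.

```python
# Check: P(quark-universe) = 0 versus U(quark-universe) = const yield the
# same preference ordering over actions. All numbers are invented.

# Utility of each action in each kind of universe:
u_atomic = {"a": 5.0, "b": 2.0}   # universes with fundamental carbon
u_quark  = {"a": 1.0, "b": 9.0}   # quark universes (ignored either way)

def eu_zero_prior(action):
    # Agent 1: quark universes get probability 0.
    return 1.0 * u_atomic[action] + 0.0 * u_quark[action]

def eu_const_utility(action, p_quark=0.8, const=3.0):
    # Agent 2: quark universes are believed in, but valued at a constant.
    return (1 - p_quark) * u_atomic[action] + p_quark * const

# Different expected-utility numbers, same ranking of actions:
rank1 = sorted(u_atomic, key=eu_zero_prior, reverse=True)
rank2 = sorted(u_atomic, key=eu_const_utility, reverse=True)
assert rank1 == rank2 == ["a", "b"]
```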

Since AIXI-atomic only cares about diamond made of fundamental carbon, when AIXI-atomic discovered the experimental data implying that almost all of its probability mass should reside in nuclear or quantum universes in which there were no fundamental carbon atoms, AIXI-atomic stopped caring about the effect its actions had on the vast majority of probability mass inside its model. Instead AIXI-atomic tried to maximize inside the tiny remaining probabilities in which it was inside a universe with fundamental carbon atoms that was somehow reproducing its sensory experience of nuclei and quantum fields; for example, a classical atomic universe with an atomic computer simulating a quantum universe and showing the results to AIXI-atomic.

From our perspective, we failed to solve the ‘ontology identification problem’ and get the real-world result we wanted, because we tried to define the agent’s utility function in terms of properties of a universe made out of atoms, and the real universe turned out to be made of quantum fields. This caused the utility function to fail to bind to the agent’s representation in the way we intuitively had in mind.

Advanced-nonsafety of hardcoded ontology identifications

Today we do know about quantum mechanics, so if we tried to build an unreflective diamond maximizer using the above formula, it might not fail on account of the particular exact problem of atomic physics being false.

But perhaps there are discoveries still remaining that would change our picture of the universe’s ontology to imply something else underlying quarks or quantum fields. Human beings have only known about quantum fields for less than a century; our model of the ontological basics of our universe has been stable for less than a hundred years of our human experience. So we should seek an AI design that does not assume we know the exact, true, fundamental ontology of our universe during an AI’s development phase. Or if our failure to know the exact laws of physics causes catastrophic failure of the AI, we should at least heavily mark that this is a relied-on assumption.

Beyond AIXI-atomic: Diamond identification in multi-level maps

A realistic, bounded diamond maximizer wouldn’t represent the outside universe with atomically detailed models. Instead, it would have some equivalent of a multi-level map of the world in which the agent knew in principle that things were composed of atoms, but didn’t model most things in atomic detail. E.g., its model of an airplane would have wings, or wing shapes, rather than atomically detailed wings. It would think about wings when doing aerodynamic engineering, atoms when doing chemistry, nuclear physics when doing nuclear engineering.

At present, there are not yet any proposed formalisms for how to do probability theory with multi-level maps (in other words: nobody has yet put forward a guess at how to solve the problem even given infinite computing power). Having some idea for how an agent could reason with multi-level maps would be a good first step toward being able to define a bounded expected utility optimizer with a utility function that could be evaluated on multi-level maps. This in turn would be a first step towards defining an agent with a utility function that could rebind itself to changing representations in an updating multi-level map.

If we were actually trying to build a diamond maximizer, we would be likely to encounter this problem long before the agent started formulating new physics. The equivalent of a computational discovery that changes ‘the most efficient way to represent diamonds’ is likely to happen much earlier than a physical discovery that changes ‘what underlying physical systems probably constitute a diamond’.

This also means that, on the actual value loading problem, we are liable to encounter the ontology identification problem long before the agent starts discovering new physics.

Discussion of the generalized ontology identification problem

If we don’t know how to solve the ontology identification problem for maximizing diamonds, we probably can’t solve it for much more complicated values over universe-histories.

View of human angst as ontology identification problem

Argument: A human being who feels angst on contemplating a universe in which “By convention sweetness, by convention bitterness, by convention color, in reality only atoms and the void” (Democritus), or wonders where there is any room in this cold atomic universe for love, free will, or even the existence of people—since, after all, people are just mere collections of atoms—can be seen as undergoing an ontology identification problem: they don’t know how to find the objects of value in a representation containing atoms instead of ontologically basic people.

Human beings simultaneously evolved a particular set of standard mental representations (e.g., a representation for colors in terms of a 3-dimensional subjective color space, a representation for other humans that simulates their brain via empathy) along with evolving desires that bind to these representations (identification of flowering landscapes as beautiful, a preference not to be embarrassed in front of other objects designated as people). When someone visualizes any particular configuration of ‘mere atoms’, their built-in desires don’t automatically fire and bind to that mental representation, the way they would bind to the brain’s native representation of other people. Generalizing that no set of atoms can be meaningful, and being told that reality is composed entirely of such atoms, they feel they’ve been told that the true state of reality, underlying appearances, is a meaningless one.

Arguably, this is structurally similar to a utility function so defined as to bind only to true diamonds made of ontologically basic carbon, which evaluates as unimportant any diamond that turns out to be made of mere protons and neutrons.

Ontology identification problems may reappear on the reflective level

An obvious thought (especially for online genies) is that if the AI is unsure about how to reinterpret its goals in light of a shifting mental representation, it should query the programmers.

Since the definition of a programmer would then itself be baked into the preference framework, the problem might reproduce itself on the reflective level if the AI became unsure of where to find programmers. (“My preference framework said that programmers were made of carbon atoms, but all I can find in this universe are quantum fields.”)

Value lading in category boundaries

Taking apart objects of value into smaller components can sometimes create new moral edge cases. In this sense, rebinding the terms of a utility function decides a value-laden question.

Consider chimpanzees. One way of viewing questions like “Is a chimpanzee truly a person?”—meaning, not, “How do we arbitrarily define the syllables per-son?” but “Should we care a lot about chimpanzees?”—is that they’re about how to apply the ‘person’ category in our desires to things that are neither typical people nor typical nonpeople. We can see this as arising from something like an ontological shift: we’re used to valuing cognitive systems that are made from whole human minds, but it turns out that minds are made of parts, and then we have the question of how to value things that are made from some of the person-parts but not all of them.

Redefining the value-laden category ‘person’ so that it talked about brains made out of neural regions, rather than whole human beings, would implicitly say whether or not a chimpanzee was a person. Chimpanzees definitely have neural areas of various sizes, and particular cognitive abilities—we can suppose the empirical truth is unambiguous at this level, and known to us. So the question is then whether we regard a particular configuration of neural parts (a frontal cortex of a certain size) and particular cognitive abilities (consequentialist means-end reasoning and empathy, but no recursive language) as something that our ‘person’ category values… once we’ve rewritten the person category to value configurations of cognitive parts, rather than whole atomic people.

In this sense the problem we face with chimpanzees is exactly analogous to the question a diamond maximizer would face after discovering nuclear physics and asking itself whether a carbon-14 atom counted as ‘carbon’ for purposes of caring about diamonds. Once a diamond maximizer knows about neutrons, it can see that C-14 is chemically like carbon and forms the same kind of chemical bonds, but that it’s heavier because it has two extra neutrons. We can see that chimpanzees have a similar brain architecture to the sort of people we always considered before, but that they have smaller frontal cortexes and no ability to use recursive language, etcetera.

Without knowing more about the diamond maximizer, we can’t guess what sort of considerations it might bring to bear in deciding what is Truly Carbon and Really A Diamond. But the breadth of considerations human beings need to invoke in deciding how much to care about chimpanzees is one way of illustrating that the problem of rebinding a utility function to a shifted ontology is value-laden and can potentially undergo excursions into arbitrarily complicated desiderata. Redefining a moral category so that it talks about the underlying parts of what were previously seen as all-or-nothing atomic objects may carry an implicit ruling about how to value many kinds of edge-case objects that were never seen before.

A formal part of this problem may need to be carved out from the edge-case-reclassification part: e.g., how would you redefine carbon as C12 if there were no other isotopes, or how would you rebind the utility function to at least C12, or how would edge cases be identified and queried.
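One way to make the carved-out formal part concrete is a rebinding step that maps the old predicate onto the new ontology and flags anything the old ontology never distinguished. A toy sketch, with an invented representation (nuclei as proton/neutron pairs) and an invented ‘flag for querying’ outcome:

```python
# Toy sketch of utility rebinding with edge-case detection: the old
# ontology had a primitive 'carbon' token; the new one has nuclei with
# proton and neutron counts, so isotopes become newly visible edge cases.

def rebind_carbon(nucleus):
    """Map the old 'carbon' predicate onto the nuclear ontology.
    Returns (is_carbon, edge_case): C-12 matches the development-time
    examples exactly; other 6-proton nuclei are isotopic edge cases
    that the old ontology never distinguished."""
    protons, neutrons = nucleus
    if protons != 6:
        return (False, False)
    return (True, neutrons != 6)

nuclei = [(6, 6), (6, 8), (7, 7)]   # C-12, C-14, N-14
carbons = [n for n in nuclei if rebind_carbon(n)[0]]
queries = [n for n in nuclei if rebind_carbon(n)[1]]

assert carbons == [(6, 6), (6, 8)]  # both isotopes rebind as 'carbon'
assert queries == [(6, 8)]          # ...but C-14 is flagged for querying
```

Whether flagged cases should be queried, included, or excluded is exactly the value-laden part the surrounding text describes; the code only separates the unambiguous rebinding from the edge cases.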

Potential research avenues

‘Transparent priors’ constrained to meaningful but Turing-complete hypothesis spaces

The reason why we can’t bind a description of ‘diamond’ or ‘carbon atoms’ to the hypothesis space used by AIXI or AIXI-tl is that the hypothesis space of AIXI is all Turing machines that produce binary strings, or probability distributions over the next sense bit given previous sense bits and motor input. These Turing machines could contain an unimaginably wide range of possible contents.

(Example: Maybe one Turing machine that is producing good sequence predictions inside AIXI actually does so by simulating a large universe, identifying a superintelligent civilization that evolves inside that universe, and motivating that civilization to try to intelligently predict future bits from past bits (as provided by some intervention). To write a formal utility function that could extract the ‘amount of real diamond in the environment’ from arbitrary predictors in the above case, we’d need the function to read the Turing machine, decode that universe, find the superintelligence, decode the superintelligence’s thought processes, find the concept (if any) resembling ‘diamond’, and hope that the superintelligence had precalculated how much diamond was around in the outer universe being manipulated by AIXI.)

This suggests that to solve the ontology identification problem, we may need to constrain the hypothesis space to something less general than ‘an explanation is any computer program that outputs a probability distribution on sense bits’. A constrained explanation space can still be Turing complete (contain a possible explanation for every computable sense input sequence) without every possible computer program constituting an explanation.

An unrealistic example would be to constrain the hypothesis space to Dynamic Bayesian Networks. DBNs can represent any Turing machine with bounded memory (not sure where to look for a citation, but I’d be very surprised if this wasn’t true), so they are very general; but since a DBN is a causal model, it makes it possible for a preference framework to talk about ‘the cause of a picture of a diamond’ in a way that you couldn’t look for ‘the cause of a picture of a diamond’ inside a general Turing machine. Again, this might fail if the DBN has no ‘natural’ way of representing the environment except as a DBN simulating some other program that simulates the environment.

Suppose a rich causal language, such as, e.g., a dynamic system of objects with causal relations and hierarchical categories of similarity. The hope is that in this language, the natural hypotheses representing the environment—the simplest hypotheses within this language that well predict the sense data, or those hypotheses of highest probability under some simplicity prior after updating on the sense data—would be such that there was a natural ‘diamond’ category inside the most probable causal models. In other words, the winning hypothesis for explaining the universe would already have postulated diamondness as a natural category and represented it as Category #803,844, in a rich language where we already know how to look through the environmental model and find the list of categories.

Given some transparent prior, there would then exist the further problem of developing a utility-identifying preference framework that could look through the most likely environmental representations and identify diamonds. Some likely (interacting) ways of binding would be, e.g., to “the causes of pictures of diamonds”, to “things that are bound to four similar things”, querying ambiguities to programmers, or direct programmer inspection of the AI’s model (but in this case the programmers might need to re-inspect after each ontological shift). See below.

(A bounded value loading methodology would also need some way of turning the bound preference framework into the estimation procedures for expected diamond and the agent’s search procedures for strategies high in expected diamond, i.e., the bulk of the actual AI that carries out the goal optimization.)

Matching environmental categories to descriptive constraints

Given some transparent prior, there would exist a further problem of how to actually bind a preference framework to that prior. One possible contributing method for pinpointing an environmental property could be if we understand the prior well enough to understand what the described object ought to look like—the equivalent of being able to search for ‘things W made of six smaller things X near six smaller things Y and six smaller things Z, that are bound by shared Xs to four similar things W in a tetrahedral structure’ in order to identify carbon atoms and diamond.

We would need to understand the representation well enough to make a guess about how carbon or diamond would be represented inside it. But if we could guess that, we could write a program that identifies ‘diamond’ inside the hypothesis space without needing to know in advance that diamondness will be Category #823,034. Then we could rerun the same utility-identification program when the representation updates, so long as this program can reliably identify diamond inside the model each time, and the agent acts so as to optimize the utility identified by the program.
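A toy version of such a utility-identification program, assuming a deliberately simple ‘transparent’ model format (categories as entries in a dict, each listing the categories its members bond to; the format and the descriptive constraint are both invented simplifications):

```python
# Toy sketch: find the 'diamond-like' category in a transparent model by
# matching a structural description, rather than hardcoding a category ID.
# A model is a dict: category name -> list of categories it bonds to.

def find_diamondlike(model):
    """Return categories whose members bond to exactly four members
    of the same category (a crude descriptive constraint for diamond)."""
    return [
        cat
        for cat, bonds in model.items()
        if len(bonds) == 4 and all(b == cat for b in bonds)
    ]

# The same search works across an ontological shift, because it matches
# structure instead of a fixed category number:
old_model = {"cat_17": ["cat_17"] * 4, "cat_9": ["cat_17", "cat_2"]}
new_model = {"cat_803844": ["cat_803844"] * 4, "cat_5": []}

assert find_diamondlike(old_model) == ["cat_17"]
assert find_diamondlike(new_model) == ["cat_803844"]
```

The point of the sketch is only the shape of the method: the identifier is rerun against whatever model is currently most probable, instead of being bound once to a particular category number.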

One particular class of objects that might plausibly be identifiable in this way is ‘the AI’s programmers’ (aka the agents that are causes of the AI’s code), if there are parts of the preference framework that say to query programmers to resolve ambiguities.

A toy problem for this research avenue might involve:

  • One of the richer representation frameworks that can be inducted at the time, e.g., a simple Dynamic Bayes Net.

  • An agent environment that can be thus represented.

  • A goal over properties relatively distant from the agent’s sensory experience (e.g., the goal is over the cause of the cause of the sensory data).

  • A program that identifies the objects of utility in the environment, within the model thus freely inducted.

  • An agent that optimizes the identified objects of utility, once it has inducted a sufficiently good model of the environment to optimize what it is looking for.

Further work might add:

  • New information that can change the model of the environment.

  • An agent that smoothly updates what it optimizes for in this case.

And further:

  • Environments complicated enough that there is real structural ambiguity (e.g., dependence on exact initial conditions of the inference program) about how exactly the utility-related parts are modeled.

  • Agents that can optimize through a probability distribution about environments that differ in their identified objects of utility.

A potential agenda for unbounded analysis might be:

  • An unbounded analysis showing that a utility-identifying preference framework is a generalization of a VNM utility and can tile in an architecture that tiles a generic utility function.

  • A Corrigibility analysis showing that an agent is not motivated to try to cause the universe to be such as to have utility identified in a particular way.

  • A Corrigibility analysis showing that the identity and category boundaries of the objects of utility will be treated as a historical fact rather than one lying in the agent’s decision-theoretic future.

Identifying environmental categories as the causes of labeled sense data

Another potential approach, given a prior transparent enough that we can find causal data inside it, would be to try to identify diamonds as the causes of pictures of diamonds.
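In graph terms, this means looking for ancestors of the labeled sensory node in the induced causal model. A toy sketch, with a hypothetical dict-of-parents representation standing in for a transparent causal prior:

```python
# Sketch (hypothetical representation): a causal model as a dict mapping
# each node to the list of its direct causes (parents).  'Diamond' is then
# identified among the causes of the labeled sense data, i.e., among the
# ancestors of the sensory node.

def causes_of(model, node):
    """Return all ancestors of `node` in the causal graph."""
    seen = set()
    frontier = list(model.get(node, []))
    while frontier:
        parent = frontier.pop()
        if parent not in seen:
            seen.add(parent)
            frontier.extend(model.get(parent, []))
    return seen

causal_model = {
    "pixels_labeled_diamond": ["light_from_object"],
    "light_from_object": ["object_diamond"],
    "object_diamond": ["carbon_lattice"],
}
assert causes_of(causal_model, "pixels_labeled_diamond") == {
    "light_from_object", "object_diamond", "carbon_lattice"}
```

Which of those ancestors counts as 'the diamond', rather than the light or the lattice, is exactly the kind of ambiguity discussed below.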


Security note

Christiano's hack: if your AI is advanced enough to model distant superintelligences, note that those distant superintelligences can make 'the most probable cause of the AI's sensory data' be anything they want, by making a predictable decision to simulate AIs such that your AI does not have information distinguishing itself from the distant AIs it imagines being simulated.

Ambiguity resolution

Both the description-matching and cause-inferring methods might produce ambiguities. Rather than having the AI optimize for a probabilistic mix over all the matches (as if it were uncertain which match were the true one), it would be better to refer the ambiguity to the programmers as a query (especially if different probable models imply different strategies). This problem shares structure with inductive inference with ambiguity resolution as a strategy for resolving unforeseen inductions.
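One way to picture the escalation rule (all interfaces hypothetical): only fall back on asking the programmers when the probable bindings actually disagree about what to do.

```python
# Sketch (hypothetical interfaces): rather than optimizing a probabilistic
# mixture over all candidate bindings, detect when the probable bindings
# imply different strategies and escalate to the programmers.

def resolve_binding(candidates, best_strategy, query_programmers):
    """candidates: list of (model, probability) pairs."""
    strategies = {best_strategy(model) for model, p in candidates if p > 0.05}
    if len(strategies) > 1:
        # Different probable models imply different strategies: ask, don't guess.
        return query_programmers(candidates)
    return strategies.pop()

picked = resolve_binding(
    candidates=[("model_a", 0.6), ("model_b", 0.4)],
    best_strategy=lambda m: "mine carbon" if m == "model_a" else "synthesize",
    query_programmers=lambda cands: "ask programmers",
)
assert picked == "ask programmers"
```

The 0.05 probability threshold is an arbitrary illustration; a real framework would need a principled account of which ambiguities are worth a query.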

If you try to solve the reflective problem by defining the queries in terms of sense data, you might run into Cartesian problems. If instead you try to ontologically identify the programmers in terms more general than a particular webcam, so that the AI can use new webcams, the ontology identification problem might reproduce itself at the reflective level. Either way, this has to be noted down as a dependency.

Multi-level maps

Being able to describe, in purely theoretical principle, a prior over epistemic models that have at least two levels and can switch between them in some meaningful sense, would constitute major progress over the present state of the art.

Try this with just two levels; half adders as potential models? Requirements: that the lower level be only partially realized, rather than needing to be fully modeled; that the framework can describe probabilistic things; and that we have a language for maps like this, and a prior over them that gets updated on the evidence, rather than just a particular handcrafted two-level map.
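To make the half-adder suggestion concrete, here is a deliberately tiny two-level map (a toy illustration, not a proposal for the hypothesis language itself): a gate-level model and an abstract arithmetic model of the same circuit, with a checkable relation between them.

```python
# Toy two-level map using the half adder mentioned above.
# Low level: explicit logic gates.  High level: "this circuit adds two bits."
# The point is that the high-level claim stands in a defined, checkable
# relation to the lower-level model.

def half_adder_low_level(a, b):
    """Gate-level model: XOR for the sum bit, AND for the carry bit."""
    return (a ^ b, a & b)

def half_adder_high_level(a, b):
    """Abstract model: the circuit computes a + b as a two-bit number."""
    total = a + b
    return (total % 2, total // 2)

# The two levels agree on every input, so the high-level map is a valid
# coarse description of the low-level one:
for a in (0, 1):
    for b in (0, 1):
        assert half_adder_low_level(a, b) == half_adder_high_level(a, b)
```

What the open problem asks for is much stronger: a prior over maps like this, where the lower level can be only partially realized and the inter-level relation is itself learned from evidence.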


If the programmers can read through updates to the AI's representation fast enough, or if most of the routine updates leave certain levels intact or imply a defined relation between old and new models, then it might be possible to solve this problem programmatically for genies. This holds especially for a nonrecursive genie with known algorithms, because it might then have a known representation, known not to change suddenly, and be corrigible-by-default while the representation is being worked out. So this is one of the problems more likely to be averted in practice, but understanding it still helps to see one more reason why You Cannot Just Hardcode the Utility Function By Hand.

The entire problem is hard to solve because it has at least some entanglement with the full AGI problem.

The problem of using sensory data to build computationally efficient probabilistic maps of the world, and of efficiently searching for actions that those maps predict to have particular consequences, could be identified with the entire problem of AGI. So the research goal of ontology identification is not to publish a complete bounded system like that (i.e., an AGI), but to develop an unbounded analysis of utility rebinding that says something useful specifically about the ontology-identification part of the problem.