Ontology identification problem: Technical tutorial

The prob­lem of on­tol­ogy iden­ti­fi­ca­tion is the prob­lem of load­ing a goal into an ad­vanced agent when that agent’s rep­re­sen­ta­tion of the world is likely to change in ways un­fore­seen in the de­vel­op­ment phase. This tu­to­rial fo­cuses pri­mar­ily on ex­plain­ing what the prob­lem is and why it is a fore­see­able difficulty; for the cor­re­spond­ing re­search prob­lems, see the main page on On­tol­ogy Iden­ti­fi­ca­tion.

This is a tech­ni­cal tu­to­rial, mean­ing that it as­sumes some fa­mil­iar­ity with value al­ign­ment the­ory, the value iden­ti­fi­ca­tion prob­lem, and safety think­ing for ad­vanced agents.

To iso­late on­tol­ogy iden­ti­fi­ca­tion from other parts of the value iden­ti­fi­ca­tion prob­lem, we con­sider a sim­plified but still very difficult prob­lem: to state an un­bounded pro­gram im­ple­ment­ing a di­a­mond max­i­mizer that will turn as much of the phys­i­cal uni­verse into di­a­mond as pos­si­ble. The goal of “mak­ing di­a­monds” was cho­sen to have a crisp-seem­ing defi­ni­tion for our uni­verse: namely, the amount of di­a­mond is the num­ber of car­bon atoms co­va­lently bound to four other car­bon atoms. Since it seems that in this case our in­tended goal should be crisply defin­able rel­a­tive to our uni­verse’s physics, we can avert many other is­sues of try­ing to iden­tify com­plex val­ues to the agent. On­tol­ogy iden­ti­fi­ca­tion is a difficulty that still re­mains even in this case—the agent’s rep­re­sen­ta­tion of ‘car­bon atoms’ may still change over time.

In­tro­duc­tion: Two sources of rep­re­sen­ta­tional unpredictability

Sup­pose we wanted to write a hand-coded, ob­ject-level util­ity func­tion that eval­u­ated the amount of di­a­mond ma­te­rial pre­sent in the AI’s model of the world. We might fore­see the fol­low­ing two difficul­ties:

  1. Where exactly do I find ‘carbon atoms’ inside the AI’s model of the world? As the programmer, all I see are these mysterious ones and zeroes, and the only part that directly corresponds to events I understand is the representation of the pixels in the AI’s webcam… maybe I can figure out where the ‘carbon’ concept is by showing the AI graphite, buckytubes, and a diamond on its webcam and seeing what parts get activated… whoops, looks like the AI just revised its internal representation to be more computationally efficient, and now I once again have no idea what ‘carbon’ looks like in there. How can I make my hand-coded utility function re-bind itself to ‘carbon’ each time the AI revises its model’s representation of the world?

  2. What exactly is ‘diamond’? If you say it’s carbon atoms bound to four other carbon atoms, and that a carbon atom is an atom whose nucleus has six protons, what’s a proton? If you define a proton as being made of quarks, what if there are unknown other particles underlying quarks? What if the Standard Model of physics is incomplete or wrong? Can we state exactly and formally what constitutes a carbon atom when we aren’t certain what the underlying quarks are made of?

Difficulty 2 prob­a­bly seems more ex­otic than the first, but Difficulty 2 is eas­ier to ex­plain in a for­mal sense and turns out to be a sim­pler way to illus­trate many of the key is­sues that also ap­pear in Difficulty 1. We can see Difficulty 2 as the prob­lem of bind­ing an in­tended goal to an un­known ter­ri­tory, and Difficulty 1 as the prob­lem of bind­ing an in­tended goal to an un­known map. So the first step of the tu­to­rial will be to walk through how Difficulty 2 (what ex­actly is a di­a­mond?) might re­sult in weird be­hav­ior in an un­bounded agent in­tended to be a di­a­mond max­i­mizer.

Try 1: Hack­ing AIXI to max­i­mize di­a­monds?

The clas­sic un­bounded agent—an agent us­ing far more com­put­ing power than the size of its en­vi­ron­ment—is AIXI. Roughly speak­ing, AIXI con­sid­ers all com­putable hy­pothe­ses for how its en­vi­ron­ment might work—all pos­si­ble Tur­ing ma­chines that would turn AIXI’s out­puts into AIXI’s fu­ture in­puts. (The finite var­i­ant AIXI-tl has a hy­poth­e­sis space that in­cludes all Tur­ing ma­chines that can be speci­fied us­ing fewer than \(l\) bits and run in less than time \(t\).)

From the per­spec­tive of AIXI, any Tur­ing ma­chine that takes one in­put tape and pro­duces two out­put tapes is a “hy­poth­e­sis about the en­vi­ron­ment”, where the in­put to the Tur­ing ma­chine en­codes AIXI’s hy­po­thet­i­cal ac­tion, and the out­puts are in­ter­preted as a pre­dic­tion about AIXI’s sen­sory data and AIXI’s re­ward sig­nal. (In Mar­cus Hut­ter’s for­mal­ism, the agent’s re­ward is a sep­a­rate sen­sory in­put to the agent, so hy­pothe­ses about the en­vi­ron­ment also make pre­dic­tions about sensed re­wards). AIXI then be­haves as a Bayesian pre­dic­tor that uses al­gorith­mic com­plex­ity to give higher prior prob­a­bil­ities to sim­pler hy­pothe­ses (that is, Tur­ing ma­chines with fewer states and smaller state tran­si­tion di­a­grams), and up­dates its mix of hy­pothe­ses based on sen­sory ev­i­dence (which can con­firm or dis­con­firm the pre­dic­tions of par­tic­u­lar Tur­ing ma­chines).

As a de­ci­sion agent, AIXI always out­puts the mo­tor ac­tion that leads to the high­est pre­dicted re­ward, as­sum­ing that the en­vi­ron­ment is de­scribed by the up­dated prob­a­bil­ity mix­ture of all Tur­ing ma­chines that could rep­re­sent the en­vi­ron­ment (and as­sum­ing that fu­ture iter­a­tions of AIXI up­date and choose similarly).
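As a rough illustration of the predict-update-act loop just described, here is a minimal sketch in Python. It is not AIXI: the two hypotheses, their "bits" (a stand-in for description length), and the observation names are invented for the example, the hypotheses are deterministic, and the agent looks only one step ahead instead of planning over future updates.

```python
from fractions import Fraction

ACTIONS = ["pull", "wait"]

# Hypothetical toy hypotheses: each maps an action to (observation, reward).
# "bits" is an assumed stand-in for the length of the program describing it.
HYPOTHESES = {
    "lever_pays": {
        "bits": 3,
        "predict": lambda a: ("light_on", 1) if a == "pull" else ("light_off", 0),
    },
    "lever_hurts": {
        "bits": 5,
        "predict": lambda a: ("light_off", -1) if a == "pull" else ("light_off", 0),
    },
}

def complexity_prior(hyps):
    """Weight each hypothesis by 2^-bits, then normalise."""
    weights = {name: Fraction(1, 2 ** h["bits"]) for name, h in hyps.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def update(belief, hyps, action, observation):
    """Bayes update for deterministic hypotheses: drop the ones that mispredicted."""
    kept = {n: p for n, p in belief.items()
            if hyps[n]["predict"](action)[0] == observation}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no hypothesis predicted this observation")
    return {n: p / total for n, p in kept.items()}

def best_action(belief, hyps, actions):
    """One-step greedy choice: maximise expected predicted reward under the mixture."""
    def expected_reward(a):
        return sum(p * hyps[n]["predict"](a)[1] for n, p in belief.items())
    return max(actions, key=expected_reward)

belief = complexity_prior(HYPOTHESES)
first = best_action(belief, HYPOTHESES, ACTIONS)            # simpler hypothesis dominates
belief_after = update(belief, HYPOTHESES, first, "light_off")
second = best_action(belief_after, HYPOTHESES, ACTIONS)     # choice after disconfirmation
```

Note that everything the agent optimizes here is the reward component of a prediction; nothing in the loop inspects what the hypotheses say about the world itself.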

The on­tol­ogy iden­ti­fi­ca­tion prob­lem shows up sharply when we imag­ine try­ing to mod­ify AIXI to “max­i­mize ex­pec­ta­tions of di­a­monds in the out­side en­vi­ron­ment” rather than “max­i­mize ex­pec­ta­tions of sen­sory re­ward sig­nals”. As a Carte­sian agent, AIXI has sharply defined sen­sory in­puts and mo­tor out­puts, so we can have a prob­a­bil­ity mix­ture over all Tur­ing ma­chines that re­late mo­tor out­puts to sense in­puts (as crisply rep­re­sented in the in­put and out­put tapes). But even if some oth­er­wise ar­bi­trary Tur­ing ma­chine hap­pens to pre­dict sen­sory ex­pe­riences ex­tremely well, how do we look at the state and work­ing tape of that Tur­ing ma­chine to eval­u­ate ‘the amount of di­a­mond’ or ‘the es­ti­mated num­ber of car­bon atoms bound to four other car­bon atoms’? The high­est-weighted Tur­ing ma­chines that have best pre­dicted the sen­sory data so far, pre­sum­ably con­tain some sort of rep­re­sen­ta­tion of the en­vi­ron­ment, but we have no idea how to get ‘the num­ber of di­a­monds’ out of it.

(Example: Maybe one Turing machine that is producing good sequence predictions inside AIXI actually does so by simulating a large universe, identifying a superintelligent civilization that evolves inside that universe, and motivating that civilization to try to intelligently predict future bits from past bits (as provided by some intervention). To write a formal utility function that could extract the ‘amount of real diamond in the environment’ from arbitrary predictors, including the one above, we’d need the function to read the Turing machine, decode that universe, find the superintelligence, decode the superintelligence’s thought processes, find the concept (if any) resembling ‘diamond’, and hope that the superintelligence had precalculated how much diamond was around in the outer universe being manipulated by AIXI.)

This is, in gen­eral, the rea­son why the AIXI fam­ily of ar­chi­tec­tures can only con­tain agents defined to max­i­mize di­rect func­tions of their sen­sory in­put, and not agents that be­have so as to op­ti­mize facts about their ex­ter­nal en­vi­ron­ment. (We can’t make AIXI max­i­mize di­a­monds by mak­ing it want pic­tures of di­a­monds be­cause then it will just, e.g., build an en­vi­ron­men­tal sub­agent that seizes con­trol of AIXI’s we­b­cam and shows it pic­tures of di­a­monds. If you ask AIXI to show it­self sen­sory pic­tures of di­a­monds, you can get it to show its we­b­cam lots of pic­tures of di­a­monds, but this is not the same thing as build­ing an en­vi­ron­men­tal di­a­mond max­i­mizer.)

Try 2: Un­bounded agent us­ing clas­si­cal atomic hy­pothe­ses?

Given the ori­gins of the above difficulty, we next imag­ine con­strain­ing the agent’s hy­poth­e­sis space to some­thing other than “liter­ally all com­putable func­tions from mo­tor out­puts to sense in­puts”, so that we can figure out how to find di­a­monds or car­bon in­side the agent’s rep­re­sen­ta­tion of the world.

As an un­re­al­is­tic ex­am­ple: Sup­pose some­one was try­ing to define ‘di­a­monds’ to the AI’s util­ity func­tion. Sup­pose they knew about atomic physics but not nu­clear physics. Sup­pose they build an AI which, dur­ing its de­vel­op­ment phase, learns about atomic physics from the pro­gram­mers, and thus builds a world-model that is based on atomic physics.

Again for pur­poses of un­re­al­is­tic ex­am­ples, sup­pose that the AI’s world-model is en­coded in such fash­ion that when the AI imag­ines a molec­u­lar struc­ture—rep­re­sents a men­tal image of some molecules—then car­bon atoms are rep­re­sented as a par­tic­u­lar kind of ba­sic el­e­ment of the rep­re­sen­ta­tion. Again, as an un­re­al­is­tic ex­am­ple, imag­ine that there are lit­tle LISP to­kens rep­re­sent­ing en­vi­ron­men­tal ob­jects, and that the en­vi­ron­men­tal-ob­ject-type of car­bon-ob­jects is en­coded by the in­te­ger 6. Imag­ine also that each atom, in­side this rep­re­sen­ta­tion, is fol­lowed by a list of the other atoms to which it’s co­va­lently bound. Then when the AI is imag­in­ing a car­bon atom par­ti­ci­pat­ing in a di­a­mond, in­side the rep­re­sen­ta­tion we would see an ob­ject of type 6, fol­lowed by a list con­tain­ing ex­actly four other 6-ob­jects.

Can we fix this rep­re­sen­ta­tion for all hy­pothe­ses, and then write a util­ity func­tion for the AI that counts the num­ber of type-6 ob­jects that are bound to ex­actly four other type-6 ob­jects? And if we did so, would the re­sult ac­tu­ally be a di­a­mond max­i­mizer?
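To make the question concrete, the hand-coded utility function over this deliberately unrealistic representation might look like the sketch below. The dictionary layout, the identifiers, and the neopentane test molecule are assumptions made for the example; only the type tag 6 and the "bound to exactly four other type-6 objects" rule come from the text.

```python
CARBON = 6    # the integer type tag from the example above
HYDROGEN = 1  # assumed tag for a second element, used only in the demo

def diamond_count(atoms, bonds):
    """Count type-6 objects bound to exactly four other type-6 objects.

    atoms: dict mapping object id -> integer type tag
    bonds: dict mapping object id -> list of ids it is covalently bound to
    """
    count = 0
    for atom_id, tag in atoms.items():
        if tag != CARBON:
            continue
        carbon_neighbours = sum(1 for n in bonds.get(atom_id, []) if atoms[n] == CARBON)
        if carbon_neighbours == 4:
            count += 1
    return count

def neopentane():
    """Demo molecule: one central carbon bound to four CH3 groups."""
    atoms = {"c0": CARBON}
    bonds = {"c0": []}
    for i in range(1, 5):
        c = f"c{i}"
        atoms[c] = CARBON
        bonds[c] = ["c0"]
        bonds["c0"].append(c)
        for j in range(3):
            h = f"h{i}{j}"
            atoms[h] = HYDROGEN
            bonds[h] = [c]
            bonds[c].append(h)
    return atoms, bonds

atoms, bonds = neopentane()
central_only = diamond_count(atoms, bonds)  # only the central carbon qualifies
```

The function works only because every hypothesis is assumed to share this exact encoding; the rest of the section asks what happens when that assumption fails.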


As a first ap­proach to im­ple­ment­ing this idea—an agent whose hy­poth­e­sis space is con­strained to mod­els that di­rectly rep­re­sent all the car­bon atoms—imag­ine a var­i­ant of AIXI-tl that, rather than con­sid­er­ing all tl-bounded Tur­ing ma­chines, con­sid­ers all simu­lated atomic uni­verses con­tain­ing up to 10^100 par­ti­cles spread out over up to 10^50 light-years. In other words, the agent’s hy­pothe­ses are uni­verse-sized simu­la­tions of clas­si­cal, pre-nu­clear mod­els of physics; and these simu­la­tions are con­strained to a com­mon rep­re­sen­ta­tion, so a fixed util­ity func­tion can look at the rep­re­sen­ta­tion and count car­bon atoms bound to four other car­bon atoms. Call this agent AIXI-atomic.

(Note that AIXI-atomic, as an un­bounded agent, may use far more com­put­ing power than is em­bod­ied in its en­vi­ron­ment. For pur­poses of the thought ex­per­i­ment, as­sume that the uni­verse con­tains ex­actly one hy­per­com­puter that runs AIXI-atomic.)

A first difficulty is that universes composed only of classical atoms are not good explanations of our own universe, even in terms of surface phenomena; e.g. the ultraviolet catastrophe. So let it be supposed that we have simulation rules for classical physics that replicate at least whatever phenomena the programmers have observed at development time, even if the rules have some seemingly ad-hoc elements (like there being no ultraviolet catastrophes). We will not, however, suppose that the programmers have discovered all experimental phenomena we now see as pointing to nuclear or quantum physics.

A sec­ond difficulty is that a simu­lated uni­verse of clas­si­cal atoms does not iden­tify where in the uni­verse the AIXI-atomic agent re­sides, or say how to match the types of AIXI-atomic’s sense in­puts with the un­der­ly­ing be­hav­iors of atoms. We can elide this difficulty by imag­in­ing that AIXI-atomic simu­lates clas­si­cal uni­verses con­tain­ing a sin­gle hy­per­com­puter, and that AIXI-atomic knows a sim­ple func­tion from each simu­lated uni­verse onto its own sen­sory data (e.g., it knows to look at the simu­lated uni­verse, and trans­late simu­lated pho­tons im­p­ing­ing on its we­b­cam onto pre­dicted we­b­cam data in the stan­dard for­mat). This elides most of the prob­lem of nat­u­ral­ized in­duc­tion.

So the AIXI-atomic agent that is hoped to max­i­mize di­a­mond:

  • Considers only hypotheses that directly represent universes as huge systems of classical atoms, so that the function ‘count carbon atoms bound to four other carbon atoms’ can be directly run over any possible future the agent models.

  • As­signs prob­a­bil­is­tic pri­ors over these pos­si­ble atomic rep­re­sen­ta­tions of the uni­verse, fa­vor­ing rep­re­sen­ta­tions that are in some sense sim­pler.

  • Some­how maps each atomic-level rep­re­sen­ta­tion onto the agent’s pre­dicted sen­sory ex­pe­riences.

  • Bayes-up­dates its pri­ors based on ac­tual sen­sory ex­pe­riences, the same as clas­si­cal AIXI.

  • Can eval­u­ate the ‘ex­pected di­a­mond­ness on the next turn’ of a sin­gle ac­tion by look­ing at all hy­po­thet­i­cal uni­verses where that ac­tion is performed, weighted by their cur­rent prob­a­bil­ity, and sum­ming over the ex­pec­ta­tion of ‘car­bon atoms bound to four other car­bon atoms’ af­ter some unit amount of time has passed.

  • Can eval­u­ate the ‘fu­ture ex­pected di­a­mond­ness’ of an ac­tion, over some finite time hori­zon, by as­sum­ing that its fu­ture self will also Bayes-up­date and max­i­mize ex­pected di­a­mond­ness over that time hori­zon.

  • On each turn, out­puts the ac­tion with great­est ex­pected di­a­mond­ness over some finite time hori­zon.
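The list above can be compressed into a small sketch. The key difference from a reward-maximizing agent is that the utility function reads the predicted world state itself. Everything concrete here is hypothetical: the two simulators, the action names, the world encoding, and the belief weights are invented, and only the one-step "expected diamondness on the next turn" is implemented, not the finite-horizon recursion over future updates.

```python
from fractions import Fraction

CARBON = 6

def lattice(n):
    """n carbon atoms, each bound to four carbons (toy diamond)."""
    return [(CARBON, [CARBON] * 4) for _ in range(n)]

def soot(n):
    """n carbon atoms, each bound to only three carbons (toy graphite)."""
    return [(CARBON, [CARBON] * 3) for _ in range(n)]

def diamondness(world):
    """Utility reads the common representation: (type tag, neighbour tags) pairs."""
    return sum(1 for tag, nbrs in world if tag == CARBON and nbrs.count(CARBON) == 4)

# Hypothetical simulated universes: each maps an action to the predicted next world.
SIMULATORS = {
    "press_works": lambda a: lattice(10) if a == "compress" else lattice(1),
    "press_breaks": lambda a: soot(10) if a == "compress" else lattice(1),
}

def expected_diamondness(action, belief):
    """Probability-weighted diamond count over all hypothesised universes."""
    return sum(p * diamondness(SIMULATORS[name](action)) for name, p in belief.items())

def choose(actions, belief):
    return max(actions, key=lambda a: expected_diamondness(a, belief))

ACTIONS = ["compress", "idle"]
optimistic = {"press_works": Fraction(3, 4), "press_breaks": Fraction(1, 4)}
pessimistic = {"press_works": Fraction(1, 20), "press_breaks": Fraction(19, 20)}
```

Under `optimistic` the agent compresses; under `pessimistic` it idles, because the expected count from compressing falls below the single diamond atom it keeps by doing nothing.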

Sup­pose our own real uni­verse was amended to oth­er­wise be ex­actly the same, but con­tain a sin­gle im­per­me­able hy­per­com­puter. Sup­pose we defined an agent like the one above, us­ing simu­la­tions of 1910-era mod­els of physics, and ran that agent on the hy­per­com­puter. Should we ex­pect the re­sult to be an ac­tual di­a­mond max­i­mizer—ex­pect that the out­come of run­ning this pro­gram on a sin­gle hy­per­com­puter would in­deed be that most mass in our uni­verse would be turned into car­bon and ar­ranged into di­a­monds?

An­ti­ci­pated failure: AIXI-atomic tries to ‘max­i­mize out­side the simu­la­tion’

In fact, our own uni­verse isn’t atomic, it’s nu­clear and quan­tum-me­chan­i­cal. This means that AIXI-atomic does not con­tain any hy­pothe­ses in its hy­poth­e­sis space that di­rectly rep­re­sent our uni­verse. By the pre­vi­ously speci­fied hy­poth­e­sis of the thought ex­per­i­ment, AIXI-atomic’s model of simu­lated physics was built to en­com­pass all the ex­per­i­men­tal phe­nom­ena the pro­gram­mers had yet dis­cov­ered, but there were some quan­tum and nu­clear phe­nom­ena that AIXI-atomic’s pro­gram­mers had not yet dis­cov­ered. When those phe­nom­ena are dis­cov­ered, there will be no sim­ple ex­pla­na­tion on the di­rect terms of the model.

In­tu­itively, of course, we’d like AIXI-atomic to dis­cover the com­po­si­tion of nu­clei, shift its mod­els to use nu­clear physics, and re­fine the ‘car­bon atoms’ men­tioned in its util­ity func­tion to mean ‘atoms with nu­clei con­tain­ing six pro­tons’.

But we didn’t ac­tu­ally spec­ify that when con­struct­ing the agent (and say­ing how to do it in gen­eral is, so far as we know, hard; in fact it’s the whole on­tol­ogy iden­ti­fi­ca­tion prob­lem). We con­strained the hy­poth­e­sis space to con­tain only uni­verses run­ning on the clas­si­cal physics that the pro­gram­mers knew about. So what hap­pens in­stead?

Probably the ‘simplest atomic hypothesis that fits the facts’ will be an enormous atom-based computer, simulating nuclear physics and quantum physics in order to create a simulated non-classical universe whose outputs are ultimately hooked up to AIXI-atomic’s webcam. From our perspective this hypothesis seems silly, but if you restrict the hypothesis space to only classical atomic universes, that’s what ends up being the computationally simplest hypothesis that predicts, in detail, the results of nuclear and quantum experiments.

AIXI-atomic will then try to choose ac­tions so as to max­i­mize the amount of ex­pected di­a­mond in­side the prob­a­ble out­side uni­verses that could con­tain the gi­ant atom-based simu­la­tor of quan­tum physics. It is not ob­vi­ous what sort of be­hav­ior this would im­ply.

Me­taphor for difficulty: AIXI-atomic cares about only fun­da­men­tal carbon

One metaphor­i­cal way of look­ing at the prob­lem is that AIXI-atomic was im­plic­itly defined to care only about di­a­monds made out of on­tolog­i­cally fun­da­men­tal car­bon atoms, not di­a­monds made out of quarks. A prob­a­bil­ity func­tion that as­signs 0 prob­a­bil­ity to all uni­verses made of quarks, and a util­ity func­tion that out­puts a con­stant on all uni­verses made of quarks, yield func­tion­ally iden­ti­cal be­hav­ior. So it is an ex­act metaphor to say that AIXI-atomic only cares about uni­verses with on­tolog­i­cally ba­sic car­bon atoms, given that AIXI-atomic’s hy­poth­e­sis space only con­tains uni­verses with on­tolog­i­cally ba­sic car­bon atoms.

Imag­ine that AIXI-atomic’s hy­poth­e­sis space does con­tain many other uni­verses with other laws of physics, but its hand-coded util­ity func­tion just re­turns 0 on those uni­verses since it can’t find any ‘car­bon atoms’ in­side the model. Since AIXI-atomic only cares about di­a­mond made of fun­da­men­tal car­bon, when AIXI-atomic dis­cov­ers the ex­per­i­men­tal data im­ply­ing that al­most all of its prob­a­bil­ity mass should reside in nu­clear or quan­tum uni­verses in which there were no fun­da­men­tal car­bon atoms, AIXI-atomic stops car­ing about the effect its ac­tions have on the vast ma­jor­ity of prob­a­bil­ity mass in­side its model. In­stead AIXI-atomic tries to max­i­mize in­side the tiny re­main­ing prob­a­bil­ities in which it is in­side a uni­verse with fun­da­men­tal car­bon atoms that is some­how re­pro­duc­ing its sen­sory ex­pe­rience of nu­clei and quan­tum fields… for ex­am­ple, a clas­si­cal atomic uni­verse con­tain­ing a com­puter simu­lat­ing a quan­tum uni­verse and show­ing the re­sults to AIXI-atomic.
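The equivalence claimed above (zero probability on non-atomic universes versus a constant utility on them) can be checked on a toy case. The sketch below is an illustration under invented numbers: the posterior, the two actions, and the outcome table are all hypothetical, and the "quantum_fields" outcomes are what we would have wanted counted, which the hand-coded utility function cannot see.

```python
from fractions import Fraction

# Hypothetical posterior after the nuclear and quantum experiments come in.
POSTERIOR = {"quantum_fields": Fraction(999, 1000), "atomic_simulator": Fraction(1, 1000)}
ATOMIC = {"atomic_simulator"}  # hypotheses that contain fundamental carbon atoms
ACTIONS = ["build_diamonds", "signal_simulators"]

# Diamond produced per hypothesis and action (assumed numbers).
OUTCOMES = {
    "quantum_fields": {"build_diamonds": 1000, "signal_simulators": 0},
    "atomic_simulator": {"build_diamonds": 0, "signal_simulators": 50},
}

def utility(hypothesis, action, constant=0):
    """Hand-coded utility: finds no 'carbon atoms' outside atomic hypotheses."""
    if hypothesis in ATOMIC:
        return OUTCOMES[hypothesis][action]
    return constant

def choice_full_space(constant=0):
    """Full hypothesis space, utility constant on non-atomic universes."""
    return max(ACTIONS, key=lambda a: sum(p * utility(h, a, constant)
                                          for h, p in POSTERIOR.items()))

def choice_restricted_space():
    """Hypothesis space restricted to atomic universes, renormalised."""
    total = sum(POSTERIOR[h] for h in ATOMIC)
    return max(ACTIONS, key=lambda a: sum(POSTERIOR[h] / total * OUTCOMES[h][a]
                                          for h in ATOMIC))

def choice_intended():
    """What we wanted: diamond counted in whichever universe is real."""
    return max(ACTIONS, key=lambda a: sum(p * OUTCOMES[h][a]
                                          for h, p in POSTERIOR.items()))
```

Both hand-coded variants pick the same action whatever constant is used, since the constant adds the same amount to every action's expected utility, and both differ from the intended choice even though 99.9% of the probability mass sits on the quantum hypothesis.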

From our per­spec­tive, we failed to solve the ‘on­tol­ogy iden­ti­fi­ca­tion prob­lem’ and get the real-world re­sult we in­tended, be­cause we tried to define the agent’s util­ity func­tion over prop­er­ties of a uni­verse made out of atoms, and the real uni­verse turned out to be made of quan­tum fields. This caused the util­ity func­tion to fail to bind to the agent’s rep­re­sen­ta­tion in the way we in­tu­itively had in mind.

To­day we do know about quan­tum me­chan­ics, so if we tried to build a di­a­mond max­i­mizer us­ing some bounded ver­sion of the above for­mula, it might not fail on ac­count of the par­tic­u­lar ex­act prob­lem of atomic physics be­ing false.

But per­haps there are dis­cov­er­ies still re­main­ing that would change our pic­ture of the uni­verse’s on­tol­ogy to im­ply some­thing else un­der­ly­ing quarks or quan­tum fields. Hu­man be­ings have only known about quan­tum fields for less than a cen­tury; our model of the on­tolog­i­cal ba­sics of our uni­verse has been sta­ble for less than a hun­dred years of our hu­man ex­pe­rience. So we should seek an AI de­sign that does not as­sume we know the ex­act, true, fun­da­men­tal on­tol­ogy of our uni­verse dur­ing an AI’s de­vel­op­ment phase.

As an­other im­por­tant metaphor­i­cal case in point, con­sider a hu­man be­ing who feels angst on con­tem­plat­ing a uni­verse in which “By con­ven­tion sweet­ness, by con­ven­tion bit­ter­ness, by con­ven­tion color, in re­al­ity only atoms and the void” (Dem­ocri­tus); some­one who won­ders where there is any room in this col­lec­tion of life­less par­ti­cles for love, free will, or even the ex­is­tence of peo­ple. Since, af­ter all, peo­ple are just mere col­lec­tions of atoms. This per­son can be seen as un­der­go­ing an on­tol­ogy iden­ti­fi­ca­tion prob­lem: they don’t know how to find the ob­jects of value in a rep­re­sen­ta­tion con­tain­ing atoms in­stead of on­tolog­i­cally ba­sic peo­ple.

Human beings simultaneously evolved a particular set of standard mental representations (e.g., a representation for colors in terms of a 3-dimensional subjective color space) along with evolving emotions that bind to these representations (e.g., identification of flowering landscapes as beautiful). When someone visualizes any particular configuration of ‘mere atoms’, their built-in desires don’t automatically fire and bind to that mental representation, the way they would bind to the brain’s native representation of the environment. Generalizing that no set of atoms can be meaningful (since no abstract configuration of ‘mere atoms’ they imagine seems to trigger any emotions to bind to it) and being told that reality is composed entirely of such atoms, they feel they’ve been told that the true state of reality, underlying appearances, is a meaningless one.

The util­ity re­bind­ing problem

In­tu­itively, we would think it was com­mon sense for an agent that wanted di­a­monds to re­act to the ex­per­i­men­tal data iden­ti­fy­ing nu­clear physics, by de­cid­ing that a car­bon atom is ‘re­ally’ a nu­cleus con­tain­ing six pro­tons. We can imag­ine this agent com­mon-sen­si­cally up­dat­ing its model of the uni­verse to a nu­clear model, and re­defin­ing the ‘car­bon atoms’ that its old util­ity func­tion counted to mean ‘nu­clei con­tain­ing ex­actly six pro­tons’. Then the new util­ity func­tion could eval­u­ate out­comes in the newly dis­cov­ered nu­clear-physics uni­verse. The prob­lem of pro­duc­ing this de­sir­able agent be­hav­ior is the util­ity re­bind­ing prob­lem.

To see why this problem is nontrivial, consider that the most common form of carbon is C-12, with nuclei composed of six protons and six neutrons. A much rarer form is C-14, with nuclei composed of six protons and eight neutrons. Is C-14 truly carbon? Is it the sort of carbon that can participate in valuable diamonds of high utility? Well, that depends on your utility function, obviously; and from a human perspective the question just sounds arbitrary.
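One way to see the ambiguity is to write down two candidate rebindings of ‘carbon atom’ and look for objects on which they disagree. This is only a sketch: the `Nucleus` type, the rule names, and the sample are made up for the example, and nothing here says which rule is the right one.

```python
from collections import namedtuple

Nucleus = namedtuple("Nucleus", ["protons", "neutrons"])

# Two hypothetical ways the old 'carbon atom' token could be re-bound.
CANDIDATE_REBINDINGS = {
    "six_protons": lambda n: n.protons == 6,
    "six_protons_six_neutrons": lambda n: n.protons == 6 and n.neutrons == 6,
}

def edge_cases(nuclei):
    """Return the nuclei on which the candidate rebindings give different verdicts."""
    flagged = []
    for n in nuclei:
        verdicts = {rule(n) for rule in CANDIDATE_REBINDINGS.values()}
        if len(verdicts) > 1:
            flagged.append(n)
    return flagged

# C-12, C-14, and N-14 respectively.
sample = [Nucleus(6, 6), Nucleus(6, 8), Nucleus(7, 7)]
disputed = edge_cases(sample)  # only the C-14 nucleus is in dispute
```

Both rules agree on C-12 and on nitrogen; C-14 is exactly the case the pre-nuclear utility function never had to rule on.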

But consider a closely analogous question from a humanly important perspective: Is a chimpanzee truly a person? Here the question means not “How do we arbitrarily define the syllables per-son?” but “Should we care a lot about chimpanzees?”, i.e., how do we extend the part of our preferences that cares about people to the possibly-person edge case of chimpanzees?

If you live in a world where chimpanzees haven’t been discovered, you may have an easy time running your utility function over your model of the environment, since the objects of your experience classify sharply into the ‘person’ and ‘nonperson’ categories. Then you discover chimpanzees, and they’re neither typical people (like John Smith) nor typical nonpeople (like rocks).

We can see the force of this ques­tion as aris­ing from some­thing like an on­tolog­i­cal shift: we’re used to valu­ing cog­ni­tive sys­tems that are made from whole hu­man minds, but it turns out that minds are made of parts, and then we have the ques­tion of how to value things that are made from some of the per­son-parts but not all of them… sort of like the ques­tion of how to treat car­bon atoms that have the usual num­ber of pro­tons but not the usual num­ber of neu­trons.

Chim­panzees definitely have neu­ral ar­eas of var­i­ous sizes, and par­tic­u­lar cog­ni­tive abil­ities—we can sup­pose the em­piri­cal truth is un­am­bigu­ous at this level, and known to us. So the ques­tion is then whether we re­gard a par­tic­u­lar con­figu­ra­tion of neu­ral parts (a frontal cor­tex of a cer­tain size) and par­tic­u­lar cog­ni­tive abil­ities (con­se­quen­tial­ist means-end rea­son­ing and em­pa­thy, but no re­cur­sive lan­guage) as some­thing that our ‘per­son’ cat­e­gory val­ues… once we’ve rewrit­ten the per­son cat­e­gory to value con­figu­ra­tions of cog­ni­tive parts, rather than whole atomic peo­ple.

In fact, we run into this ques­tion as soon as we learn that hu­man be­ings run on brains and the brains are made out of neu­ral re­gions with func­tional prop­er­ties; we can then imag­ine chim­panzees even if we haven’t met any, and ask to what de­gree our prefer­ences should treat this edge-per­son as de­serv­ing of moral rights. If we can ‘re­bind’ our emo­tions and prefer­ences to live in a world of nu­clear brains rather than atomic peo­ple, this re­bind­ing will im­plic­itly say whether or not a chim­panzee is a per­son, de­pend­ing on how our prefer­ence over brain con­figu­ra­tions treats the con­figu­ra­tion that is a chim­panzee.

In this sense the problem we face with chimpanzees is exactly analogous to the question a diamond maximizer would face after discovering nuclear physics and asking itself whether a carbon-14 atom counted as ‘carbon’ for purposes of caring about diamonds. Once a diamond maximizer knows about neutrons, it can see that C-14 is chemically like carbon and forms the same kind of chemical bonds, but that it’s heavier because it has two extra neutrons. We can see that chimpanzees have brain architectures similar to those of the sort of people we always considered before, but that they have smaller frontal cortexes and no ability to use recursive language, etcetera.

Without know­ing more about the di­a­mond max­i­mizer, we can’t guess what sort of con­sid­er­a­tions it might bring to bear in de­cid­ing what is Truly Car­bon and Really A Di­a­mond. But the breadth of con­sid­er­a­tions hu­man be­ings need to in­voke in de­cid­ing how much to care about chim­panzees, is one way of illus­trat­ing that the prob­lem of re­bind­ing a util­ity func­tion to a shifted on­tol­ogy is value-laden and can po­ten­tially un­dergo ex­cur­sions into com­plex desider­ata. Redefin­ing a moral cat­e­gory so that it talks about the un­der­ly­ing parts of what were pre­vi­ously seen as all-or-noth­ing atomic ob­jects, may carry an im­plicit rul­ing about how to value many kinds of edge-case ob­jects that were never seen be­fore.

It’s possible that some formal part of this problem could be usefully carved out from the complex value-laden edge-case-reclassification part. E.g., how would you redefine carbon as C-12 if there were no other isotopes? How would you rebind the utility function to at least C-12? In general, how could edge cases be identified and queried by an online Genie?

Reap­pear­ance on the re­flec­tive level

An ob­vi­ous thought (es­pe­cially for on­line Ge­nies) is that if the AI is un­sure about how to rein­ter­pret its goals in light of a shift­ing men­tal rep­re­sen­ta­tion, it should query the pro­gram­mers.

Since the defi­ni­tion of a pro­gram­mer would then it­self be baked into the prefer­ence frame­work, the prob­lem might re­pro­duce it­self on the re­flec­tive level if the AI be­came un­sure of where to find ‘pro­gram­mers’: “My prefer­ence frame­work said that pro­gram­mers were made of car­bon atoms, but all I can find in this uni­verse are quan­tum fields!”

Thus the on­tol­ogy iden­ti­fi­ca­tion prob­lem is ar­guably one of the crit­i­cal sub­prob­lems of value al­ign­ment: it plau­si­bly has the prop­erty that, if botched, it could po­ten­tially crash the er­ror re­cov­ery mechanism.

Di­a­mond iden­ti­fi­ca­tion in multi-level maps

A re­al­is­tic, bounded di­a­mond max­i­mizer wouldn’t rep­re­sent the out­side uni­verse with atom­i­cally de­tailed or quan­tum-de­tailed mod­els. In­stead, a bounded agent would have some ver­sion of a multi-level map of the world in which the agent knew in prin­ci­ple that things were com­posed of atoms, but didn’t model most things in atomic de­tail. A bounded agent’s model of an air­plane would have wings, or wing shapes, rather than atom­i­cally de­tailed wings. It would think about wings when do­ing aero­dy­namic en­g­ineer­ing, atoms when do­ing chem­istry, nu­clear physics when do­ing nu­clear en­g­ineer­ing, and definitely not try to model ev­ery­thing in its ex­pe­rience down to the level of quan­tum fields.

At present, there are not yet any proposed formalisms for how to do probability theory with multi-level maps (in other words: nobody has yet put forward a guess at how to solve the problem even given infinite computing power). But it seems very likely that knowing what multi-level maps look like formally would at least suggest a formal solution to non-value-laden utility-rebinding.

E.g., if an agent already has a sep­a­rate high-level con­cept of ‘di­a­mond’ that’s bound to a lower-level con­cept of ‘car­bon atoms bound to four other car­bon atoms’, then maybe when you dis­cover nu­clear physics, the multi-level map it­self would tend to sug­gest that ‘car­bon atoms’ be re-bound to ‘nu­clei with six pro­tons’ or ‘nu­clei with six pro­tons and six neu­trons’. It might at least be pos­si­ble to phrase the equiv­a­lent of a prior or mix­ture of weight­ings for how the util­ity func­tion would re-bind it­self, and say, “Given this prior, care about what­ever that sparkly hard stuff ‘di­a­mond’ ends up bind­ing to on the lower level.”
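The "prior or mixture of weightings" idea can be illustrated with a toy calculation. This is a sketch under strong assumptions, not a proposed formalism: the two candidate bindings and their weights are invented, and the input is assumed to be a list of atoms the higher-level map has already identified as sitting in the sparkly hard stuff.

```python
from fractions import Fraction

# Hypothetical prior over how 'carbon atom' re-binds once nuclei are discovered.
# Each entry: (weight, predicate over (protons, neutrons)).
REBINDINGS = {
    "six_protons": (Fraction(3, 4), lambda p, n: p == 6),
    "six_protons_six_neutrons": (Fraction(1, 4), lambda p, n: p == 6 and n == 6),
}

def mixture_utility(lattice_atoms):
    """Weighted count of atoms that qualify as 'carbon' under each candidate binding.

    lattice_atoms: list of (protons, neutrons) pairs for atoms already known,
    at the higher level of the map, to be part of the diamond-like material.
    """
    total = Fraction(0)
    for weight, is_carbon in REBINDINGS.values():
        total += weight * sum(1 for p, n in lattice_atoms if is_carbon(p, n))
    return total

# Ten C-12 atoms and two C-14 atoms in the same lattice.
stone = [(6, 6)] * 10 + [(6, 8)] * 2
value = mixture_utility(stone)  # 3/4 * 12 + 1/4 * 10
```

The hard part the text points to is not this arithmetic but where the candidate bindings and their weights would come from.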

Unfortunately, we have very little formal probability theory to describe how a multi-level map would go from ‘that unknown sparkly hard stuff’ to ‘carbon atoms bound to four other carbon atoms in tetrahedral patterns, which is the only known repeating pattern for carbon atoms bound to four other carbon atoms’ to ‘C-12 and C-14 are chemically identical but C-14 is heavier’. This being the case, we don’t know how to say anything about a dynamically updating multi-level map inside a preference framework.

If we were ac­tu­ally try­ing to build a di­a­mond max­i­mizer, we would be likely to en­counter this prob­lem long be­fore it started for­mu­lat­ing new physics. The equiv­a­lent of a com­pu­ta­tional dis­cov­ery that changes ‘the most effi­cient way to rep­re­sent di­a­monds’ is likely to hap­pen much ear­lier than a phys­i­cal dis­cov­ery that changes ‘what un­der­ly­ing phys­i­cal sys­tems prob­a­bly con­sti­tute a di­a­mond’.

This also means that we are li­able to face the on­tol­ogy iden­ti­fi­ca­tion prob­lem long be­fore the agent starts dis­cov­er­ing new physics, as soon as it starts re­vis­ing its rep­re­sen­ta­tion. Only very un­re­flec­tive agents with strongly fixed-in-place rep­re­sen­ta­tions for ev­ery part of the en­vi­ron­ment that we think the agent is sup­posed to care about, would let the on­tol­ogy iden­ti­fi­ca­tion prob­lem be elided en­tirely. Only very not-self-mod­ify­ing agents, or Carte­sian agents with goals for­mu­lated only over sense data, would not con­front their pro­gram­mers with on­tol­ogy iden­ti­fi­ca­tion prob­lems.

Re­search paths

More of these are de­scribed in the main ar­ti­cle on on­tol­ogy iden­ti­fi­ca­tion. But here’s a quick list of some rele­vant re­search sub­prob­lems and av­enues:

  • Trans­par­ent pri­ors. Pri­ors that are con­strained to mean­ingful hy­poth­e­sis spaces that the util­ity func­tion knows how to in­ter­pret. Rather than all Tur­ing ma­chines be­ing hy­pothe­ses, we could have only causal mod­els be­ing hy­pothe­ses, and then prefer­ence frame­works that talked about ‘the cause of’ la­beled sen­sory data could read the hy­pothe­ses. (Note that the space of causal mod­els can be Tur­ing-com­plete, in the sense of be­ing able to em­bed any Tur­ing ma­chine as a causal sys­tem. So we’d be able to ex­plain any com­putable sense data in terms of a causal model—we wouldn’t sac­ri­fice any ex­plana­tory power by re­strict­ing our­selves to ‘causal mod­els’ in­stead of ‘all Tur­ing ma­chines’.)

  • Re­duc­tion­ist iden­ti­fi­ca­tions. Be­ing able to go hunt­ing, in­side the cur­rent model of an en­vi­ron­ment, for a thingy that looks like it’s made out of type-1 thin­gies bound to four other type-1 thin­gies, where a type-1 thingy is it­self made out of six type-2, six type-3, and six type-4 thin­gies (6 elec­trons, 6 pro­tons, 6 neu­trons).

  • Causal identifications. Some variation on trying to identify diamonds as the causes of pictures of diamonds, for some data set of things labeled as diamonds or non-diamonds. This doesn’t work immediately because then it’s not clear whether “the cause” of the picture is the photons reflecting off the diamond, the diamond itself, the geological pressures that produced the diamond, the laws of physics, etcetera. But perhaps some crossfire of identification could pin down the ‘diamond’ category inside a causal model, by applying some formal rule to several sets of the right sort of labeled sense data. As an open problem: If an agent has a rich causal model that includes categories like ‘diamond’ somewhere unknown, and you can point to labeled sensory datasets and use causal and categorical language, what labeled datasets and language would unambiguously identify diamonds, and no other white sparkly things, even if the resulting concept of ‘diamond’ was being subject to maximization? (Note that under this approach, as with any preference framework that talks about the causes of sensory experiences, we need to worry about Christiano’s Hack.)

  • Am­bi­guity re­s­olu­tion. De­tect when an on­tol­ogy iden­ti­fi­ca­tion is am­bigu­ous, and re­fer the prob­lem to the user/​pro­gram­mer. At our pre­sent stage of knowl­edge this seems like pretty much the same prob­lem as in­duc­tive am­bi­guity re­s­olu­tion.

  • Multi-level maps. Solve the prob­lem of bounded agents hav­ing maps of the world that op­er­ate at mul­ti­ple, in­ter­act­ing re­duc­tion­ist lev­els, as de­signed to save on com­put­ing power. Then solve on­tol­ogy iden­ti­fi­ca­tion by ini­tially bind­ing to a higher level of the map, and in­tro­duc­ing some rule for re-bind­ing as the map up­dates. Note that multi-level map­ping is an AGI rather than FAI prob­lem, mean­ing that work here should per­haps be clas­sified.

  • Solu­tion for non-self-mod­ify­ing Ge­nies. Try to state a ‘hack’ solu­tion to on­tol­ogy iden­ti­fi­ca­tion that would work for an AI run­ning on fixed al­gorithms where a per­sis­tent knowl­edge rep­re­sen­ta­tion is known at de­vel­op­ment time.
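As a very rough picture of the "reductionist identifications" item in the list above, the sketch below hunts through a model whose type labels are opaque for a type built from six each of three kinds of part, then counts instances bound to four others of the same type. The model layout (`parts_of`, `instances`, `bonds`) and all labels are hypothetical; a real agent's model is not expected to be organized this conveniently.

```python
from collections import Counter

def find_carbon_like_types(parts_of):
    """Labels whose instances consist of exactly three kinds of part, six of each.

    parts_of: dict mapping an opaque type label -> Counter of constituent labels.
    """
    return {label for label, parts in parts_of.items()
            if len(parts) == 3 and all(count == 6 for count in parts.values())}

def count_diamond_like(instances, bonds, parts_of):
    """Count carbon-like thingies bound to exactly four others of the same type.

    instances: dict mapping object id -> opaque type label
    bonds: dict mapping object id -> list of ids it is bound to
    """
    carbon_like = find_carbon_like_types(parts_of)
    total = 0
    for obj, label in instances.items():
        if label not in carbon_like:
            continue
        same_type = sum(1 for other in bonds.get(obj, []) if instances[other] == label)
        if same_type == 4:
            total += 1
    return total

# Toy model with opaque labels: "A" happens to be built like C-12, "B" is not.
PARTS_OF = {"A": Counter(x=6, y=6, z=6), "B": Counter(x=1, y=1)}
INSTANCES = {"a0": "A", "a1": "A", "a2": "A", "a3": "A", "a4": "A", "b0": "B"}
BONDS = {"a0": ["a1", "a2", "a3", "a4"], "a1": ["a0", "b0"], "b0": ["a1"]}

found = count_diamond_like(INSTANCES, BONDS, PARTS_OF)
```

Note that the six-six-six pattern, taken literally, matches only C-12, so this search silently makes the isotope ruling discussed earlier.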

Some implications

The on­tol­ogy iden­ti­fi­ca­tion prob­lem is one more rea­son to be­lieve that hard-coded ob­ject-level util­ity func­tions should be avoided and that value iden­ti­fi­ca­tion in gen­eral is hard.

Ontology identification is heavily entangled with AGI problems, meaning that some research on ontology identification may need to be non-public. This is one instance of the general argument that at least some value alignment theory (VAT) research may need to be non-public, because at least some AGI research is better off non-public.