“Mindcrime” is Nick Bostrom’s suggested term for scenarios in which an AI’s cognitive processes are intrinsically doing moral harm, for example because the AI contains trillions of suffering conscious beings inside it.

Ways in which this might happen:

  • Problem of sapient models (of humans): Occurs naturally if the best predictive model for humans in the environment involves models that are detailed enough to be people themselves.

  • Problem of sapient models (of civilizations): Occurs naturally if the agent tries to simulate, e.g., alien civilizations that might be simulating it, in enough detail to include conscious simulations of the aliens.

  • Problem of sapient subsystems: Occurs naturally if the most efficient design for some cognitive subsystems involves creating subagents that are self-reflective, or have some other property leading to consciousness or personhood.

  • Problem of sapient self-models: If the AI is conscious, or possible future versions of the AI are conscious, it might run and terminate a large number of conscious self-models in the course of considering possible self-modifications.

Problem of sapient models (of humans):

An instrumental pressure to produce high-fidelity predictions of human beings (or to predict decision counterfactuals about them, or to search for events that lead to particular consequences, etcetera) may lead the AI to run computations that are unusually likely to possess personhood.

An unrealistic example of this would be Solomonoff induction, where predictions are made by means that include running many possible simulations of the environment and seeing which ones best correspond to reality. Among current machine learning algorithms, particle filters and Monte Carlo algorithms similarly involve running many possible simulated versions of a system.
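To make the particle-filter case concrete, here is a minimal bootstrap particle filter in Python. It is a hypothetical illustration: the one-dimensional Gaussian random walk, the function name, and the parameters are assumptions chosen for simplicity, not taken from any particular system. Each particle is one concrete simulated version of the hidden state, and at every step the filter advances all copies, scores them against the observation, and discards the poorly scoring ones. The mindcrime worry is what happens when each particle is a detailed model of a person rather than a single number.

```python
import math
import random


def particle_filter(observations, n_particles=1000, process_sd=1.0, obs_sd=1.0, seed=0):
    """Bootstrap particle filter for a 1-D Gaussian random walk.

    Each particle is one concrete simulated version of the hidden state.
    Returns the weighted-mean state estimate after each observation.
    """
    rng = random.Random(seed)
    particles = [0.0] * n_particles  # every simulated copy starts at the origin
    estimates = []
    for y in observations:
        # Propagate: advance every simulated copy of the system one step.
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # Weight: score each copy by how well it explains the observation
        # (log-weights are shifted by their maximum to avoid underflow).
        log_w = [-0.5 * ((y - x) / obs_sd) ** 2 for x in particles]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        total = sum(weights)
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # Resample: copies survive in proportion to their weight; the rest are discarded.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

For example, particle_filter([5.0] * 20) creates and discards 20,000 simulated states to track a signal sitting at 5.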

It’s possible that an AI advanced enough to have arrived at detailed models of human intelligence would usually also be advanced enough never to use a predictive or searchable model that engaged in brute-force simulations of those models. (Consider, e.g., that there will usually be many possible settings of a variable inside a model, and an efficient model might manipulate data representing a probability distribution over those settings, rather than ever considering one exact, specific human in toto.)

This, however, doesn’t make it certain that no mindcrime will occur. It may not take exact, faithful simulation of specific humans to create a conscious model. An efficient model of a (spread of possibilities for a) human may still contain enough computations that resemble a person closely enough to create consciousness, or whatever other properties may be deserving of personhood. Consider, in particular, an agent trying to use

Just as it almost certainly isn’t necessary to go all the way down to the neural level to create a sapient being, it may be that even with some parts of a mind considered abstractly, the remainder would be computed in enough detail to imply consciousness, sapience, personhood, etcetera.
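The efficient alternative mentioned above, manipulating data that represents a probability distribution rather than considering one exact, specific case, has a textbook instance in the Kalman filter. The following is a minimal one-dimensional sketch, assuming a Gaussian random walk observed with Gaussian noise (the model and names are illustrative assumptions). The belief is stored as just a mean and a variance, and no individual hypothesis is ever instantiated. Realistic models of humans have no such closed form, which is why the paragraph above cautions that partial abstraction may still leave detailed computations behind.

```python
def kalman_filter(observations, process_var=1.0, obs_var=1.0, mean=0.0, var=0.0):
    """Exact Bayesian filtering for a 1-D Gaussian random walk.

    The belief is kept as the (mean, var) of a Gaussian, so no single
    concrete hypothesis about the hidden state is ever simulated.
    """
    estimates = []
    for y in observations:
        # Predict: the random-walk step widens the distribution.
        var += process_var
        # Update: combine prior and observation in closed form.
        gain = var / (var + obs_var)
        mean += gain * (y - mean)
        var *= 1.0 - gain
        estimates.append(mean)
    return estimates
```

With the defaults, kalman_filter([5.0, 5.0]) returns estimates of 2.5 and then 4.0, computed from two numbers per step rather than from a population of simulated copies.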

The problem of sapient models is not to be confused with Simulation Hypothesis issues. An efficient model of a human need not have subjective experience indistinguishable from that of the human (although it will be a model of a person who doesn’t believe themselves to be a model). The problem occurs if the model is a person, not if the model is the same person as its subject, and the latter possibility plays no role in the implication of moral harm.

Besides problems that are directly or obviously about modeling people, many other practical problems and questions can benefit from modeling other minds: e.g., reading the directions on a toaster oven in order to discern the intent of the mind that was trying to communicate how to use a toaster. Thus, mindcrime might result from a sufficiently powerful AI trying to solve very mundane problems.

Problem of sapient models (of civilizations):

A separate route to mindcrime comes from an advanced agent considering, in sufficient detail, the possible origins and futures of intelligent life on other worlds. (Imagine that you were suddenly told that this version of you was actually embedded in a superintelligence that was imagining how life might evolve on a place like Earth, and that your subprocess was not producing sufficiently valuable information and was about to be shut down. You would probably be annoyed! We should try not to annoy other people in this way.)

Three possible origins of a convergent instrumental pressure to consider intelligent civilizations in great detail:

  • Assigning sufficient probability to the existence of non-obvious extraterrestrial intelligences in Earth’s vicinity, perhaps due to considering the Fermi Paradox.

  • Naturalistic induction, combined with the AI considering the hypothesis that it is in a simulated environment.

  • Logical decision theories and utility functions that care about the consequences of the AI’s decisions via instances of the AI’s reference class that could be embedded inside alien simulations.

With respect to the latter two possibilities, note that the AI does not need to be considering possibilities in which the whole Earth as we know it is a simulation. The AI only needs to consider that, among the possible explanations of the AI’s current sense data and internal data, there are scenarios in which the AI is embedded in some world other than the most ‘obvious’ one implied by the sense data. See also “Distant superintelligences can coerce the most probable environment of your AI” for a related hazard of the AI considering possibilities in which it is being simulated.

(Eliezer Yudkowsky has advocated that we shouldn’t let any AI short of extreme levels of safety and robustness assurance consider distant civilizations in lots of detail in any case, since this means our AI might embed (a model of) a hostile superintelligence.)

Problem of sapient subsystems:

It’s possible that the most efficient system for, say, allocating memory on a local cluster constitutes a complete reflective agent with a self-model. Or that some of the most efficient designs for subprocesses of an AI, in general, happen to have whatever properties lead up to consciousness or whatever other properties are important to personhood.

This might constitute a relatively less severe moral catastrophe if the subsystems are sentient but lack a reinforcement-based pleasure/pain architecture (since the latter is not obviously a property of the most efficient subagents). In this case, there might be large numbers of conscious beings embedded inside the AI and occasionally dying as they are replaced, but they would not be suffering. It is nonetheless the sort of scenario that many of us would prefer to avoid.

Problem of sapient self-models:

The AI’s models of itself, or of other AIs it could possibly build, might happen to be conscious or have other properties deserving of personhood. This is worth considering as a separate possibility from building a conscious or personhood-deserving AI ourselves when we didn’t mean to do so, because of these two additional properties:

  • Even if the AI’s current design is not conscious or personhood-deserving, the current AI might consider possible future versions or subagent designs that would be conscious, and those considerations might themselves be conscious.

    • This means that even if the AI’s current version doesn’t seem to have key personhood properties on its own (that is, even if we’ve successfully created the AI itself as a nonperson), we still need to worry about other conscious AIs being embedded into it.

  • The AI might create, run, and terminate very large numbers of potential self-models.

    • Even if we consider tolerable the potential moral harm of creating one conscious AI (e.g. the AI lacks all of the conditions that a responsible parent would want to ensure when creating a new intelligent species, but it’s just one sapient being, so it’s okay to do that in order to save the world), we might not want to take on the moral harm of creating trillions of evanescent, swiftly erased conscious beings.
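The second property, creating, running, and terminating many self-models, is the ordinary shape of a generate-and-test search. A hypothetical Python sketch: CandidateDesign, run_model, and search_self_modifications are invented names, and run_model is a toy quadratic score standing in for "simulate the modified AI on a task". The point is only the structure: every candidate is instantiated and run on every task, and all but the best are discarded without being recorded.

```python
import random
from dataclasses import dataclass


@dataclass
class CandidateDesign:
    """Hypothetical stand-in for one proposed self-modification."""
    params: tuple[float, ...]


def run_model(design: CandidateDesign, task: float) -> float:
    """Placeholder for 'simulate the modified AI on a task and score it'.

    Here it is a toy quadratic score; the concern in the text is the case
    where this step is a model detailed enough to be a person.
    """
    return -sum((p - task) ** 2 for p in design.params)


def search_self_modifications(n_candidates: int, tasks: list[float], seed: int = 0):
    """Generate, evaluate, and discard candidate designs, keeping only the best.

    Returns the best design and the number of model runs performed.
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    runs = 0
    for _ in range(n_candidates):
        design = CandidateDesign(tuple(rng.uniform(-1.0, 1.0) for _ in range(3)))
        score = 0.0
        for task in tasks:
            score += run_model(design, task)
            runs += 1
        if score > best_score:
            best, best_score = design, score
        # Candidates that are not (or are no longer) the best are discarded here.
    return best, runs
```

The run count grows as candidates times tasks. The research avenue below of storing rather than discarding computed possible-people would correspond to appending every design to an archive instead of letting it go out of scope.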


Trying to consider these issues is complicated by:

  • Philosophical uncertainty about what properties are constitutive of consciousness and which computer programs have them;

  • Moral uncertainty about what (idealized versions of) (any particular person’s) morality would consider to be the key properties of personhood;

  • Our present-day uncertainty about what efficient models in advanced agents would look like.

It’d help if we knew the answers to these questions, but the fact that we don’t know doesn’t mean we can thereby conclude that any particular model is not a person. (This would be some mix of argumentum ad ignorantiam and availability bias making us think that a scenario is unlikely when it is hard to visualize.) In the limit of infinite computing power, the epistemically best models of humans would almost certainly involve simulating many possible versions of them; superintelligences would have very large amounts of computing power, and we don’t know at what point we come close enough to this limiting property to cross the threshold.

Scope of potential disaster

The prospect of mindcrime is an especially alarming possibility because sufficiently advanced agents, especially if they are using computationally efficient models, might consider very large numbers of hypothetical possibilities that would count as people. There’s no limit that says that if there are seven billion people, an agent will run at most seven billion models; the agent might be considering many possibilities per individual human. This would not be an astronomical disaster, since it would not (by hypothesis) wipe out our posterity and our intergalactic future, but it could be a disaster orders of magnitude larger than the Holocaust, the Mongol Conquest, the Middle Ages, or all human tragedy to date.
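A back-of-envelope illustration of the point about many possibilities per human. The per-person figure k = 1000 is purely hypothetical and chosen only to show the scaling:

```latex
N_{\text{models}} = N_{\text{people}} \times k
  \approx \left(7 \times 10^{9}\right) \times 10^{3}
  = 7 \times 10^{12}
```

Even that modest per-person multiplier already reaches the "trillions" mentioned in the opening definition.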

Development-order issue

If we ask an AI to predict what we would say if we had a thousand years to think about the problem of defining personhood, or about which causal processes are ‘conscious’, this seems unusually likely to cause the AI to commit mindcrime in the course of answering the question. Even asking the AI to think abstractly about the problem of consciousness, or to predict by abstract reasoning what humans might say about it, seems unusually likely to result in mindcrime. There thus exists a development-order issue preventing us from asking a Friendly AI to solve the problem for us: to file this request safely and without committing mindcrime, we would need the request to have already been completed.

The prospect of enormous-scale disaster militates against ‘temporarily’ tolerating mindcrime inside a system while, e.g., an extrapolated-volition or approval-based agent tries to compute the code or design of a non-mindcriminal agent. Depending on the agent’s efficiency, and secondarily on its computational limits, a tremendous amount of moral harm might be done during the ‘temporary’ process of computing an answer.


Literally nobody outside of MIRI or FHI ever talks about this problem.

Nonperson predicates

A nonperson predicate is an effective test that we, or an AI, can use to determine that some computer program is definitely not a person. In principle, a nonperson predicate needs only two possible outputs, “Don’t know” and “Definitely not a person”. It’s acceptable for many actually-nonperson programs to be labeled “Don’t know”, so long as no people are labeled “Definitely not a person”.

If the above were the only requirement, one simple nonperson predicate would be to label everything “Don’t know”. The implicit difficulty is that the nonperson predicate must also pass some programs of high complexity that do things like “acceptably model humans” or “acceptably model future versions of the AI”.
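The interface just described can be sketched in Python with hypothetical names. A program is represented here simply by its source text, the predicate returns one of two verdicts, and the trivial "label everything Don't know" predicate is safe but fails the implicit requirement. Code like this cannot check soundness (never clearing a person), since that would require already knowing which programs are people.

```python
from enum import Enum
from typing import Callable, Iterable


class Verdict(Enum):
    """The only two outputs a nonperson predicate needs."""
    DONT_KNOW = "Don't know"
    DEFINITELY_NOT_A_PERSON = "Definitely not a person"


# A nonperson predicate maps a program (here, its source text) to a Verdict.
NonpersonPredicate = Callable[[str], Verdict]


def label_everything_dont_know(program: str) -> Verdict:
    """Trivially safe: it never clears a person, because it never clears anything."""
    return Verdict.DONT_KNOW


def passes_required_programs(predicate: NonpersonPredicate, required: Iterable[str]) -> bool:
    """The implicit requirement: every program we actually need to run
    (e.g. an acceptable model of humans) must be cleared."""
    return all(predicate(p) is Verdict.DEFINITELY_NOT_A_PERSON for p in required)
```

Here passes_required_programs(label_everything_dont_know, ["model_humans()"]) is False: the trivial predicate is sound but useless.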

Besides addressing mindcrime scenarios, Yudkowsky’s original proposal was also aimed at knowing that the AI design itself was not conscious, or not a person.

It seems likely to be very hard to find a good nonperson predicate:

  • Not all philosophical confusions and computational difficulties are averted by asking for a partial list of unconscious programs instead of a total list of conscious programs. Even if we don’t know which properties are sufficient, we’d need to know something solid about properties that are necessary for consciousness or sufficient for nonpersonhood.

  • We can’t pass once-and-for-all any class of programs that’s Turing-complete. We can’t say once and for all that it’s safe to model gravitational interactions in a solar system, if enormous gravitational systems could encode computers that encode people.

  • The Nearest unblocked strategy problem seems particularly worrisome here. If we block off some options for modeling humans directly, the next best option is unusually likely to be conscious. Even if we rely on a whitelist rather than a blacklist, this may lead to a whitelisted “gravitational model” that secretly encodes a human, and so on.
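One way to picture the whitelist approach, and its limits, is a Python sketch that clears only programs from a deliberately non-Turing-complete class: small, loop-free, call-free arithmetic expressions, checked with the standard ast module. The node list, the size bound, and the assumption that such expressions cannot be people are all illustrative choices, not a proposal from the text. Everything outside the class gets "Don't know", which is exactly why a predicate like this is nowhere near able to pass an acceptable model of humans.

```python
import ast
from enum import Enum


class Verdict(Enum):
    DONT_KNOW = "Don't know"
    DEFINITELY_NOT_A_PERSON = "Definitely not a person"


# Whitelist of AST node types for loop-free, call-free arithmetic expressions.
# This class is deliberately not Turing-complete; anything else is refused.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Name, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub, ast.UAdd,
)


def whitelist_predicate(source: str, max_nodes: int = 50) -> Verdict:
    """Clear a program only if it is a small expression built from whitelisted nodes.

    ASSUMPTION (illustrative only): bounded arithmetic expressions are
    treated as definitely not persons.
    """
    try:
        tree = ast.parse(source, mode="eval")
    except SyntaxError:
        return Verdict.DONT_KNOW  # statements, loops, etc. are not expressions
    nodes = list(ast.walk(tree))
    if len(nodes) > max_nodes:
        return Verdict.DONT_KNOW
    if all(isinstance(node, ALLOWED_NODES) for node in nodes):
        return Verdict.DEFINITELY_NOT_A_PERSON
    return Verdict.DONT_KNOW
```

For example, "x * 2 + 1" is cleared, while a list comprehension or any function call falls back to "Don't know". Widening the whitelist toward anything expressive enough to be useful is where the Turing-completeness and nearest-unblocked-strategy worries above come back in.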

Research avenues

  • Behaviorism: Try to create a limited AI that does not model other minds, or possibly even itself, except using some narrow class of agent models that we are pretty sure will not be sentient. This avenue is potentially motivated for other reasons as well, such as avoiding probable environment hacking and averting programmer manipulation.

  • Try to define a nonperson predicate that whitelists enough programs to carry out some pivotal achievement.

  • Try for an AI that can bootstrap our understanding of consciousness and tell us about what we would define as a person, while committing a relatively small amount of mindcrime, with all computed possible-people being stored rather than discarded, and the modeled agents being entirely happy, mostly happy, or non-suffering. E.g., put a happy person at the center of the approval-directed agent, and try to oversee the AI’s algorithms and ask it not to use Monte Carlo simulations if possible.

  • Ignore the problem in all pre-interstellar stages because it’s still relatively small compared to astronomical stakes and therefore not worth significant losses in success probability. (This may backfire under some versions of the Simulation Hypothesis.)

  • Try to finish the philosophical problem of understanding which causal processes experience sapience (or are otherwise objects of ethical value), in the next couple of decades, to sufficient detail that it can be crisply stated to an AI, with sufficiently complete coverage that it’s not subject to the Nearest unblocked strategy problem.


