Mindcrime: Introduction

The more predictive accuracy we want from a model, the more detailed the model becomes. A very rough model of an airplane might contain only the approximate shape, the power of the engines, and the mass of the airplane. A model good enough for engineering needs to be detailed enough to simulate the flow of air over the wings, the centripetal force on the fan blades, and more. As a model predicts the airplane in finer and finer detail and with better and better probability distributions, the computations carried out to make the model’s predictions may start to look more and more like a detailed simulation of the airplane flying.

Consider a machine intelligence building, and testing, the best models it can manage of a human being’s behavior. If the model that produces the best predictions involves simulations with moderate degrees of isomorphism to human cognition, then the model, as it runs, may itself be self-aware or conscious or sapient, or have whatever other property stands in for being an object of ethical concern. This doesn’t mean that the running model of Fred is Fred, or even that the running model of Fred is human. The concern is that a sufficiently advanced model of a person will be a person, even if they might not be the same person.

We might then worry that, for example, if Fred is unhappy, or might be unhappy, the agent will consider thousands or millions of hypotheses about versions of Fred. Hypotheses about suffering versions of Fred, when run, might themselves be suffering. As a related concern, these hypotheses about Fred might then be discarded (that is, cease to be run) if the agent sees new evidence and updates its model. Since programs can be people, stopping and erasing a conscious program is the crime of murder.
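As a rough illustration of the discard step, the following Python sketch keeps a weighted set of hypotheses, updates the weights on one observation, and drops the hypotheses that fall below a cutoff. The `Hypothesis` class, the `p_unhappy` parameter, and the `prune_below` threshold are assumptions made purely for illustration. Each hypothesis here is a single number, nothing like the detailed models the concern is about, and no particular agent design is implied.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate model (hypothetical, illustration only)."""
    name: str
    p_unhappy: float  # probability this model assigns to observing "unhappy"
    weight: float     # the agent's current credence in this model


def update(hypotheses, observed_unhappy, prune_below=0.01):
    """Bayes-update the weights on one observation, then split the set.

    Returns (kept, discarded). Discarded hypotheses are the ones the
    agent would stop running. Kept weights are not renormalized.
    """
    for h in hypotheses:
        likelihood = h.p_unhappy if observed_unhappy else 1.0 - h.p_unhappy
        h.weight *= likelihood
    total = sum(h.weight for h in hypotheses)
    if total == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    for h in hypotheses:
        h.weight /= total
    kept = [h for h in hypotheses if h.weight >= prune_below]
    discarded = [h for h in hypotheses if h.weight < prune_below]
    return kept, discarded


if __name__ == "__main__":
    candidates = [
        Hypothesis("mostly unhappy", 0.9, 1 / 3),
        Hypothesis("undecided", 0.5, 1 / 3),
        Hypothesis("content", 0.001, 1 / 3),
    ]
    kept, discarded = update(candidates, observed_unhappy=True)
    print([h.name for h in discarded])  # ['content']
```

In this toy version nothing of moral weight is lost when a hypothesis is dropped. The worry in the text applies when each entry in the set is itself a detailed running simulation.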

This scenario, which we might call ‘the problem of sapient models’, is a subscenario of the general problem that Bostrom terms ‘mindcrime’. (Eliezer Yudkowsky has suggested ‘mindgenocide’ as a term with fewer Orwellian connotations.) More generally, we might worry that there are agent systems that do huge amounts of moral harm just in virtue of the way they compute, by containing embedded conscious suffering and death.

Another scenario might be called ‘the problem of sapient subsystems’. It’s possible that the most efficient possible system for, e.g., allocating memory to subprocesses is a memory-allocating subagent that is reflective enough to be an independently conscious person. This is distinguished from the problem of creating a single machine intelligence that is conscious and suffering, because the conscious agents might be hidden at a lower level of a design, and there might be a lot more of them than just one suffering superagent.

Both of these scenarios constitute moral harm done inside the agent’s computations, irrespective of its external behavior. We can’t conclude that we’ve done no harm by building a superintelligence just in virtue of the fact that the superintelligence doesn’t outwardly kill anyone. There could be trillions of people suffering and dying inside the superintelligence. This sets mindcrime apart from almost all other concerns within the value alignment problem, which usually revolve around external behavior.

To avoid mindgenocide, it would be very handy to know exactly which computations are or are not conscious, sapient, or otherwise objects of ethical concern. Or, indeed, to know that the computations in any particular class are not objects of ethical concern.

Yudkowsky uses the term nonperson predicate for any computable test we could safely use to determine that a computation is definitely not a person. This test needs only two possible answers, “Not a person” and “Don’t know”. It’s fine if the test says “Don’t know” on some nonperson computations, so long as the test says “Don’t know” on all people and never says “Not a person” when the computation is conscious after all. Since the test only definitely tells us about nonpersonhood, rather than detecting personhood in any positive sense, we can call it a nonperson predicate.

However, the goal is not just to have any nonperson predicate: the predicate that says “Not a person” only for the empty computation, and for no others, already meets this test. The goal is to have a nonperson predicate that includes powerful, useful computations. We want to be able to build an AI that is not a person, and let that AI build subprocesses that we know will not be people, and let that AI improve its models of the humans in its environment using hypotheses that we know are not people. This means the nonperson predicate does need to pass some AI designs, cognitive subprocess designs, and human models that are good enough for whatever it is we want the AI to do.
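The shape of such a test, and the trivial predicate that satisfies it, can be sketched in a few lines of Python. Everything named here (`Verdict`, `trivial_predicate`, `is_sound_on`, `coverage`, and the choice to represent a computation as its program text) is an assumption made for illustration. In particular, `is_sound_on` presumes a sample of computations already labeled as person or nonperson, and no such ground truth is known to exist.

```python
from enum import Enum
from typing import Callable, Iterable, Tuple


class Verdict(Enum):
    NOT_A_PERSON = "Not a person"
    DONT_KNOW = "Don't know"


# Illustrative stand-in: a computation is represented as its program text.
Computation = str
NonpersonPredicate = Callable[[Computation], Verdict]


def trivial_predicate(computation: Computation) -> Verdict:
    """Clears only the empty computation: safe, but useless in practice."""
    if computation == "":
        return Verdict.NOT_A_PERSON
    return Verdict.DONT_KNOW


def is_sound_on(predicate: NonpersonPredicate,
                labeled: Iterable[Tuple[Computation, bool]]) -> bool:
    """Check the one-sided requirement on a (hypothetical) labeled sample.

    Every computation labeled as a person must get DONT_KNOW. Nonpersons
    may get either answer, so they are not checked.
    """
    return all(
        predicate(computation) is Verdict.DONT_KNOW
        for computation, is_person in labeled
        if is_person
    )


def coverage(predicate: NonpersonPredicate,
             computations: Iterable[Computation]) -> float:
    """Fraction of the given computations the predicate clears."""
    items = list(computations)
    if not items:
        return 0.0
    cleared = sum(predicate(c) is Verdict.NOT_A_PERSON for c in items)
    return cleared / len(items)
```

A predicate that answers “Not a person” for everything would have perfect coverage and fail the soundness check, while `trivial_predicate` passes the soundness check with almost no coverage. The useful predicate described above has to pass the first check while clearing a large share of the computations an AI would actually need to run.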

This seems like it might be very hard for several reasons:

  • There is unusually extreme philosophical dispute, and confusion, about exactly which programs are and are not conscious or otherwise objects of ethical value. (It might not be exaggerating to scream “nobody knows what the hell is going on”.)

  • We can’t fully pass any class of programs that’s Turing-complete. We can’t say once and for all that it’s safe to model gravitational interactions in a solar system, if enormous gravitational systems could encode computers that encode people.

  • The nearest unblocked strategy problem applies to any attempt to forbid an advanced consequentialist agent from using the most effective or obvious ways of modeling humans. The next best way of modeling humans, outside the blocked-off options, is unusually likely to look like a weird loophole that turns out to encode sapience in some way we didn’t imagine.

An alternative way to prevent mindcrime without a trustworthy nonperson predicate is to consider agent designs intended not to model humans, or other minds, in great detail. There may be some pivotal achievements that can be accomplished without a value-aligned agent modeling human minds in detail.

