Extrapolated volition (normative moral theory)

(This page is about extrapolated volition as a normative moral theory—that is, the theory that extrapolated volition captures the concept of value, or what outcomes we should want. For the closely related proposal about what a sufficiently advanced self-directed AGI should be built to want/target/decide/do, see coherent extrapolated volition.)


Extrapolated volition is the notion that when we ask “What is right?”, then insofar as we’re asking something meaningful, we’re asking about the result of running a certain logical function over possible states of the world, where this function is analytically identical to the result of extrapolating our current decision-making process in directions such as “What if I knew more?”, “What if I had time to consider more arguments (so long as the arguments weren’t hacking my brain)?”, or “What if I understood myself better and had more self-control?”

A simple example of extrapolated volition might be to consider somebody who asks you to bring them orange juice from the refrigerator. You open the refrigerator and see no orange juice, but there’s lemonade. You imagine that your friend would want you to bring them lemonade if they knew everything you knew about the refrigerator, so you bring them lemonade instead. On an abstract level, we can say that you “extrapolated” your friend’s “volition”: you took your model of their mind and decision process (your model of their “volition”), and you imagined a counterfactual version of their mind that had better information about the contents of your refrigerator, thereby “extrapolating” this volition.
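The refrigerator story can be sketched as a toy computation. Everything below is a hypothetical illustration, not a proposed formalism: the “extrapolation” is simply re-running the same decision procedure on better information.

```python
def decide(preferences, beliefs):
    """A toy decision process: pick the option this person ranks highest,
    among the options their beliefs say are available."""
    available = [opt for opt in preferences if beliefs.get(opt, False)]
    return available[0] if available else None

# Your friend's preferences, best first (hypothetical example data).
preferences = ["orange juice", "lemonade", "water"]

# What your friend believes is in the refrigerator.
friends_beliefs = {"orange juice": True, "lemonade": False, "water": True}

# What you actually observe when you open the refrigerator.
your_observations = {"orange juice": False, "lemonade": True, "water": True}

# Their stated request runs on their (mistaken) beliefs...
stated = decide(preferences, friends_beliefs)

# ...but their extrapolated volition re-runs the *same* decision process
# on the better information you now have.
extrapolated = decide(preferences, your_observations)

print(stated, extrapolated)  # orange juice lemonade
```

The point of the sketch is that nothing about the friend’s preferences was changed; only the information fed into their decision process was.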

Having better information isn’t the only way that a decision process can be extrapolated; we can also, for example, imagine that a mind has more time in which to consider moral arguments, or better knowledge of itself. Maybe you currently want revenge on the Capulet family, but if somebody had a chance to sit down with you and have a long talk about how revenge affects civilizations in the long run, you could be talked out of that. Maybe you’re currently convinced that you advocate for green shoes to be outlawed out of the goodness of your heart, but if you could actually see a printout of all of your own emotions at work, you’d see there was a lot of bitterness directed at people who wear green shoes, and this would change your mind about your decision.

In Yudkowsky’s version of extrapolated volition, considered on an individual level, the three core directions of extrapolation are:

  • Increased knowledge—having more veridical knowledge of declarative facts and expected outcomes.

  • Increased consideration of arguments—being able to consider more possible arguments and assess their validity.

  • Increased reflectivity—greater knowledge about the self, and to some degree, greater self-control (though this raises further questions about which parts of the self normatively get to control which other parts).


Different people react differently to the question “Where should we point an autonomous superintelligence, if we can point it exactly?” and approach it from different angles. These angles include:

  • All this talk of ‘shouldness’ is just a cover for the fact that whoever gets to build the superintelligence wins all the marbles; no matter what you do with your superintelligence, you’ll be the one who does it.

  • What if we tell the superintelligence what to do and it’s the wrong thing? What if we’re basically confused about what’s right? Shouldn’t we let the superintelligence figure that out on its own with its own superior intelligence?

  • Imagine the Ancient Greeks telling a superintelligence what to do. They’d have told it to optimize personal virtues, including, say, a glorious death in battle. This seems like a bad thing, and we need to figure out how not to do the analogous thing. Telling an AGI to do what seems like a good idea to us now will likewise end up seeming a very regrettable decision a million years later.

  • Obviously we should just tell the AGI to optimize liberal democratic values. Liberal democratic values are good. The real threat is if bad people get their hands on AGI and build an AGI that doesn’t optimize liberal democratic values.

Some corresponding initial replies might be:

  • Okay, but suppose you’re a programmer and you’re trying not to be a jerk. If you’re like, “Well, whatever I do originates in myself and is therefore equally selfish, so I might as well declare myself God-Emperor of Earth,” you’re being a jerk. Is there anything we can do which is less jerky, and indeed, minimally jerky?

  • If you say you have no information at all about what’s ‘right’, then what does the term even mean? If I might as well have my AGI maximize paperclips and you have no ground on which to stand and say that’s the wrong way to compute normativity, then what are we even talking about in the first place? The word ‘right’ or ‘should’ must have some meaning that you know about, even if it doesn’t automatically print out a list of everything you know is right. Let’s talk about hunting down that meaning.

  • Okay, so what should the Ancient Greeks have done if they did have to program an AI? How could they not have doomed future generations? Suppose the Ancient Greeks were clever enough to have noticed that sometimes people change their minds about things, and to realize that they might not be right about everything. How can they use the cleverness of the AGI in a constructively specified, computable fashion that gets them out of this hole? You can’t just tell the AGI to compute what’s ‘right’; you need to put an actual computable question in there, not a word.

  • What if you would, after some further discussion, want to tweak your definition of “liberal democratic values” just a little? What if it’s predictable that you would do that? Would you really want to be stuck with your off-the-cuff definition a million years later?

Arguendo, by CEV’s advocates, these conversations all eventually converge, by different roads, on Coherent Extrapolated Volition as an alignment proposal.

“Extrapolated volition” is the corresponding normative theory that you arrive at by questioning the meaning of ‘right’ or trying to figure out what we ‘should’ really truly do.

EV as rescuing the notion of betterness

We can see EV as trying to rescue the following pretheoretic intuitions (as they might be experienced by someone feeling confused, or just somebody who’d never questioned metaethics in the first place):

  • (a) It’s possible to think that something is right, and be incorrect.

  • (a1) It’s possible for something to be wrong even if nobody knows that it’s wrong. E.g., an uneven division of an apple pie might be unfair even if none of the recipients realize this.

  • (a2) We can learn more about what’s right, and change our minds to be righter.

  • (b) Taking a pill that changes what you think is right should not change what is right. (If you’re contemplating taking a pill that makes you think it’s right to secretly murder 12-year-olds, you should not reason, “Well, if I take this pill I’ll murder 12-year-olds… but also it will be all right to murder 12-year-olds, so this is a great pill to take.”)

  • (c) We could be wrong, but it sure seems like the things on Frankena’s list are all reasonably good. (“Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc…”)

  • (c1) The fact that we could be in some mysterious way “wrong” about what belongs on Frankena’s list doesn’t seem to leave enough room for “make as many paperclips as possible” to be the only thing on the list. Even our state of confusion and possible ignorance doesn’t seem to allow for that to be the answer. We’re at least pretty sure that isn’t the total sum of goodness.

  • (c2) Similarly, on the meta-level, it doesn’t seem like the meta-level procedure “Pick whatever procedure for determining rightness leads to the most paperclips existing after you adopt it” could be the correct answer.

We cannot rescue these properties by saying:

“There is an irreducible, non-natural ‘rightness’ XML tag attached to some objects and events. Our brains perceive this XML tag, but imperfectly, giving us property (a) when we think the XML tag is there even though it isn’t. The XML tags are there even if nobody sees them (a1). Sometimes we stare harder and see the XML tag better (a2). Obviously, doing anything to a brain isn’t going to change the XML tag (b), just fool the brain or invalidate its map of the XML tag. All of the things on Frankena’s list have XML tags (c), or at least we think so. For paperclips to be the total correct content of Frankena’s list, we’d need to be wrong about paperclips not having XML tags and wrong about everything on Frankena’s list that we think does have an XML tag (c1). And on the meta-level, ‘Which sense of rightness leads to the most paperclips?’ doesn’t say anything about XML tags, and it doesn’t lead to there being lots of XML tags, so there’s no justification for it (c2).”

This doesn’t work because:

  • There are, in fact, no tiny irreducible XML tags attached to objects.

  • If there were little tags like that, there’d be no obvious normative justification for our caring about them.

  • It doesn’t seem like we should be able to make it good to murder 12-year-olds by swapping around the irreducible XML tags on the event.

  • There’s no way our brains could perceive these tiny XML tags even if they were there.

  • There’s no obvious causal story for how humans could have evolved such that we do in fact care about these tiny XML tags. (A descriptive rather than normative problem with the theory as a whole; natural selection has no normative force or justificational power, but we do need our theory of how brains actually work to be compatible with it.)

Onto what sort of entity can we then map our intuitions, if not onto tiny XML tags?

Consider the property of sixness possessed by six apples on a table. The relation between the physical six apples on the table and the logical number ‘6’ is given by a logical function that takes physical descriptions as inputs: in particular, the function “count the number of apples on the table”.
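As a toy illustration (the table contents and the miscount parameter are hypothetical, and nothing here is a proposed formalism), we can write the “count the apples” logical function directly, and note that an error-prone brain merely approximates it:

```python
# "6-ness" as the output of a logical function applied to a physical
# description of the table, not as a tag attached to the apples.

table = ["apple", "apple", "apple", "orange", "apple", "apple", "apple"]

def count_apples(physical_description):
    """The logical function: count the apples in a description of the table."""
    return sum(1 for item in physical_description if item == "apple")

def brain_estimate(physical_description, miscount=0):
    """An error-prone brain *approximating* that logical function."""
    return count_apples(physical_description) + miscount

true_count = count_apples(table)  # holds whether or not anyone is looking

# A brain can be wrong about the count...
mistaken = brain_estimate(table, miscount=-1)

# ...and "taking a pill" alters only the brain's estimate, never the fact:
pilled = brain_estimate(table, miscount=+3)

print(true_count, mistaken, pilled)  # 6 5 9
```

Changing `brain_estimate` changes which answer a brain produces; it leaves `count_apples(table)` untouched, which is the behavior the bullet points below walk through.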

Could we rescue ‘rightness’ onto a logical function like this, only much more complicated?

Let’s examine how the 6-ness property and the “counting apples” function behave:

  • There are, in fact, no tiny tags saying ‘6’ attached to the apples (and yet there are still six of them).

  • It’s possible to think there are 6 apples on the table, and be wrong.

  • We can sometimes change our minds about how many apples there are on a table.

  • There can be 6 apples on a table even if nobody is looking at it.

  • Taking a pill that changes how many apples you think are on the table doesn’t change the number of apples on the table.

  • You can’t have a 6-tag-manipulator that changes the number of apples on a table without changing anything about the table or apples.

  • There’s a clear causal story for how we can see apples, and also for how our brains can count things, and there’s an understandable historical fact about why humans count things.

  • Changing the history of how humans count things could change which logical function our brains were computing on the table, so that our brains were no longer “counting apples”, but it wouldn’t change the number of apples on the table. We’d be changing which logical function our brains were considering, not changing the logical facts themselves or making it so that identical premises would lead to different conclusions.

  • Suppose somebody says, “Hey, you know, sometimes we’re wrong about whether there’s 6 of something or not; maybe we’re just entirely confused about this counting thing; maybe the real number of apples on this table is this paperclip I’m holding.” Even if you often made mistakes in counting, didn’t know how to axiomatize arithmetic, and were feeling confused about the nature of numbers, you would still know enough about what you were talking about to feel pretty sure that the number of apples on the table was not, in fact, a paperclip.

  • If you could ask a superintelligence how many grains of sand your brain would think there were on a beach, in the limit of your brain representing everything the superintelligence knew and thinking very quickly, you would indeed gain veridical knowledge about the number of grains of sand on that beach. Your brain doesn’t determine the number of grains of sand on the beach, and you can’t change the logical properties of first-order arithmetic by taking a pill that changes your brain. But there’s an analytic relation between the procedure your brain currently represents and tries to carry out in an error-prone way, and the logical function that counts how many grains of sand are on the beach.

This suggests that 6-ness has the right sort of ontological nature for rightness to share: rightness would be the output of some much bigger and more complicated logical function than “count the number of apples on the table”. Or rather: if we want to rescue our pretheoretic sense of rightness in a way that adds up to moral normality, we should rescue it onto a logical function.

This function, e.g., starts with the items on Frankena’s list and everything we currently value; but also takes into account the set of arguments that might change our mind about what goes on the list; and also takes into account meta-level conditions that we would endorse as distinguishing “valid arguments” from “arguments that merely change our minds”. (This last point is pragmatically important if we’re considering trying to get a superintelligence to extrapolate our volitions. The list of everything that does in fact change your mind might include particular patterns of rotating spiral pixel patterns that effectively hack a human brain.)
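A deliberately oversimplified sketch of that last meta-level condition, with every name and flag below hypothetical (in reality, specifying the endorsement test is the hard part): the extrapolation admits only arguments whose method of persuasion would be endorsed on reflection, not everything that in fact persuades.

```python
# Toy filter separating "valid arguments" from "arguments that merely
# change our minds". The argument list and flags are invented examples.

candidate_arguments = [
    {"name": "revenge harms civilizations in the long run",
     "persuades": True, "endorsed_method": True},   # ordinary reasoned argument
    {"name": "rotating spiral pixel pattern brain-hack",
     "persuades": True, "endorsed_method": False},  # persuades, but not by an endorsed method
]

def admitted_by_extrapolation(argument):
    """Admit only arguments whose *method* of persuasion we would endorse
    on reflection, not everything that does in fact persuade."""
    return argument["persuades"] and argument["endorsed_method"]

admitted = [a["name"] for a in candidate_arguments
            if admitted_by_extrapolation(a)]
print(admitted)
```

The brain-hack is excluded not because it fails to change minds, but because the endorsement condition rejects its method.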

The end result of all this work is that we go on guessing which acts are right and wrong as before, go on considering that some possible valid arguments might change our minds, go on weighing such arguments, and go on valuing the things on Frankena’s list in the meantime. The theory as a whole is intended to add up to the same moral normality as before, just with that normality embedded into the world of causality and logic in a non-confusing way.

One point we could have taken into our starting list of important properties, but deferred until later:

  • It sure feels like there’s a beautiful, mysterious floating ‘rightness’ property of things that are right, and that the things that have this property are terribly precious and important.

On the general program of “rescuing the utility function”, we should not scorn this feeling, and should instead figure out how to map it onto what actually exists.

In this case, having preserved almost all the structural properties of moral normality, there’s no reason why anything should change about how we experience the corresponding emotion in everyday life. If our native emotions are having trouble with this new, weird, abstract, learned representation of ‘a certain big complicated logical function’, we should do our best to remember that the rightness is still there. And this is not a retreat to second-best any more than “disordered kinetic energy” is some kind of sad consolation prize for the universe’s lack of ontologically basic warmth, etcetera.

Unrescuability of moral internalism

In standard metaethical terms, we have managed to rescue ‘moral cognitivism’ (statements about rightness have truth-values) and ‘moral realism’ (there is a fact of the matter out there about how right something is). We have not, however, managed to rescue the pretheoretic intuition underlying ‘moral internalism’:

  • A moral argument, to be valid, ought to be able to persuade anyone. If a moral argument is unpersuasive to someone who isn’t making some kind of clear mistake in rejecting it, then that argument must rest on some appeal to a private or merely selfish consideration that should form no part of true morality that everyone can perceive.

This intuition cannot be preserved in any reasonable way, because paperclip maximizers are in fact going to go on making paperclips (and not because they made some kind of cognitive error). A paperclip maximizer isn’t disagreeing with you about what’s right (the output of the logical function); it’s just following whatever plan leads to the most paperclips.

Since the paperclip maximizer’s policy isn’t influenced by any of our moral arguments, we can’t preserve the internalist intuition without reducing the set of valid justifications and truly valuable things to the empty set—and even that, a paperclip maximizer wouldn’t find motivationally persuasive!

Thus our options regarding the pretheoretic internalist intuition that a moral argument is not valid if not universally persuasive seem to be limited to the following:

  1. Give up on the intuition in its intuitive form: a paperclip maximizer doesn’t care if it’s unjust to kill everyone; you can’t talk it into behaving differently; this doesn’t reflect a cognitive stumble on the paperclip maximizer’s part; and this fact gives us no information about what is right or justified.

  2. Preserve, at the cost of all other pretheoretic intuitions about rightness, the intuition that only arguments that universally influence behavior are valid: that is, there are no valid moral arguments.

  3. Try to sweep the problem under the rug by claiming that reasonable minds must agree that paperclips are objectively pointless… even though Clippy is not suffering from any defect of epistemic or instrumental power, and there’s no place in Clippy’s code where we can point to some inherently persuasive argument being dropped by a defect or special case of that code.

It’s not clear what the point of stance (2) would be, since even this is not an argument that would cause Clippy to alter its behavior, and hence the stance is self-defeating. Stance (3) seems like a mere word game, and potentially a very dangerous word game if it tricks AI developers into thinking that rightness is a default behavior of AIs, or even a function of low algorithmic complexity, or that beneficial behavior automatically correlates with ‘reasonable’ judgments about less value-laden questions. See “Orthogonality Thesis” for the extreme practical importance of acknowledging that moral internalism is in practice false.

Situating EV in contemporary metaethics

Metaethics is the field of academic philosophy that deals with the question, not of “What is good?”, but “What sort of property is goodness?” As applied to issues in Artificial Intelligence, rather than arguing over which particular outcomes are better or worse, we are, from a standpoint of executable philosophy, asking how to compute what is good, and why the output of any proposed computation ought to be identified with the notion of shouldness.

EV replies that for each person at a single moment in time, ‘right’ or ‘should’ is to be identified with a (subjectively uncertain) logical constant, fixed for that person at that particular moment in time: the result of running the extrapolation process on that person. We can’t actually run the extrapolation process, so we can’t get perfect knowledge of this logical constant, and we will remain subjectively uncertain about what is right.

To eliminate one important ambiguity in how this might cash out: we regard this logical constant as being analytically identified with the extrapolation of our brains, but not counterfactually dependent on counterfactually varying forms of our brains. If you imagine being administered a pill that makes you want to kill people, you shouldn’t compute in your imagination that different things are right for this new self. Instead, this new self now wants to do something other than what is right. We can meaningfully say, “Even if I (a counterfactual version of me) wanted to kill people, that wouldn’t make it right,” because the counterfactual alteration of the self doesn’t change the logical object that you mean by saying ‘right’.

However, there’s still an analytic relation between this logical object and your actual mindstate, a relation indeed implied by the very meaning of discourse about shouldness. This means that you can get veridical information about this logical object by having a sufficiently intelligent AI run an approximation of the extrapolation process over a good model of your actual mind. If a sufficiently intelligent and trustworthy AGI tells you that after thinking about it for a while you wouldn’t want to eat cows, you have gained veridical information about whether it’s right to eat cows.
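The counterfactual rigidity described above can be caricatured in a few lines. Everything here is a hypothetical stand-in (the real extrapolation process is not something we can run); the point is only that ‘right’ is bound to the extrapolation of the actual mindstate, so altering a counterfactual self changes what that self pursues, not the referent of ‘right’.

```python
def extrapolate(mindstate):
    """Stand-in for the extrapolation process: returns the values this
    mindstate would endorse after idealized reflection (trivialized here)."""
    return sorted(mindstate["values"])

# The actual current mindstate (hypothetical example values).
actual_me = {"values": ["fairness", "happiness"]}

# 'Right' is analytically pinned to the extrapolation of the *actual* brain:
RIGHT = extrapolate(actual_me)

# A counterfactual, pill-altered self:
altered_me = {"values": ["murder"]}

# The altered self now wants something other than RIGHT;
# RIGHT itself is unchanged by the counterfactual alteration.
assert extrapolate(altered_me) != RIGHT
assert RIGHT == ["fairness", "happiness"]
```

In other words, the counterfactual varies which goals get pursued, while `RIGHT` stays bound to the extrapolation of `actual_me`.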

Within the standard terminology of academic metaethics, “extrapolated volition” as a normative theory is:

  • Cognitivist. Normative propositions can be true or false. You can believe that something is right and be mistaken.

  • Naturalist. Normative propositions are not irreducible, and are not based on non-natural properties of the world.

  • Externalist / not internalist. It is not the case that all sufficiently powerful optimizers must act on what we consider to be moral propositions. A paperclipper does what is clippy, not what is right, and the fact that it’s trying to turn everything into paperclips does not indicate a disagreement with you about what is right, any more than you disagree about what is clippy.

  • Reductionist. The whole point of this theory is that it’s the sort of thing you could potentially compute.

  • More synthetic reductionist than analytic reductionist. We don’t have a priori knowledge of our starting mindstate, and we don’t have enough computing power to complete the extrapolation process over it. Therefore, we can’t figure out exactly what our extrapolated volition would say just by pondering the meaning of the word ‘right’.

The closest antecedents in academic metaethics are Rawls and Goodman’s reflective equilibrium, Harsanyi and Railton’s ideal advisor theories, and Frank Jackson’s moral functionalism.

Moore’s Open Question

Argument. If extrapolated volition is analytically equivalent to goodness, then the question “Is it true that extrapolated volition is good?” is meaningless or trivial. However, this question is not meaningless or trivial, and seems to have an open quality about it. Therefore, extrapolated volition is not analytically equivalent to goodness.

Reply. Extrapolated volition is not supposed to be transparently identical to goodness. The normative identity between extrapolated volition and goodness is allowed to be something that you would have to think about for a while, and consider many arguments, to perceive.

Natively, human beings don’t start out with any kind of explicit commitment to a particular metaethics; our brains just compute a feeling of rightness about certain acts, and then sometimes update and say that acts we previously thought were right are not-right.

When we go from that to trying to draw a corresponding logical function, one that we can see our brains as approximating and as updating on when we learn new things or consider new arguments, we are carrying out a project of “rescuing the utility function”. We are reasoning that we can best rescue our native state of confusion by seeing our reasoning about goodness as having its referent in certain logical facts. This lets us go on saying that it is better, ceteris paribus, for people to be happy than in severe pain, and that we can’t reverse this ordering by taking a pill that alters our brain (we can only make our future self act on different logical questions), etcetera. It’s not surprising if this bit of philosophy takes longer than five minutes to reason through.


  • Rescuing the utility function

    If your utility function values ‘heat’, and then you discover to your horror that there’s no ontologically basic heat, switch to valuing disordered kinetic energy. Likewise ‘free will’ or ‘people’.


  • Value

    The word ‘value’ in the phrase ‘value alignment’ is a metasyntactic variable that indicates the speaker’s future goals for intelligent life.