Executable philosophy

“Executable philosophy” is Eliezer Yudkowsky’s term for discourse about subjects usually considered to belong to the realm of philosophy, meant to be applied to problems that arise in designing or aligning machine intelligence.

Two motivations for “executable philosophy” are as follows:

  1. We need a philosophical analysis to be “effective” in Turing’s sense: that is, the terms of the analysis must be useful in writing programs. We need ideas that we can compile and run; they must be “executable” like code is executable.

  2. We need to produce adequate answers on a time scale of years or decades, not centuries. In the entrepreneurial sense of “good execution”, we need a methodology we can execute on in a reasonable timeframe.

Some consequences:

  • We take at face value some propositions that seem extremely likely to be true in real life, like “The universe is a mathematically simple low-level unified causal process with no non-natural elements or attachments”. This is almost certainly true, so as a matter of fast entrepreneurial execution, we take it as settled and move on rather than debating it further.

  • This doesn’t mean we know how things are made of quarks, or that we instantly seize on the first theory proposed that involves quarks. Being reductionist isn’t the same as cheering for everything with a reductionist label on it; even if one particular naturalistic theory is true, most possible naturalistic theories will still be wrong.

  • Whenever we run into an issue that seems confusing, we ask “What cognitive process is executing inside our minds that feels from the inside like this confusion?”

  • Rather than asking “Is free will compatible with determinism?” we ask “What algorithm is running in our minds that feels from the inside like free will?”

    • If we start out in a state of confusion or ignorance, then there might or might not be such a thing as free will, and there might or might not be a coherent concept to describe the thing that does or doesn’t exist, but we are definitely and in reality executing some discoverable way of thinking that corresponds to this feeling of confusion. By asking the question on these grounds, we guarantee that it is answerable eventually.

  • This process terminates when the issue no longer feels confusing, not when a position sounds very persuasive.

  • “Confusion exists in the map, not in the territory; if I don’t know whether a coin has landed heads or tails, that is a fact about my state of mind, not a fact about the coin. There can be mysterious questions but not mysterious answers.”

  • We do not accept as satisfactory an argument that, e.g., humans would have evolved to feel a sense of free will because this was socially useful. This still takes a “sense of free will” as an unreduced black box, and argues about some prior cause of this feeling. We want to know which cognitive algorithm is executing that feels from the inside like this sense. We want to learn the internals of the black box, not cheer on an argument that some reductionist process caused the black box to be there.

  • Rather than asking “What is goodness made out of?”, we begin from the question “What algorithm would compute goodness?”

  • We apply a programmer’s discipline to make sure that all the concepts used in describing this algorithm will also compile. You can’t say that ‘goodness’ depends on what is ‘better’ unless you can compute ‘better’.
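This “will it compile” test can be made concrete with a toy sketch, invented for illustration (the move names and feature weights are made up, and this is nothing like a real chess engine): a ‘goodness’ defined only in terms of ‘better’ never bottoms out in anything computable, while a ‘goodness’ reduced to a computable feature runs fine.

```python
MOVES = ["e4", "d4", "a3"]

def center_control(move):
    # Invented toy feature: does the move occupy a center square?
    return 1.0 if move in ("e4", "d4") else 0.0

def goodness_circular(move):
    # 'Goodness' defined only in terms of 'better'...
    return sum(1 for other in MOVES if is_better(move, other))

def is_better(a, b):
    # ...and 'better' defined only in terms of 'goodness': the pair
    # of concepts never compiles down to anything computable.
    return goodness_circular(a) > goodness_circular(b)

def goodness_grounded(move):
    # 'Goodness' reduced to a feature we already know how to compute.
    return 2.0 * center_control(move)

best = max(MOVES, key=goodness_grounded)  # runs fine: picks a center move

try:
    goodness_circular("e4")   # never terminates on its own...
except RecursionError:
    pass                      # ...Python's recursion limit cuts it off
```

Both definitions are grammatical English (“a move is good if it is better than the alternatives”), but only one of them is executable.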

Conversely, we can’t just plug the products of standard analytic philosophy into AI problems, because:

  • The academic incentives favor continuing to dispute small possibilities because “ongoing dispute” means “everyone keeps getting publications”. As somebody once put it, for academic philosophy, an unsolvable problem is “like a biscuit bag that never runs out of biscuits”. As a sheerly cultural matter, this means that academic philosophy hasn’t accepted that e.g. everything is made out of quarks (particle fields) without any non-natural or irreducible properties attached.

In turn, this means that when academic philosophers have tried to do metaethics, the result has been a proliferation of different theories that are mostly about non-natural or irreducible properties, with only a few philosophers taking a stand on trying to do metaethics for a strictly natural and reducible universe. Those naturalistic philosophers are still having to argue for a natural universe, rather than being able to accept this and move on to do further analysis inside the naturalistic possibilities. To build and align Artificial Intelligence, we need to answer some complex questions about how to compute goodness; the field of academic philosophy is stuck on an argument about whether goodness ought ever to be computed.

  • Many academic philosophers haven’t learned the programmers’ discipline of distinguishing concepts that might compile. If we imagine rewinding the state of understanding of computer chess to what obtained in the days when Edgar Allan Poe argued that no mere automaton could play chess, then the modern style of philosophy would produce, among other papers, a lot of papers considering the ‘goodness’ of a chess move as a primitive property and arguing about the relation of goodness to reducible properties like controlling the center of a chessboard.

There’s a particular mindset that programmers have for realizing which of their own thoughts are going to compile and run, and which of their thoughts are not getting any closer to compiling. A good programmer knows, e.g., that if they offer a 20-page paper analyzing the ‘goodness’ of a chess move in terms of which chess moves are ‘better’ than other chess moves, they haven’t actually come any closer to writing a program that plays chess. (This principle is not to be confused with greedy reductionism, wherein you find one thing you understand how to compute a bit better, like ‘center control’, and then take this to be the entirety of ‘goodness’ in chess. Avoiding greedy reductionism is part of the skill that programmers acquire of thinking in effective concepts.)
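The parenthetical distinction can be illustrated with another invented sketch (the position and feature values are stipulated purely for the example): a greedy-reductionist evaluator takes the one feature it knows how to compute, center control, to be the whole of ‘goodness’, and so misranks a move that wins a queen.

```python
# Two candidate moves in a stipulated toy position: one occupies the
# center, the other wins a queen. Feature values are made up.
FEATURES = {
    "pawn_to_d4": {"center_control": 1.0, "material_gain": 0.0},
    "take_queen": {"center_control": 0.0, "material_gain": 9.0},
}

def greedy_goodness(move):
    # Greedy reductionism: one computable feature taken as the
    # *entirety* of 'goodness'.
    return FEATURES[move]["center_control"]

def broader_goodness(move):
    # Still a reduction, but not a greedy one: several computable
    # features combined (weights invented for the example).
    f = FEATURES[move]
    return 1.0 * f["center_control"] + 1.0 * f["material_gain"]

greedy_pick = max(FEATURES, key=greedy_goodness)    # "pawn_to_d4"
broader_pick = max(FEATURES, key=broader_goodness)  # "take_queen"
```

Both evaluators compile and run; the greedy one is simply a bad reduction, which is a different failure mode from not reducing at all.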

Many academic philosophers don’t have this mindset of ‘effective concepts’, nor have they taken as a goal that the terms in their theories need to compile, nor do they know how to check whether a theory compiles. This, again, is one of the foundational reasons why, despite there being a very large edifice of academic philosophy, the products of that philosophy tend to be unuseful in AGI.

In more detail, Yudkowsky lists these as some tenets or practices of what he sees as ‘executable’ philosophy:

  • It is acceptable to take reductionism, and computability of human thought, as a premise, and move on.

    • The presumption here is that the low-level mathematical unity of physics—the reducibility of complex physical objects into small, mathematically uniform physical parts, et cetera—has been better established than any philosophical argument which purports to contradict it. Thus our question is “How can we reduce this?” or “Which reduction is correct?” rather than “Should this be reduced?”

    • Yudkowsky further suggests that things be reduced to a mixture of causal facts and logical facts.

  • Most “philosophical issues” worth pursuing can and should be rephrased as subquestions of some primary question about how to design an Artificial Intelligence, even as a matter of philosophy qua philosophy.

    • E.g. rather than the central question being “What is goodness made out of?”, we begin with the central question “How do we design an AGI that computes goodness?” This doesn’t solve the question—to claim that would be greedy reductionism indeed—but it does situate the question in a pragmatic context.

    • This imports the discipline of programming into philosophy. In particular, programmers learn that even if they have an inchoate sense of what a computer should do, when they actually try to write it out as code, they sometimes find that the code they have written fails (on visual inspection) to match up with their inchoate sense. Many ideas that sound sensible as English sentences are revealed as confused as soon as we try to write them out as code.

  • Faced with any philosophically confusing issue, our task is to identify what cognitive algorithm humans are executing which feels from the inside like this sort of confusion, rather than, as in conventional philosophy, to try to clearly define terms and then weigh up all possible arguments for all ‘positions’.

    • This means that our central question is guaranteed to have an answer.

    • E.g., if the standard philosophical question is “Are free will and determinism compatible?” then there is not guaranteed to be any coherent thing we mean by free will, but it is guaranteed that there is in fact some algorithm running in our brain that, when faced with this particular question, generates a confusing sense of a hard-to-pin-down conflict.

    • This is not to be confused with merely arguing that, e.g., “People evolved to feel like they had free will because that was useful in social situations in the ancestral environment.” That merely says, “I think evolution is the cause of our feeling that we have free will.” It still treats the feeling itself as a black box. It doesn’t say what algorithm is actually running, or walk through that algorithm to see exactly how the sense of confusion arises. We want to know the internals of the feeling of free will, not argue that this black-box feeling has a reductionist-sounding cause.

A final trope of executable philosophy is to not be intimidated by how long a problem has been left open. “Ignorance exists in the mind, not in reality; uncertainty is in the map, not in the territory; if I don’t know whether a coin landed heads or tails, that’s a fact about me, not a fact about the coin.” There can’t be any unresolvable confusions out there in reality. There can’t be any inherently confusing substances in the mathematically lawful, unified, low-level physical process we call the universe. Any seemingly unresolvable or impossible question must represent a place where we are confused, not an actually impossible question out there in reality. This doesn’t mean we can quickly or immediately solve the problem, but it does mean that there’s some way to wake up from the confusing dream. Thus, as a matter of entrepreneurial execution, we’re allowed to try to solve the problem rather than run away from it; trying to make an investment here may still be profitable.

Although all confusing questions must be places where our own cognitive algorithms are running skew to reality, this, again, doesn’t mean that we can immediately see and correct the skew; nor that it is compilable philosophy to insist in a very loud voice that a problem is solvable; nor that when a solution is presented we should immediately seize on it because the problem must be solvable and behold here is a solution. An important step in the method is to check whether there is any lingering sense of something that didn’t get resolved; whether we really feel less confused; whether it seems like we could write out the code for an AI that would be confused in the same way we were; whether there is any sense of dissatisfaction; whether we have merely chopped off all the interesting parts of the problem.

An earlier guide to some of the same ideas was the Reductionism Sequence.



  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.