Executable philosophy

“Executable philosophy” is Eliezer Yudkowsky’s term for discourse about subjects usually considered to belong to the realm of philosophy, meant to be applied to problems that arise in designing or aligning machine intelligence.

Two motivations of “executable philosophy” are as follows:

  1. We need a philosophical analysis to be “effective” in Turing’s sense: that is, the terms of the analysis must be useful in writing programs. We need ideas that we can compile and run; they must be “executable” like code is executable.

  2. We need to produce adequate answers on a time scale of years or decades, not centuries. In the entrepreneurial sense of “good execution”, we need a methodology we can execute on in a reasonable timeframe.

Some consequences:

  • We take at face value some propositions that seem extremely likely to be true in real life, like “The universe is a mathematically simple low-level unified causal process with no non-natural elements or attachments”. This is almost certainly true, so as a matter of fast entrepreneurial execution, we take it as settled and move on rather than debating it further.

  • This doesn’t mean we know how things are made of quarks, or that we instantly seize on the first theory proposed that involves quarks. Being reductionist isn’t the same as cheering for everything with a reductionist label on it; even if one particular naturalistic theory is true, most possible naturalistic theories will still be wrong.

  • Whenever we run into an issue that seems confusing, we ask “What cognitive process is executing inside our minds that feels from the inside like this confusion?”

  • Rather than asking “Is free will compatible with determinism?” we ask “What algorithm is running in our minds that feels from the inside like free will?”

    • If we start out in a state of confusion or ignorance, then there might or might not be such a thing as free will, and there might or might not be a coherent concept to describe the thing that does or doesn’t exist, but we are definitely and in reality executing some discoverable way of thinking that corresponds to this feeling of confusion. By asking the question on these grounds, we guarantee that it is answerable eventually.

  • This process terminates when the issue no longer feels confusing, not when a position sounds very persuasive.

  • “Confusion exists in the map, not in the territory; if I don’t know whether a coin has landed heads or tails, that is a fact about my state of mind, not a fact about the coin. There can be mysterious questions but not mysterious answers.”

  • We do not accept as satisfactory an argument that, e.g., humans would have evolved to feel a sense of free will because this was socially useful. This still takes a “sense of free will” as an unreduced black box, and argues about some prior cause of this feeling. We want to know which cognitive algorithm is executing that feels from the inside like this sense. We want to learn the internals of the black box, not cheer on an argument that some reductionist process caused the black box to be there.

  • Rather than asking “What is goodness made out of?”, we begin from the question “What algorithm would compute goodness?”

  • We apply a programmer’s discipline to make sure that all the concepts used in describing this algorithm will also compile. You can’t say that ‘goodness’ depends on what is ‘better’ unless you can compute ‘better’.
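
The point about ‘goodness’ and ‘better’ can be made concrete with a minimal Python sketch. All function names and the toy `score` rule are hypothetical, introduced here only for illustration; they are not part of Yudkowsky’s text. The circular pair of definitions never bottoms out in anything computable, so calling it exhausts the call stack, while the grounded pair terminates because ‘better’ rests on a quantity we know how to compute.

```python
def goodness_circular(option, alternatives):
    """'Goodness' defined as being 'better' than every alternative."""
    return all(better_circular(option, alt, alternatives) for alt in alternatives)


def better_circular(a, b, alternatives):
    """'Better' defined as having more 'goodness': the two definitions chase each other."""
    return goodness_circular(a, alternatives) >= goodness_circular(b, alternatives)


def score(option):
    """Hypothetical stand-in for any quantity we actually know how to compute."""
    return len(option)


def better_grounded(a, b):
    """'Better' reduced to a comparison of computable scores."""
    return score(a) > score(b)


def goodness_grounded(option, alternatives):
    """'Goodness' as beating every other alternative on the computable score."""
    return all(better_grounded(option, alt) for alt in alternatives if alt != option)


def terminates(definition, *args):
    """Return True if the definition yields an answer, False if it recurses without end."""
    try:
        definition(*args)
        return True
    except RecursionError:
        return False
```

Calling `terminates(goodness_circular, "a", ["a", "b"])` reports that the circular definition never produces an answer, whereas `terminates(goodness_grounded, "aaa", ["a", "bb", "aaa"])` succeeds. The grounded version is not claimed to capture real goodness; it only shows what it means for every term in a definition to be computable.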

Conversely, we can’t just plug the products of standard analytic philosophy into AI problems, because:

• The academic incentives favor continuing to dispute small possibilities because “ongoing dispute” means “everyone keeps getting publications”. As somebody once put it, for academic philosophy, an unsolvable problem is “like a biscuit bag that never runs out of biscuits”. As a sheerly cultural matter, this means that academic philosophy hasn’t accepted that e.g. everything is made out of quarks (particle fields) without any non-natural or irreducible properties attached.

In turn, this means that when academic philosophers have tried to do metaethics, the result has been a proliferation of different theories that are mostly about non-natural or irreducible properties, with only a few philosophers taking a stand on trying to do metaethics for a strictly natural and reducible universe. Those naturalistic philosophers are still having to argue for a natural universe rather than being able to accept this and move on to do further analysis inside the naturalistic possibilities. To build and align Artificial Intelligence, we need to answer some complex questions about how to compute goodness; the field of academic philosophy is stuck on an argument about whether goodness ought ever to be computed.

• Many academic philosophers haven’t learned the programmers’ discipline of distinguishing concepts that might compile. If we imagine rewinding the state of understanding of computer chess to what obtained in the days when Edgar Allan Poe proved that no mere automaton could play chess, then the modern style of philosophy would produce, among other papers, a lot of papers considering the ‘goodness’ of a chess move as a primitive property and arguing about the relation of goodness to reducible properties like controlling the center of a chessboard.

There’s a particular mindset that programmers have for realizing which of their own thoughts are going to compile and run, and which of their thoughts are not getting any closer to compiling. A good programmer knows, e.g., that if they offer a 20-page paper analyzing the ‘goodness’ of a chess move in terms of which chess moves are ‘better’ than other chess moves, they haven’t actually come any closer to writing a program that plays chess. (This principle is not to be confused with greedy reductionism, wherein you find one thing you understand how to compute a bit better, like ‘center control’, and then take this to be the entirety of ‘goodness’ in chess. Avoiding greedy reductionism is part of the skill that programmers acquire of thinking in effective concepts.)
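
The contrast between an effective concept and greedy reductionism can be sketched in Python. Chess is far too large for a short example, so this sketch swaps in a much smaller game as an assumption: a single pile of stones, each player removes one to three stones per turn, and whoever takes the last stone wins. The game, the constant `MAX_TAKE`, and every function name are hypothetical choices made for illustration. Here the ‘goodness’ of a move is computed by searching to the end of the game, while `greedy_move` mistakes one easily computed feature (stones taken) for the whole of ‘goodness’.

```python
from functools import lru_cache

MAX_TAKE = 3  # assumed rule: a player may remove 1, 2, or 3 stones per turn


def legal_moves(pile):
    """All numbers of stones that may be removed from a pile of this size."""
    return range(1, min(MAX_TAKE, pile) + 1)


@lru_cache(maxsize=None)
def player_to_move_wins(pile):
    """True if the player to move can force a win from this pile."""
    # Taking the last stone wins, so facing an empty pile means having already lost.
    return any(not player_to_move_wins(pile - take) for take in legal_moves(pile))


def is_good_move(pile, take):
    """A move is 'good' if it leaves the opponent in a losing position."""
    return not player_to_move_wins(pile - take)


def greedy_move(pile):
    """Greedy reductionism: treat 'takes the most stones' as the whole of 'goodness'."""
    return max(legal_moves(pile))


def best_move(pile):
    """Return a winning move if one exists, else the smallest legal move (assumes pile >= 1)."""
    for take in legal_moves(pile):
        if is_good_move(pile, take):
            return take
    return 1
```

With a pile of five stones, `greedy_move(5)` takes three stones and hands the opponent a winning position, while `best_move(5)` takes one and leaves a lost position behind. Both functions compile and run; only one of them computes what we meant by a good move.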

Many academic philosophers don’t have this mindset of ‘effective concepts’, nor have they taken as a goal that the terms in their theories need to compile, nor do they know how to check whether a theory compiles. This, again, is one of the foundational reasons why, despite the very large edifice of academic philosophy, the products of that philosophy tend to be of little use in AGI.

In more detail, Yudkowsky lists these as some tenets or practices of what he sees as ‘executable’ philosophy:

  • It is acceptable to take reductionism, and computability of human thought, as a premise, and move on.

  • The presumption here is that the low-level mathematical unity of physics (the reducibility of complex physical objects into small, mathematically uniform physical parts, etcetera) has been better established than any philosophical argument which purports to contradict it. Thus our question is “How can we reduce this?” or “Which reduction is correct?” rather than “Should this be reduced?”

  • Yudkowsky further suggests that things be reduced to a mixture of causal facts and logical facts.

  • Most “philosophical issues” worth pursuing can and should be rephrased as subquestions of some primary question about how to design an Artificial Intelligence, even as a matter of philosophy qua philosophy.

  • E.g. rather than the central question being “What is goodness made out of?”, we begin with the central question “How do we design an AGI that computes goodness?” This doesn’t solve the question—to claim that would be greedy reductionism indeed—but it does situate the question in a pragmatic context.

  • This imports the discipline of programming into philosophy. In particular, programmers learn that even if they have an inchoate sense of what a computer should do, when they actually try to write it out as code, they sometimes find that the code they have written fails (on visual inspection) to match up with their inchoate sense. Many ideas that sound sensible as English sentences are revealed as confused as soon as we try to write them out as code.

  • Faced with any philosophically confusing issue, our task is to identify what cognitive algorithm humans are executing which feels from the inside like this sort of confusion, rather than, as in conventional philosophy, to try to clearly define terms and then weigh up all possible arguments for all ‘positions’.

  • This means that our central question is guaranteed to have an answer.

  • E.g., if the standard philosophical question is “Are free will and determinism compatible?” then there is not guaranteed to be any coherent thing we mean by free will, but it is guaranteed that there is in fact some algorithm running in our brain that, when faced with this particular question, generates a confusing sense of a hard-to-pin-down conflict.

  • This is not to be confused with merely arguing that, e.g., “People evolved to feel like they had free will because that was useful in social situations in the ancestral environment.” That merely says, “I think evolution is the cause of our feeling that we have free will.” It still treats the feeling itself as a black box. It doesn’t say what algorithm is actually running, or walk through that algorithm to see exactly how the sense of confusion arises. We want to know the internals of the feeling of free will, not argue that this black-box feeling has a reductionist-sounding cause.

A final trope of executable philosophy is to not be intimidated by how long a problem has been left open. “Ignorance exists in the mind, not in reality; uncertainty is in the map, not in the territory; if I don’t know whether a coin landed heads or tails, that’s a fact about me, not a fact about the coin.” There can’t be any unresolvable confusions out there in reality. There can’t be any inherently confusing substances in the mathematically lawful, unified, low-level physical process we call the universe. Any seemingly unresolvable or impossible question must represent a place where we are confused, not an actually impossible question out there in reality. This doesn’t mean we can quickly or immediately solve the problem, but it does mean that there’s some way to wake up from the confusing dream. Thus, as a matter of entrepreneurial execution, we’re allowed to try to solve the problem rather than run away from it; trying to make an investment here may still be profitable.
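
The coin example can be written out as a small Python sketch. The class names `Coin` and `Observer` and the attribute `p_heads` are hypothetical, chosen only for illustration. The coin holds one definite face from the moment it is flipped; the probability is stored in the observer, and looking at the coin changes the observer without changing the coin.

```python
import random


class Coin:
    """The territory: once flipped, the coin is in one definite state."""

    def __init__(self, rng):
        self.face = rng.choice(["heads", "tails"])


class Observer:
    """The map: a belief about the coin, stored in the observer rather than in the coin."""

    def __init__(self):
        self.p_heads = 0.5  # the uncertainty is a property of this object

    def look(self, coin):
        # Observation updates the belief; the coin itself is left unchanged.
        self.p_heads = 1.0 if coin.face == "heads" else 0.0


coin = Coin(random.Random(0))
observer = Observer()

face_before = coin.face          # already definite before anyone looks
belief_before = observer.p_heads  # 0.5 describes the observer, not the coin

observer.look(coin)
```

Nothing in `Coin` stores a probability, which is the sense in which the uncertainty belongs to the map rather than the territory.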

Although all confusing questions must be places where our own cognitive algorithms are running skew to reality, this, again, doesn’t mean that we can immediately see and correct the skew; nor that it is compilable philosophy to insist in a very loud voice that a problem is solvable; nor that when a solution is presented we should immediately seize on it because the problem must be solvable and behold here is a solution. An important step in the method is to check whether there is any lingering sense of something that didn’t get resolved; whether we really feel less confused; whether it seems like we could write out the code for an AI that would be confused in the same way we were; whether there is any sense of dissatisfaction; whether we have merely chopped off all the interesting parts of the problem.

An earlier guide to some of the same ideas was the Reductionism Sequence.

Tutorial: finishable philosophy applied to ‘free will’. (Don’t forget to distinguish plausible wrong ways to do it at each step. Is there a good example besides free will that can serve as a homework problem? Maybe something actually unresolved, like ‘Why does anything exist → why do some things exist more than others?’, with Tegmark Level IV as a considered but not accepted answer.)

