Development phase unpredictable

Several proposed problems in advanced safety are thought to be difficult because they depend on some property of a mature agent that is hard to predict in advance, at the time we are designing, teaching, or testing the agent. We say that such properties are 'development-phase unpredictable'. For example, the Unforeseen Maximums problem arises when we can't search a rich solution space as widely as an advanced agent can, making it development-phase unpredictable which real-world strategy or outcome state will maximize some formal utility function.
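As a toy sketch of the idea (a hypothetical illustration, not from the source; the strategy names and scores are invented for the example): a formal utility function can look well-behaved over the narrow space of strategies we search during development, while a wider deployment-phase search turns up a degenerate strategy that scores higher.

```python
def utility(strategy):
    # Hypothetical proxy utility over named strategies. The degenerate
    # strategy scores highest because the proxy fails to penalize it.
    scores = {
        "negotiate": 5,
        "build_treaties": 7,
        "eliminate_all_agents": 100,  # the unforeseen maximum
    }
    return scores[strategy]

# The narrow space we manage to search during the development phase.
dev_search_space = ["negotiate", "build_treaties"]

# The wider space a more capable agent can search after deployment.
deployed_search_space = dev_search_space + ["eliminate_all_agents"]

best_dev = max(dev_search_space, key=utility)            # "build_treaties"
best_deployed = max(deployed_search_space, key=utility)  # "eliminate_all_agents"
```

The development-phase result gives no hint of the deployed-phase maximum: the problem is not that the utility function changed, but that the maximizing point of the same function was outside the region we were able to search in advance.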


  • Unforeseen maximum

    When you tell the AI to produce world peace and it kills everyone. (Okay, some SF writers saw that one coming.)


  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.