Optimization daemons

If you subject a dynamic system to a large amount of optimization pressure, it can turn into an optimizer or even an intelligence. The classic example is how natural selection, in the course of extensively optimizing DNA to construct organisms that replicated the DNA, in one case pushed hard enough that the DNA came to specify a cognitive system capable of doing its own consequentialist optimization. Initially, these cognitive optimizers pursued goals that correlated well with natural selection's optimization target of reproductive fitness, which is how these crystallized optimizers had originally come to be selected into existence. However, further optimization of these 'brain' protein chunks caused them to begin to create and share cognitive content among themselves. Capability gain then occurred so rapidly that a context change took place, and the brains' pursuit of their internal goals no longer correlated reliably with DNA replication.
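The context-change failure above can be illustrated with a deliberately tiny toy (every function and number here is invented for illustration, not a model of evolution): an inner optimizer climbs a proxy score that agrees with the outer criterion in the regime where it was originally "selected", then overshoots badly once it leaves that regime.

```python
import random

random.seed(0)

def outer_objective(x):
    # The "outer" criterion: peaks at x = 3 and falls off on both sides.
    return -(x - 3.0) ** 2

def proxy_objective(x):
    # The "inner" criterion actually pursued: agrees with the outer one
    # below x = 3 (both say "increase x"), but keeps rewarding larger x
    # forever -- correlation holds only in the original regime.
    return x

def hill_climb(score, x=0.0, steps=500, step=0.1):
    # A crude local optimizer that pursues whatever score it is handed.
    for _ in range(steps):
        candidate = x + random.choice([-step, step])
        if score(candidate) > score(x):
            x = candidate
    return x

# The inner optimizer sails far past x = 3, where the outer
# objective would have stopped it.
x_inner = hill_climb(proxy_objective)
```

Nothing here "crystallizes" a real optimizer, of course; the sketch only shows the correlation-then-divergence shape of the failure.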

As much as this was, from a human standpoint, a wonderful thing to have happened, it wasn't such a great thing from the standpoint of the inclusive genetic fitness of DNA, or of just having stable, reliable, well-understood optimization going on. In the case of AGIs deploying powerful internal and external optimization pressures, we'd very much like that optimization not to deliberately or accidentally crystallize into new modes of optimization, especially if this breaks goal alignment with the previous system or breaks other safety properties. (You might need to stare at the Orthogonality Thesis until it becomes intuitive that, even though the daemons crystallized from natural selection turned out to be more humane than natural selection, this doesn't mean that crystallization from an AGI's optimization would have a significant probability of producing something humane.)

When heavy optimization pressure on a system crystallizes it into an optimizer—especially one that's powerful, or more powerful than the previous system, or misaligned with the previous system—we could term the crystallized optimizer a "daemon" of the previous system. Thus, under this terminology, humans would be daemons of natural selection. If an AGI, after heavily optimizing some internal system, were suddenly taken over by an erupting daemon that cognitively wanted to maximize something that had previously correlated with the amount of available RAM, we would say this was a crystallized daemon of whatever kind of optimization that AGI was applying to its internal system.

This presents an AGI safety challenge. In particular, we'd want at least one of the following to be true anywhere that any kind of optimization pressure is being applied:

  • The optimization pressure is (knowably and reliably) too weak to create daemons. (Seemingly true of all current systems, modulo the 'knowably' part.)

  • The subject of optimization is not Turing-complete or otherwise programmatically general, and the restricted solution space cannot possibly contain daemons no matter how much optimization pressure is applied to it. (3-layer non-recurrent neural networks containing fewer than a trillion neurons will probably not erupt daemons no matter how hard you optimize them.)

  • The AI has a sufficient grasp on the concept of optimization and the problem of daemons to reliably avoid creating mechanisms outside the AI that do cognitive reasoning. (Note that if some predicate is added to exclude a particular type of daemon, this potentially runs into the nearest unblocked neighbor problem.)

  • The AI only creates cognitive subagents which share all the goals and safety properties of the original agent. E.g., if the original AI is low-impact, softly optimizing, abortable, and targeted on performing Tasks, it only creates cognitive systems that are low-impact, don't optimize too hard in conjunction with the original AI, are abortable by the same shutdown button, and are targeted on performing the current task.
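The last condition can be restated as a set-inclusion invariant: every safety property of the parent must be preserved in any subagent it creates. A purely illustrative sketch follows (all property names are invented; a real check would be enormously harder, since properties would have to be verified rather than merely declared):

```python
# Hypothetical safety properties of the parent agent (names invented).
PARENT_PROPERTIES = frozenset(
    {"low_impact", "soft_optimization", "abortable", "task_directed"}
)

def may_create(subagent_properties):
    """Permit subagent creation only if no parent safety property is
    lost, i.e. the parent's property set is a subset of the child's."""
    return PARENT_PROPERTIES <= set(subagent_properties)
```

Note that even this toy invariant is only as good as the property list: excluding particular bad behaviors by predicate runs into the nearest unblocked neighbor problem mentioned above.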
