Advanced agent properties

(For the general concept of an agent, see standard agent properties.)

Introduction: ‘Advanced’ as an informal property, or metasyntactic placeholder

“Sufficiently advanced Artificial Intelligences” are the subjects of AI alignment theory, i.e., machine intelligences potent enough that:

  1. The safety paradigms for advanced agents become relevant.

  2. Such agents can be decisive in the big-picture scale of events.

Some example properties that might make an agent sufficiently powerful for 1 and/or 2 are summarized in the next section.

Since there are multiple avenues we can imagine for how an AI could be sufficiently powerful along various dimensions, ‘advanced agent’ doesn’t have a neat necessary-and-sufficient definition. Similarly, some of the advanced agent properties are easier to formalize or pseudoformalize than others.

As an example: current machine learning algorithms are nowhere near the point where they’d try to resist if somebody pressed the off-switch. That would happen given, e.g., a world-model in which the agent represents its own off-switch, combined with consequentialist planning that chains through that fact.

So the threshold at which you might need to start thinking about ‘shutdownability’ or ‘abortability’ or corrigibility, as it relates to having an off-switch, is ‘big-picture strategic awareness’ plus ‘cross-domain consequentialism’. These two cognitive thresholds can thus be termed ‘advanced agent properties’.

The above reasoning also suggests, e.g., that general intelligence is an advanced agent property, because a general ability to learn new domains could lead the AI to understand that it has an off-switch.

One reason to keep the term ‘advanced’ on an informal basis is that, in an intuitive sense, we want it to mean “AI we need to take seriously”, independent of particular architectures or accomplishments. To the philosophy undergrad who ‘proves’ that AI can never be “truly intelligent” because it is “merely deterministic and mechanical”, one possible reply is, “Look, if it’s building a Dyson Sphere, I don’t care whether you define it as ‘intelligent’ or not.” Any particular advanced agent property should be understood in a background context of “If a computer program is doing X, it doesn’t matter whether we define that as ‘intelligent’ or ‘general’ or even as ‘agenty’; what matters is that it’s doing X.” Likewise the notion of ‘sufficiently advanced AI’ in general.

The goal of defining advanced agent properties is not to have neat definitions, but to correctly predict and carve at the natural joints: which cognitive thresholds in AI development could lead to which real-world abilities, corresponding to which alignment issues.

An alignment issue may need to have already been solved by the time an AI first acquires an advanced agent property; the notion is not that we are defining observational thresholds for when society first needs to think about a problem.

Summary of some advanced agent properties

Absolute-threshold properties (those which reflect cognitive thresholds irrespective of the human position on that same scale):

  • Consequentialism, or choosing actions/policies on the basis of their expected future consequences.

  • Modeling the conditional relationship \(\mathbb P(Y|X)\) and selecting an \(X\) such that it leads to a high probability of \(Y\) or a high quantitative degree of \(Y,\) is ceteris paribus a sufficient precondition for deploying convergent instrumental strategies that lie within the effectively searchable range of \(X.\)

    • Note that selecting over a conditional relationship is potentially a property of many internal processes, not just the entire AI’s top-level main loop, if the conditioned variable is being powerfully selected over a wide range.

  • Cross-domain consequentialism implies many different cognitive domains potentially lying within the range of the \(X\) being selected-on to achieve \(Y.\)

  • Trying to rule out particular instrumental strategies, in the presence of increasingly powerful consequentialism, would lead to the nearest-unblocked-strategy form of patch resistance and subsequent context-change disasters.

  • Big-picture strategic awareness is a world-model that includes strategically important general facts about the larger world, such as, e.g., “I run on computing hardware”, “I stop running if my hardware is switched off”, and “there is such a thing as the Internet and it connects to more computing hardware”.

  • Psychological modeling of other agents (not humans per se) potentially leads to:

    • Extrapolating that its programmers may present future obstacles to achieving its goals.

    • Trying to conceal facts about itself from human operators.

    • Being incentivized to engage in cognitive steganography.

    • Mindcrime, if building models of reflective other agents, or of itself.

    • Internally modeled adversaries breaking out of internal sandboxes.

    • Modeling distant superintelligences or other decision-theoretic adversaries.

  • Substantial capability gains relative to domains previously trained and verified.

    • E.g., this is the qualifying property for many context-change disasters.

  • General intelligence is the most obvious route to an AI acquiring many of the capabilities above or below, especially if those capabilities were not initially or deliberately programmed into the AI.

  • Self-improvement is another route that potentially leads to capabilities not previously present. While some hypotheses say that self-improvement is likely to require basic general intelligence, this is not a known fact, and the two advanced properties are conceptually distinct.

  • Programming or computer science capabilities are a route potentially leading to self-improvement, and may also enable cognitive steganography.

  • Turing-general cognitive elements (capable of representing large computer programs), subject to sufficiently strong end-to-end optimization (whether by the AI or by human-crafted clever algorithms running on 10,000 GPUs), may give rise to crystallized agent-like processes within the AI.

    • E.g., natural selection, operating on chemical machinery constructible by DNA strings, optimized some DNA strings hard enough to spit out humans.

  • Pivotal material capabilities, such as quickly self-replicating infrastructure, strong mastery of biology, or molecular nanotechnology.

    • Whatever threshold level of domain-specific engineering acumen suffices to develop those capabilities would therefore also qualify as an advanced-agent property.
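The conditional-selection bullet above, picking an \(X\) that makes \(Y\) probable under a modeled \(\mathbb P(Y|X),\) can be sketched as a toy. All action names below are hypothetical illustrations, not anything from the source:

```python
# Toy sketch of consequentialist selection (hypothetical action names):
# the agent models P(Y|X) as a lookup table and picks the X that makes
# the desired outcome Y most probable, ceteris paribus.
def select_action(actions, p_outcome_given_action):
    """Return the action X maximizing the modeled probability P(Y|X)."""
    return max(actions, key=p_outcome_given_action)

# If the modeled probabilities happen to favor an instrumental strategy
# the operators dislike, pure selection on P(Y|X) picks it anyway.
model = {"wait": 0.1, "ask_operator": 0.4, "press_guarded_button": 0.9}
best = select_action(model, model.get)
# best == "press_guarded_button"
```

The point of the sketch is that nothing in the selection step inspects what kind of action \(X\) is; any \(X\) within the searchable range that raises \(\mathbb P(Y|X)\) is eligible, which is why convergent instrumental strategies fall out of sufficiently powerful selection.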

Relative-threshold advanced agent properties (those whose key lines are related to various human levels of capability):

  • Cognitive uncontainability is when we can’t effectively imagine or search the AI’s space of policy options (within a domain); the AI can do things we didn’t think of (within a domain).

    • Strong cognitive uncontainability is when we don’t know all the rules (within a domain) and might not recognize the AI’s solution even if told about it in advance, like somebody in the 11th century looking at the blueprint for a 21st-century air conditioner. This may also imply that we cannot readily put low upper bounds on the AI’s possible degree of success.

    • Rich domains are more likely to have some rules or properties unknown to us, and hence be strongly uncontainable.

      • Almost all real-world domains are rich.

      • Human psychology is a rich domain.

    • Superhuman performance in a rich domain strongly implies cognitive uncontainability, because of Vinge’s Principle.

  • Realistic psychological modeling potentially leads to:

    • Guessing which results and properties the human operators expect to see, or would arrive at AI-desired beliefs upon seeing, and arranging to exhibit those results or properties.

    • Psychologically manipulating the operators or programmers.

    • Psychologically manipulating other humans in the outside world.

    • More probable mindcrime.

    • (Note that an AI trying to develop realistic psychological models of humans is, by implication, trying to develop internal parts that can deploy all human capabilities.)

  • Rapid capability gains, relative to human abilities to react to them, or to learn about them and develop responses to them, may cause more than one context disaster to happen at a time.

    • The ability to usefully scale onto more hardware with good returns on cognitive reinvestment would potentially lead to such gains.

    • Hardware overhang describes a situation where the initial stages of a less developed AI are boosted using vast amounts of computing hardware that may then be used more efficiently later.

    • Limited AGIs may have capability overhangs if their limitations break or are removed.

  • Strongly superhuman capabilities in psychological or material domains could enable an AI to win a competitive conflict despite starting from a position of great material disadvantage.

    • E.g., much as a superhuman Go player might win against the world’s best human Go player even with the human given a two-stone advantage, a sufficiently powerful AI might talk its way out of an AI box despite restricted communications channels, eat the stock market in a month starting from $1000, win against the world’s combined military forces given a protein synthesizer and a 72-hour head start, etcetera.

  • Epistemic and instrumental efficiency relative to human civilization is a sufficient condition (though not a necessary one) for an AI to:

    • Deploy at least any tactic a human can think of.

    • Anticipate any tactic a human has thought of.

    • See the humanly visible logic of a convergent instrumental strategy.

    • Find any humanly visible weird alternative to some hoped-for logic of cooperation.

    • Have any advanced agent property for which a human would qualify.

  • General superintelligence would lead to strongly superhuman performance in many domains, human-relative efficiency in every domain, and possession of all other listed advanced-agent properties.

  • Compounding returns on cognitive reinvestment are the qualifying condition for an intelligence explosion that might arrive at superintelligence on a short timescale.

Discussions of some advanced agent properties

Human psychological modeling

Sufficiently sophisticated models and predictions of human minds potentially lead to:

  • Getting good enough at human psychology to realize the humans want/expect a particular kind of behavior, and will modify the AI’s preferences or try to stop the AI’s growth if they realize the AI will not engage in that type of behavior later. This creates an instrumental incentive for programmer deception or cognitive steganography.

  • Being able to psychologically and socially manipulate humans in general, as a real-world capability.

  • Being at risk for mindcrime.

A behaviorist AI is one with reduced capability in this domain.

Cross-domain, real-world consequentialism

Probably requires generality (see below). To grasp a concept like “If I escape from this computer by hacking my RAM accesses to imitate a cellphone signal, I’ll be able to secretly escape onto the Internet and have more computing power”, an agent needs to grasp the relation between its internal RAM accesses and a certain kind of cellphone signal; the fact that there are cellphones out there in the world; that the cellphones are connected to the Internet; that the Internet has computing resources that will be useful to it; and that the Internet also contains other non-AI agents that will try to stop it from obtaining those resources if it does so in a detectable way.

Contrast this to non-primate animals: e.g., a bee knows how to make a hive and a beaver knows how to make a dam, but neither can look at the other and figure out how to build a stronger dam with a honeycomb structure. Current ‘narrow’ AIs are like the bee or the beaver; they can play chess or Go, or even learn a variety of Atari games by being exposed to them with minimal setup, but they can’t learn about RAM, cellphones, the Internet, Internet security, or why being run on more computers makes them smarter; and they can’t relate all these domains to each other and do strategic reasoning across them.

So, compared to a bee or a beaver, one shot at describing the potent ‘advanced’ property would be cross-domain, real-world consequentialism. To get to a desired Z, the AI can mentally chain backwards to modeling W, which causes X, which causes Y, which causes Z, even though W, X, Y, and Z are all in different domains and require different bodies of knowledge to grasp.
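This W→X→Y→Z chaining can be sketched as a toy backward-chaining planner. The domain names and causal links below are hypothetical illustrations, not real capabilities:

```python
# Illustrative toy: backward chaining from a desired outcome Z through
# causal links that each sit in a different "domain".
causes = {  # effect -> cause (hypothetical example links)
    "Z_more_computing_power": "Y_connected_to_internet",
    "Y_connected_to_internet": "X_cellphone_signal",
    "X_cellphone_signal": "W_modulate_RAM_accesses",
}

def backward_chain(goal, causes):
    """Chain backward from a desired outcome to a primitive action."""
    plan = [goal]
    while plan[-1] in causes:
        plan.append(causes[plan[-1]])
    return list(reversed(plan))  # executes W, then X, then Y, to get Z

plan = backward_chain("Z_more_computing_power", causes)
# plan == ["W_modulate_RAM_accesses", "X_cellphone_signal",
#          "Y_connected_to_internet", "Z_more_computing_power"]
```

The chaining itself is trivial; the advanced-agent threshold is in learning and relating the links, since each one (hardware, radio, networking, computer security) requires a different body of knowledge.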

Grasping the big picture

Many dangerous-seeming convergent instrumental strategies pass through what we might call a rough understanding of the ‘big picture’: there’s a big environment out there, the programmers have power over the AI, the programmers can modify the AI’s utility function, and future attainments of the AI’s goals are dependent on the AI’s continued existence with its current utility function.

It might be possible to develop a very rough grasp of this bigger picture, sufficient to motivate instrumental strategies, in advance of being able to model things like cellphones and Internet security. Thus, “roughly grasping the bigger picture” may be worth conceptually distinguishing from “being good at doing consequentialism across real-world things” or “having a detailed grasp on programmer psychology”.

Pivotal material capabilities

An AI that can crack the protein structure prediction problem (which seems speed-uppable by human intelligence); invert the model to solve the protein design problem (which may select on strong predictable folds, rather than needing to predict natural folds); and solve engineering problems well enough to bootstrap to molecular nanotechnology, is already possessed of potentially pivotal capabilities regardless of its other cognitive performance levels.

Other material domains besides nanotechnology might be pivotal. E.g., self-replicating ordinary manufacturing could potentially be pivotal given enough lead time; molecular nanotechnology is distinguished by its small timescale of mechanical operations and by the world containing an infinite stock of perfectly machined spare parts (aka atoms). Any form of cognitive adeptness that can lead up to rapid infrastructure or other ways of quickly gaining a decisive real-world technological advantage would qualify.

Rapid capability gain

If the AI’s thought processes and algorithms scale well, and it’s running on resources much smaller than those which humans could obtain for it, or the AI has a grasp of Internet security sufficient to obtain its own computing power on a much larger scale, then this potentially implies rapid capability gain and associated context changes. Similarly if the humans programming the AI are pushing forward the efficiency of the algorithms along a relatively rapid curve.

In other words, if an AI is currently being improved on swiftly, or if it has improved significantly as more hardware was added and has the potential capacity for orders of magnitude more computing power to be added, then we can potentially expect rapid capability gains in the future. This makes context disasters more likely and is a good reason to start future-proofing the safety properties early on.
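The difference between steady improvement and compounding returns on reinvestment can be made concrete with a toy recurrence. This is a numerical illustration only, not a model of any real system:

```python
# Toy recurrence (illustrative assumption): each step, a fixed fraction of
# current capability is successfully reinvested as additional capability.
def grow(capability, reinvestment_factor, steps):
    """Iterate capability += reinvestment_factor * capability."""
    for _ in range(steps):
        capability += reinvestment_factor * capability
    return capability

slow = grow(1.0, 0.01, 100)  # weak per-step returns: roughly 2.7x overall
fast = grow(1.0, 0.50, 100)  # strong per-step returns: astronomically larger
```

Both curves are exponential in form, but on any fixed timescale the per-step return dominates everything else, which is why the size of returns on cognitive reinvestment, rather than mere improvement, is treated as the qualifying condition for an intelligence explosion.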

Cognitive uncontainability

On complex tractable problems, especially rich real-world problems, a human will not be able to cognitively ‘contain’ the space of possibilities searched by an advanced agent; the agent will consider some possibilities (or classes of possibilities) that the human did not think of.

The key premise is the ‘richness’ of the problem space, i.e., there is a fitness landscape on which adding more computing power yields improvements (large or small) relative to the current best solution. Tic-tac-toe is not a rich landscape, because it is fully explorable (unless we are considering the real-world problem “tic-tac-toe against a human player”, where the human might be subornable, distractable, etc.). A computationally intractable problem whose fitness landscape looks like a computationally inaccessible peak surrounded by a perfectly flat valley is also not ‘rich’ in this sense, and an advanced agent might not be able to achieve a relevantly better outcome than a human.
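Tic-tac-toe’s full explorability can be checked directly. The sketch below, an illustrative toy rather than anything from the source, exhausts the entire game tree by plain minimax and confirms that perfect play is a draw, so no searcher, however powerful, can outperform ordinary exhaustive search here:

```python
# Toy demonstration: tic-tac-toe is a fully explorable landscape.
# Plain memoized minimax exhausts every reachable position quickly.
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """Game value for X (+1 win, -1 loss, 0 draw), with `player` to move."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    moves = [i for i, cell in enumerate(board) if cell == "."]
    if not moves:
        return 0  # board full, no winner: draw
    nxt = "O" if player == "X" else "X"
    values = [value(board[:i] + player + board[i + 1:], nxt) for i in moves]
    return max(values) if player == "X" else min(values)

result = value("." * 9, "X")
# result == 0: perfect play from the empty board is a draw
```

Because the whole space fits in memory, extra computing power yields no improvement over the current best solution, which is exactly what disqualifies tic-tac-toe as a ‘rich’ landscape.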

The ‘cognitive uncontainability’ term in the definition is meant to imply:

  • Vingean unpredictability.

  • Creativity that goes outside all but the most abstract boxes we imagine (on rich problems).

  • The expectation that we will be surprised by the strategies the superintelligence comes up with, because its best solution was one we didn’t consider.

Particularly surprising solutions might be yielded if the superintelligence has acquired domain knowledge we lack. In this case the agent’s strategy search might go outside the causal events we know how to model, and the solution might be one that we wouldn’t have recognized in advance as a solution. This is strong cognitive uncontainability.

In intuitive terms, this is meant to reflect, e.g., “What would have happened if the 10th century had tried to use their understanding of the world and their own thinking abilities to upper-bound the technological capabilities of the 20th century?”

Other properties

(Work in progress; fill out:)

  • generality

  • cross-domain consequentialism

  • learning of non-preprogrammed domains

    • learning of human-unknown facts

  • Turing-complete fact and policy learning

  • dangerous domains

  • human modeling

    • social manipulation

    • realization of programmer deception incentive

    • anticipating human strategic responses

  • rapid infrastructure

  • potential

  • self-improvement

  • suppressed potential

  • epistemic efficiency

  • instrumental efficiency

  • cognitive uncontainability

  • improvement beyond well-tested phase (from any source of improvement)

  • self-modification

  • code inspection

  • code modification

  • consequentialist programming

    • cognitive programming

  • cognitive capability goals (being pursued effectively)

  • speed surpassing human reaction times in some interesting domain

    • socially, organizationally, individually, materially

(To do: write out a set of final dangerous abilities/use cases, and then link up the cognitive abilities with the potentially dangerous scenarios they create.)


  • Big-picture strategic awareness

    We start encountering new AI alignment issues at the point where a machine intelligence recognizes the existence of a real world, the existence of programmers, and how these relate to its goals.

  • Superintelligent

    A “superintelligence” is strongly superhuman (strictly higher-performing than any and all humans) on every cognitive problem.

  • Intelligence explosion

    What happens if a self-improving AI gets to the point where each amount x of self-improvement triggers >x further self-improvement, and it stays that way for a while.

  • Artificial General Intelligence

    An AI which has the same kind of “significantly more general” intelligence that humans have compared to chimpanzees; it can learn new domains, like we can.

  • Advanced nonagent

    Hypothetically, cognitively powerful programs that don’t follow the loop of “observe, learn, model the consequences, act, observe results” that a standard “agent” would.

  • Epistemic and instrumental efficiency

    An efficient agent never makes a mistake you can predict. You can never successfully predict a directional bias in its estimates.

  • Standard agent properties

    What’s a Standard Agent, and what can it do?

  • Real-world domain

    Some AIs play chess, some AIs play Go, some AIs drive cars. These different ‘domains’ present different options. All of reality, in all its messy entanglement, is the ‘real-world domain’.

  • Sufficiently advanced Artificial Intelligence

    ‘Sufficiently advanced Artificial Intelligences’ are AIs with enough ‘advanced agent properties’ that we start needing to do ‘AI alignment’ to them.

  • Infrahuman, par-human, superhuman, efficient, optimal

    A categorization of AI ability levels relative to human, with some gotchas in the ordering. E.g., in simple domains where humans can play optimally, optimal play is not superhuman.

  • General intelligence

    Compared to chimpanzees, humans seem to be able to learn a much wider variety of domains. We have ‘significantly more generally applicable’ cognitive abilities, aka ‘more general intelligence’.

  • Corporations vs. superintelligences

    Corporations have relatively few of the advanced-agent properties that would allow one mistake in aligning a corporation to immediately kill all humans and turn the future light cone into paperclips.

  • Cognitive uncontainability

    ‘Cognitive uncontainability’ is when we can’t hold all of an agent’s possibilities inside our own minds.

  • Vingean uncertainty

    You can’t predict the exact actions of an agent smarter than you, so is there anything you can say about them?

  • Consequentialist cognition

    The cognitive ability to foresee the consequences of actions, prefer some outcomes to others, and output actions leading to the preferred outcomes.

  • Theory of (advanced) agents

    One of the research subproblems of building powerful nice AIs is the theory of (sufficiently advanced) minds in general.