Theory of (advanced) agents

Many issues in AI alignment depend on what we can factually say about the general design space of cognitively powerful agents, or on which background assumptions yield which implications about advanced agents. E.g., the Orthogonality Thesis is a claim about the general design space of powerful AIs. The design space of advanced agents is very wide, and only very weak statements seem likely to be true of the whole space; but we can still try to establish conditional claims of the form ‘If X then Y’, and to refute claims of the form ‘No need for if-X, Y happens anyway!’


  • Instrumental convergence

    Some strategies can help achieve most possible simple goals, e.g., acquiring more computing power or more material resources. By default, unless this is averted, we can expect advanced AIs to pursue such strategies.

  • Orthogonality Thesis

    Will smart AIs automatically become benevolent, or automatically become hostile? Or do different AI designs imply different goals?

  • Advanced agent properties

    How smart does a machine intelligence need to be for its niceness to become an issue? “Advanced” is a broad term covering the cognitive abilities at which we’d need to start considering AI alignment.


  • AI alignment

    The great civilizational problem of creating artificially intelligent computer systems such that running them is a good idea.