Distances between cognitive domains

In the context of AI alignment, we may care a lot about the degree to which competence in two different cognitive domains is separable, or alternatively highly tangled, relative to the class of algorithms reasoning about them.

  • Calling X and Y ‘separate domains’ asserts at least one of “It’s possible to learn to reason well about X without needing to know about Y” or “It’s possible to learn to reason well about Y without necessarily knowing how to reason well about X”.

  • Calling X a distinct domain within a set of domains Z, relative to a background domain W, says the following: taking for granted the background algorithms and knowledge W that the agent can use to reason about any domain in Z, it’s possible to reason well about the domain X using ideas, methods, and knowledge that are mostly related to each other and not tangled up with ideas from non-X domains within Z.

For example: if the domains X and Y are ‘blue cars’ and ‘red cars’, then it seems unlikely that X and Y are well-separated domains, because an agent that knows how to reason well about blue cars is almost surely extremely close to being an agent that can reason well about red cars, in the sense that:

  • For almost everything we want to do with or predict about blue cars, the simplest, fastest, or easiest-to-discover way of manipulating or predicting blue cars will also work for manipulating or predicting red cars. This is the sense in which the blue-car and red-car domains are ‘naturally’ very close.

  • For most natural agent designs, the state or specification of an agent that can reason about blue cars is probably extremely close to the state or specification of an agent that can reason about red cars.

  • The only reason an agent that reasons well about blue cars would be hard to convert into an agent that reasons well about red cars is if specific extra elements had been added to the agent’s design to prevent it from reasoning well about red cars. In this case, the design distance is increased by whatever further modifications are required to untangle and delete the anti-red-car-learning inhibitions, but by no more than that; the ‘blue car’ and ‘red car’ domains remain naturally close.

  • An agent that has already learned how to reason well about blue cars probably requires only a tiny amount of extra knowledge or learning, if any, to reason well about red cars as well. (Again, unless the agent contains specific added design elements to make it reason poorly about red cars.)
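The last point, about how much extra learning is needed, can be made concrete with a deliberately crude toy model. The sketch below is not a proposed formal definition. It assumes, purely for illustration, that a domain can be represented as a flat set of the ideas needed to reason well about it. The idea names and the functions `extra_learning` and `separation` are hypothetical and chosen only for this example.

```python
# Toy model (an assumption for illustration): a domain is the set of ideas
# needed to reason well about it. All idea names below are hypothetical.
BLUE_CARS = frozenset({"wheels", "engines", "aerodynamics", "traffic", "blue paint"})
RED_CARS = frozenset({"wheels", "engines", "aerodynamics", "traffic", "red paint"})
THEOREMS = frozenset({"axioms", "induction", "proof search", "lemmas"})


def extra_learning(known: frozenset, target: frozenset) -> int:
    """Count the ideas in `target` that an agent knowing `known` still lacks."""
    return len(target - known)


def separation(known: frozenset, target: frozenset) -> float:
    """Fraction of `target` that must be learned from scratch (0.0 = free transfer)."""
    return extra_learning(known, target) / len(target)


if __name__ == "__main__":
    print(separation(BLUE_CARS, RED_CARS))   # small: only the paint idea is new
    print(separation(BLUE_CARS, THEOREMS))   # large: nothing carries over in this toy
```

In this toy, an agent that knows the blue-car ideas lacks only one idea needed for red cars, while nothing it knows carries over to the theorem-proving set. Whether real domains such as car design and theorem proving overlap so little is exactly the kind of theory-laden question discussed next; the sets here simply hard-code one answer.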

In more complicated cases, which domains are truly close or far from each other, or can be compactly separated out, is a theory-laden assertion. Few people are likely to disagree that blue cars and red cars are very close domains (if they’re not specifically trying to be disagreeable). Researchers are more likely to disagree in their predictions about:

  • Whether (by default, ceteris paribus, and assuming designs that contain no extra elements to make them behave differently) an AI that is good at designing cars is also likely to be very close to learning how to design airplanes.

  • Whether (assuming straightforward designs) the first AGI to obtain superhuman engineering ability for designing cars, including their software, would probably be at least par-human in the domain of inventing new mathematical proofs.

  • Whether (assuming straightforward designs) an AGI that has superhuman engineering ability for designing cars, including their software, necessarily needs to think about most of the facts and ideas that would be required to understand and manipulate human psychology.

Relation to ‘general intelligence’

A key parameter in some such disagreements may be how much credit the speaker gives to the notion of general intelligence. Specifically, the question is to what extent the natural or most straightforward approach to getting par-human or superhuman performance in critical domains is to take relatively general learning algorithms and deploy them on learning the domain as a special case.

If you think that it would take a weird or twisted design to build a mind that was superhumanly good at designing cars, including writing their software, without using general algorithms and methods that could, with little or only minor adaptation, stare at mathematical proof problems and figure them out, then you think that ‘design cars’, ‘prove theorems’, and many other domains are in some sense naturally not all that separated. This (arguendo) is why humans are so much better than chimpanzees at so many apparently different cognitive domains: the same competency, general intelligence, solves all of them.

If, on the other hand, you are more inspired by the way that superhuman chess AIs can’t play Go and AlphaGo can’t drive a car, you may think that humans using general intelligence on everything is just an instance of us having a single hammer and trying to treat everything as a nail. You may then predict that specialized mind designs, superhuman at engineering but very far in mind design space from any kind of mind that could prove Fermat’s Last Theorem, would be a more natural or efficient way to create a superhuman engineer.

See the entry on General intelligence for further discussion.

Parents:

  • Cognitive domain

    An allegedly compact unit of knowledge, such that ideas inside the unit interact mainly with each other and less with ideas in other domains.