Solomonoff induction

Eliezer Yudkowsky5 Aug 2015 4:08 UTC

Solomonoff induction is an ideal answer to questions like “What probably comes next in the sequence 1, 1, 2, 3, 5, 8?” or “Given the last three years of visual data from this webcam, what will this robot probably see next?” or “Will the sun rise tomorrow?” Solomonoff induction requires infinite computing power, and is defined by taking every computable algorithm for giving a probability distribution over future data given past data, weighted by their algorithmic simplicity, and updating those weights by comparison to the actual data.

E.g., somewhere in the ideal Solomonoff distribution is an exact copy of you, right now, staring at a string of 1s and 0s and trying to predict what comes next—though this copy of you starts out with a very low weight in the mixture owing to its complexity. Since a copy of you is present in this mixture of computable predictors, we can prove a theorem about how well Solomonoff induction does compared to an exact copy of you; namely, Solomonoff induction commits only a bounded amount of error relative to you, or any other computable way of making predictions. Solomonoff induction is thus a kind of perfect or rational ideal for probabilistically predicting sequences, although it cannot be implemented in reality due to requiring infinite computing power. Still, considering Solomonoff induction can give us important insights into how non-ideal reasoning should operate in the real world.

Additional reading:

Eliezer Yudkowsky5 Aug 2015 4:08 UTC

Children:

Solomonoff induction: Intro Dialogue (Math 2)
An introduction to Solomonoff induction for the unfamiliar reader who isn’t bad at math

Parents:

Methodology of unbounded analysis
What we do and don’t understand how to do, using unlimited computing power, is a critical distinction and important frontier.
Inductive prior
Some states of pre-observation belief can learn quickly; others never learn anything. An “inductive prior” is of the former type.

M Yass 22 Feb 2016 5:23 UTC
Do we have(or need) any empirical evidence that algorithmic simplicity (space) is the ideal and ultimate absolute prior? If I reread the article carefully I see it doesn’t quite seem to advocate this, but I think it’s very easy for a learner to pick up that misconception from the way AIXI is generally discussed, the assumption that we somehow know that space complexity is the final answer, and I wonder if it should be ruled out or qualified here.

(I believe it’s easy to pick up that misconception because it happened to me. I later came to realize I probably wouldn’t bet at good odds that Space AIXI wont ever be dominated by a variant AIXI made with some other razor, the speed of generating environment turing machines, for instance, instead of space. Or how about inverse cyclomatic complexity or some more general measure of algorithm quality that humans thus far have lacked the breadth of mind to find, test or work with mathematically? Or maybe even just some other space complexity of some other machine model? Space complexity of TMs seems like extremely low-hanging fruit.)

I’m hoping to hear someone’s done the work of compiling some ridiculously extensive dataset of algorithms and their competitors and demonstrating that space complexity seemed more predictive of domination than any of the other general metrics we could think of. If this result had been found, nobody ever seems to cite it. Though I suppose we might not hear about it when so many people find it completely intuitive.
Gurkenglas Gurkenglas 23 Sep 2017 17:37 UTC
The Solomonoff prior is not computed from the space each hypothesis needs to compute its answer, it’s its program length, the space needed to pass the hypothesis to our UTM!

Any other form of induction one could try, like the space or time complexities of evaluating a hypothesis you described, or other machine models that Turing machines can describe, or human intuition, is included by Solomonoff induction as a hypothesis, and will replace it after finite error if it outperforms.

How would you even compute each program’s space/time complexity? Rice’s theorem says it cannot be done in general.