Regarding the limitations of AIXI, I'd like to second Laurent on the point that the completely free choice of reference machine is a bit unsatisfying.
If I haven't missed anything, this means, among other things, that if we have two strings:
a = 111
b = 100110101010010111010101010101001111010101
we cannot objectively say that a is simpler than b, since according to some obscure languages (reference machines) b will actually be simpler than a.
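Just to convince myself, here is a toy sketch of that point (two made-up "reference machines" of my own, nothing to do with any standard encoding; Python only because it is easy to read):

    a = "111"
    b = "100110101010010111010101010101001111010101"

    def u1(program):
        # Machine U1: the only instruction is 'Q<string>', which prints the literal.
        return program[1:] if program.startswith("Q") else None

    def u2(program):
        # Machine U2: same as U1, plus a one-symbol primitive 'B' hard-wired to print b.
        return b if program == "B" else u1(program)

    def description_length(machine, target):
        # Shortest program, among the two forms these toy machines understand,
        # whose output is exactly `target`.
        candidates = ["B", "Q" + target]
        return min(len(p) for p in candidates if machine(p) == target)

    print(description_length(u1, a), description_length(u1, b))  # 4 43 -> a "simpler" on U1
    print(description_length(u2, a), description_length(u2, b))  # 4  1 -> b "simpler" on U2

A real reference machine would of course not be this contrived, but as far as I can tell nothing forbids it; the invariance theorem only guarantees that the two description lengths agree up to an additive constant, which is exactly the slack being exploited here.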
As Laurent already said, a shorter reference machine makes more sense intuitively. But appealing to intuition here is clearly not much better than motivating a cognitive architecture intuitively - it works in practice, but is mathematically unsatisfying. Also, can one really say that one reference machine is shorter/simpler than another? Doesn't that also depend on the language in which one describes the reference machines?
This bothers me a lot. Does anyone know of any attempts to resolve this issue? It must at least have been attempted, I feel.
> As Laurent already said, a shorter reference machine makes more sense intuitively. But appealing to intuition here is clearly not much better than motivating a cognitive architecture intuitively - it works in practice, but is mathematically unsatisfying. Also, can one really say that one reference machine is shorter/simpler than another? Doesn't that also depend on the language in which one describes the reference machines?
Yes absolutely.
*If* we choose a "reference class of models" (e.g., Turing machines), then we can choose the smallest reference machine.
I thought about another option, though, just a while back:
Instead (but still considering the class of TMs), why not choose the reference UTM that orders TMs in the exact same order as the "natural" order, which is to grow TMs by number of states, then number of transitions, and then some lexicographical order?
(My first problem then was that there were too many TMs of the same complexity.)
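To be a bit more concrete about what I mean (a rough sketch, with a transition-table encoding I just made up on the spot):

    from itertools import islice, product

    # A rough sketch of the "natural" order: binary-alphabet machines, grouped by
    # number of states, and within each group ordered lexicographically by their
    # transition table. Each entry maps (state, symbol) -> (write, move, next_state),
    # with next_state == n meaning "halt". The encoding itself is arbitrary.

    def natural_order(max_states):
        for n in range(1, max_states + 1):                             # fewer states first
            entries = list(product(range(n), [0, 1]))                  # the (state, symbol) pairs
            actions = list(product([0, 1], ["L", "R"], range(n + 1)))  # (write, move, next)
            for table in product(actions, repeat=len(entries)):        # lexicographic within a size
                yield n, dict(zip(entries, table))

    for n, table in islice(natural_order(2), 2):
        print(n, table)

    # The problem I mentioned: the number of machines per "complexity level" explodes.
    for n in range(1, 4):
        print(n, "states:", (2 * 2 * (n + 1)) ** (2 * n), "machines")

Already at 3 states the count is in the millions, so the lexicographic tie-breaking inside a level is doing most of the work; that is the "too many TMs of the same complexity" problem I meant.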
Now, there also remains the problem of the reference class of models. Why should we go for TMs? Why not something else? How do we decide?
TMs have a nice, very simple formulation given our human knowledge. In the end, does it all boil down to our own world? Should we choose the simplest formulation given the axioms of our world? Is that even possible?
> This bothers me a lot. Does anyone know of any attempts to resolve this issue? It must at least have been attempted, I feel.
Hutter pointed me to a failed attempt by Müller, IIRC. I don't have the exact reference on this computer.
The idea was to simulate TMs on other TMs, and do that in a loop, to try to find some fixed point, or something of the sort, but that did not work.
Also, Hutter has some discussions on these matters in his 2005 book and in:
"A Philosophical Treatise of Universal Induction"
http://www.hutter1.net/official/bib.htm#uiphil
As a last resort, we can settle for a consensus on an "intuitive" best choice, if we can prove that finding the best model is not feasible. But if we can avoid that, I'd be happier.
> Now, there also remains the problem of the reference class of models. Why should we go for TMs? Why not something else? How do we decide?
> TMs have a nice, very simple formulation given our human knowledge. In the end, does it all boil down to our own world? Should we choose the simplest formulation given the axioms of our world? Is that even possible?
This is good news. I hadn't actually realized that the structure of the TM is objective, and that some structures are arguably simpler than others. The smallest UTM found seems to have 22 states, but it doesn't seem to be proven minimal. http://en.wikipedia.org/wiki/Universal_Turing_machine#Smallest_machines
> Hutter pointed me to a failed attempt by Müller, IIRC. I don't have the exact reference on this computer.
> The idea was to simulate TMs on other TMs, and do that in a loop, to try to find some fixed point, or something of the sort, but that did not work.
This is probably the article you mean: http://arxiv.org/abs/cs/0608095. It sounds interesting, I will definitely read it.
Unfortunately, it seems they conclude that it is not really possible to find a simplest universal computer:
"Moreover, we show that the reason for failure has a clear and interesting physical interpretation, suggesting that every other conceivable attempt to get rid of those additive constants must fail in principle, too."
Yes, as I see it now, there are two "possible" paths to finding the ultimate reference machine. One is to try to order the UTMs "mathematically", but perhaps in a different way than Müller tried. The other, more philosophical approach would be to find some structure for "Turing-complete computing structures" in general, and apply a similar argument to the one you gave above on Turing machines.
> As a last resort, we can settle for a consensus on an "intuitive" best choice, if we can prove that finding the best model is not feasible. But if we can avoid that, I'd be happier.
Hehehe, I hope you're right :) Seriously though, I totally agree that the philosophical solution would be the nicest by far. The ultimate insight on what computation is... I wonder if it's possible.
I think we can say that, in principle, the best UTM would be the one
which best predicts the environment of the agent. In other words, the
UTM with the most built-in knowledge. The update process is a descent
towards minimal error; in other words, inductive learning can be seen
as a search for a better prior. The posterior of a Bayesian update can
always be treated as a new prior for the next Bayesian update.
The update to a universal distribution remains universal, so we can
even regard the result as a new universal Turing machine, although it
is not explicitly represented as such.
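A miniature version of what I mean, with a hand-picked three-hypothesis class standing in for the universal mixture (so purely illustrative, the names and weights are my own toy choices):

    # Each hypothesis reports P(next bit = 1 | history).
    hypotheses = {
        "all zeros": lambda history: 0.0,
        "all ones":  lambda history: 1.0,
        "fair coin": lambda history: 0.5,
    }
    prior = {"all zeros": 0.25, "all ones": 0.25, "fair coin": 0.5}

    def bayes_step(weights, history, bit):
        # One update: multiply each weight by the probability that hypothesis
        # assigned to the observed bit, then renormalize.
        new = {}
        for name, w in weights.items():
            p1 = hypotheses[name](history)
            new[name] = w * (p1 if bit == 1 else 1.0 - p1)
        z = sum(new.values())
        return {name: w / z for name, w in new.items()}

    weights, history = dict(prior), []
    for bit in [1, 1, 1, 0]:
        weights = bayes_step(weights, history, bit)   # yesterday's posterior is today's prior
        history.append(bit)
        print(bit, {k: round(v, 3) for k, v in weights.items()})

The reweighted mixture after each observation is itself a perfectly good prior for the next one; in the universal case the same picture holds over all enumerable semimeasures with 2^(-K) weights, which is why I'm happy to regard the updated distribution as a new universal prior even though it isn't written out as a machine.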
So, I strongly sympathize with the idea that there's no single best
prior. However, that doesn't mean we can't find nice desirable
properties in the directions you guys are discussing. The equivalence
class "Turing Complete" is a very weak one, which doesn't require very
many properties to be preserved. Different models of computation can
change the complexity classes of specific problems, for example.
One property I mentioned to Tom is the ratio of always-meaningful
programs (total functions) to sometimes-meaningless ones (partial
functions). I.e., we ask how easy it is to get into an infinite loop
which produces no output. This number has been called "Omega".
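(To pin down what I mean, and hedging a little because "Omega" gets used for slightly different quantities: for a prefix-free machine U, Chaitin's number is

    Omega_U = sum over programs p that halt on U of 2^(-|p|),

i.e. the probability that a randomly drawn program halts and produces an output. Note that it is again machine-dependent, so it doesn't escape the reference-machine problem either.)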
It's not possible to have a language which is always-meaningful while
being Turing complete, so we can work at this from "both
directions"... augmenting always-meaningful languages to increase
expressiveness, and partially restricting Turing-complete languages to
be more meaningful. However, there is no perfect middle point-- we can
only get closer in each direction.
--Abram
--
Abram Demski
http://lo-tho.blogspot.com/
http://groups.google.com/group/one-logic
FYI, Tim Tyler is also interested in this question.
> I think we can say that, in principle, the best UTM would be the one
> which best predicts the environment of the agent. In other words, the
> UTM with the most built-in knowledge. The update process is a descent
> towards minimal error; in other words, inductive learning can be seen
> as a search for a better prior. The posterior of a Bayesian update can
> always be treated as a new prior for the next Bayesian update.
> The update to a universal distribution remains universal, so we can
> even regard the result as a new universal Turing machine, although it
> is not explicitly represented as such.
Hi, yes - I do have a page about the issue: http://matchingpennies.com/the_one_true_razor/
I also think it is not a huge deal. The "just use FORTRAN-77" answer is reasonable. If you want to give your machine a head start in the real world, you tell it all the things you know - e.g. audio/video codecs.
Maybe the smallest possible Turing-complete (save for infinite memory) computers in our universe?
The problem is that they don't look so "simple"... But maybe that's because of our biased macro-view human perspective.
Hi Tom and Laurent,
I agree with Tom, their solution is not quite satisfying. I believe Marcus is well aware of this, and I personally don't think Marcus accepts their argument either, since whenever he mentions (has to mention) this issue in his papers, he uses a "universally agreed-upon enumeration" to "get rid of" the arbitrariness.
As far as I know, there isn't a completely satisfying solution yet (I don't believe any of my colleagues knows one). Nevertheless, the arbitrariness in choosing the universal Turing machine only matters when we examine the complexity of a particular object, which we seldom do.
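(For completeness, the standard reason the constant is considered harmless, as I understand the invariance theorem: for any two universal machines U and V there is a constant c_{U,V}, roughly the length of an interpreter for V written for U, such that

    K_U(x) <= K_V(x) + c_{U,V}   for every string x.

So asymptotic and "up to O(1)" statements survive a change of reference machine; only claims about the complexity of particular, short objects are sensitive to it.)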
Hello Guys,
I just finished reading Shane's paper on the negative results on making Solomonoff induction practical, and a question popped up:
What is the precise linkage between the theory of Solomonoff induction and reality?
Take probability theory for example: The only linkage between the theory of probability and reality is 'the fundamental interpretative hypothesis', which states 'that events with zero or low probability are unlikely to occur' (Shafer and Vovk, 'Probability and finance: It's only a game', p. 5). This makes clear that conclusions drawn from probabilistic inference about reality make sense only if we accept this hypothesis, and as an indirect consequence, it provides the theoretical basis for the generalization results in the statistical learning context. (E.g., A is better than B on a set of i.i.d. examples => A is likely to be better than B on all cases, otherwise something with small probability will happen.)
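To make the i.i.d. case concrete, the kind of quantitative statement I have in mind is the usual Hoeffding bound (the exact constants depend on the formulation, so take this sketch as illustrative only):

    import math

    # If A and B are compared on n i.i.d. examples and the per-example loss
    # difference lies in [-1, 1], then Hoeffding's inequality gives
    #   P(|observed mean difference - true mean difference| >= eps)
    #       <= 2 * exp(-n * eps**2 / 2).
    def hoeffding_bound(n, eps):
        return 2.0 * math.exp(-n * eps ** 2 / 2.0)

    # E.g. with 10,000 i.i.d. trials and an observed margin of 0.05 in favour of A,
    # the chance of seeing such a margin if B were really at least as good:
    print(hoeffding_bound(10_000, 0.05))   # roughly 7.5e-06

It is exactly this kind of finite-sample statement that I cannot see how to recover once 'universal' only promises something about eventual behaviour.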
When looking at the theory of Solomonoff induction (or the RL extension, AIXI), I found that such a clearly stated linkage is somehow missing. Shane's negative results show that we cannot say much mathematically about the performance of a complex predictor. However, in the original paper, it is suggested that the creation of AGI would be more or less an experimental science. I also find this point problematic. One of the reasons is that almost all experimental studies require the comparison between two agents A and B. And if A and B are complex algorithmic objects, and if we adopt Shane's definition that a predictor is universal if it 'eventually' correctly predicts a sequence, then there is practically no way to draw any conclusion by looking at the performance of the agents on finite data. As a result, to make an experimental study useful, we have to answer precisely the following question:
What is the fundamental hypothesis we must accept to say that ‘Predictor A is better than B on the given set of finite strings => A is likely to be better than B on all sequences’?
In other words, we must state very clearly under what hypothesis we should believe that the experimental result will generalize, before we start to do any experiment. I don't have an answer to this problem, yet I think an answer is necessary if we want to make the universal theory practically relevant.
Regards,
Sun Yi
> It is also clear that computable agents will perform much poorer than AIXI in the real world (if it makes sense to consider AIXI in the real world).
I'll try to contradict it, and you can tell me if I'm wrong.
My understanding is that AIXI is provably optimal (given that we can agree on the Turing machine used to express plans, accept the specific formulation of the Solomonoff predictor as optimal, and agree on a horizon) across all possible RL problems. But the real world, by which I assume you mean the physical world, will not present a uniform sampling of all possible RL problems to an agent. In fact, it is quite possible that the real world will only present an infinitesimal subset of all possible RL problems to the agent. In this case, an agent may be able to far outperform AIXI on physical world RL tasks, yet still may be considered "general" in the sense that it can do anything that we ourselves are likely to do or need done.
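(For reference, and reciting from memory of Hutter's book so the details may be slightly off: the optimality claim is relative to the universal mixture, i.e. AIXI picks

    a_k := arg max_{a_k} sum_{o_k r_k} ... max_{a_m} sum_{o_m r_m} [r_k + ... + r_m]
               * sum over programs q with U(q, a_1..a_m) = o_1 r_1 .. o_m r_m of 2^(-length(q)),

so "all possible RL problems" enter weighted by 2^(-length) under the chosen reference machine U, not weighted by how often the physical world actually poses them, which is the distinction I am trying to draw.)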
Hello,
My name is SUN Yi; Yi is the first name :)
From my point of view:
- There is no problem about the linkage between the ideal Solomonoff induction and reality, except that it is not computable :)
If we make the assumption that the environment is computable (which is a religious thing, since we cannot mathematically reason about the computability of the world in which we live, accept it or not...), then we can make precise predictions with bounded regret (I spell out the bound I have in mind after this list).
- My problem is with the computable agent. Shane's result shows that even if we make the assumption that the world is simple in the Kolmogorov sense, we still cannot guarantee the performance of the agent. Moreover, beyond a certain point we cannot even mathematically reason about the computable agent.
- As for the comparison of two agents, I have the intuition that it is not decidable (no proof yet). Anyway, I wrote the following Lemma, basically restating Shane's result. However, it shows that deciding if a sequence is predicted by a given predictor is very hard.
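(The bound I mentioned under the first point, quoting from memory and so possibly off in the constant: if the true environment mu is a computable measure and M is the universal mixture, then the total expected squared prediction error is finite,

    sum_{t=1..infinity} E_mu[ (M(x_t = 1 | x_<t) - mu(x_t = 1 | x_<t))^2 ]  <=  (ln 2 / 2) * K(mu),

so M's predictions converge to mu's, which is the sense of 'linkage to reality' I am happy with for the ideal, incomputable predictor.)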
Regards,
Sun Yi
Laurent,
Thanks for your response. I've added a couple of notes below.
- Our real world looks Turing-complete (save for some huuuge memory bound), since we can create computers. Thus virtually all environments can be expressed in this framework, except the veeery complex ones maybe, but note that the prior of these environments is small. So it's probably not infinitesimal.
Your argument might be better in a simpler environment than the real world.
>>True. If you meant to include the computers we create when you cited the "real world" then my comment isn't relevant.
- AIXI discards all environments that are not consistent with its experience. So it (probably) suffices for AIXI to throw one single die to "infer" many things about classical mechanics, and maybe quantum dynamics and Newton's laws of gravitation. With a few more throws it could generalize the laws very quickly. It does not need to observe the sand and the stars to infer that the world might be composed of particles (or that it leads to a good model). Learning is amazingly fast.
The knowledge it gains about the world grows so quickly that even if you add some additional initial knowledge to a computable agent, AIXI should catch up in no time.
>>You seem to have an intuition about how an AIXI would perform in certain situations. I don't have this, or I should say, my intuition is different. But since my intuition is based on no empirical evidence, I don't give it much weight. My impression was that AIXI never discards an environment, but just considers them less likely when they are contradicted. The notion that a single observation could be used to definitively rule out certain possibilities seems strange to me. Why would that be so? Does Solomonoff Induction make no allowance for inaccuracies in observations? If so, that seems like another major barrier to implementing a variant of AIXI.
>>It would be great to be able to make such a comparison in practice.
- Last but not least, I was considering *real-time* real-world agents, those which can only do a very limited number of computations per interaction step, whereas AIXI can do an infinite number of computations in the same amount of time. For such real agents, learning can only be extremely slow, since it requires time.
>>Hardly a fair comparison! :)
So I still think that AIXI would be vastly more intelligent in the real world than the real agents we will be able to create.
Brandon
Brandon,
One point below.
On Wed, Aug 24, 2011 at 2:24 AM, Rohrer, Brandon R <brr...@sandia.gov> wrote:
> My impression was that AIXI never discards an environment, but just considers them less likely when they are contradicted. The notion that a single observation could be used to definitively rule out certain possibilities seems strange to me. Why would that be so?
My guess is that this confusion comes from the existence of two different formulations of Solomonoff induction.
The simpler formulation is a mixture model of all computable deterministic models. In this case, an individual model makes hard predictions about the future, and we can simply discard it if it turns out to be wrong. Bad models will initially be eliminated very very quickly.
The mixture model can also be formulated as a combination of all computable probability distributions. In this case, models make soft predictions, so we are adjusting their relative probabilities rather than discarding them.
(I should provide a reference for the equivalence, but I'm not sure where it is proven...)
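A toy version of the two formulations, with a tiny finite model class standing in for "all computable models" (my own illustration, nothing canonical about the numbers):

    history = [1, 0, 1, 1]

    # (1) Mixture of deterministic models: each model is a fixed sequence; it either
    # matches the history exactly or it is discarded outright.
    deterministic = {
        "period 10":  lambda t: [1, 0][t % 2],
        "period 101": lambda t: [1, 0, 1][t % 3],
        "all ones":   lambda t: 1,
    }
    survivors = [name for name, model in deterministic.items()
                 if all(model(t) == bit for t, bit in enumerate(history))]
    print(survivors)   # ['period 101'] -- the others are ruled out, not merely down-weighted

    # (2) Mixture of stochastic models: each model gives P(next bit = 1); it is
    # re-weighted by its likelihood on the history instead of being discarded.
    stochastic = {"biased 0.9": 0.9, "fair 0.5": 0.5, "biased 0.1": 0.1}
    weights = {name: 1.0 / len(stochastic) for name in stochastic}
    for bit in history:
        weights = {name: weights[name] * (p if bit == 1 else 1.0 - p)
                   for name, p in stochastic.items()}
    z = sum(weights.values())
    print({name: round(w / z, 3) for name, w in weights.items()})

The first picture is the one where a single contradicting observation really does eliminate an environment; the second is the one where nothing ever quite reaches probability zero.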