Regarding limitations of the AIXI, I'd like to second Laurent on the point that the completely free choice of reference machine is a bit unsatisfying.
If I haven't missed anything, this means, among other things, that if we have two strings:
a = 111 b = 100110101010010111010101010101 001111010101
we can not objectively say that a is simpler than b, since according to some obscure languages (reference machines) b will actually be simpler than a.
As Laurent already said, shorter reference machine makes more sense intuitively. But appealing to intuition here is clearly not so much better than motivating a cognitive architecture intuitively - it works in practice, but is mathematically unsatisfying. Also, can one really say that one reference machine is shorter/simpler than another? doesn't that also depend on the language in which one describes the reference machines?
This bothers me a lot. Does anyone know of any attempts to resolve this issue? It must at least have been attempted I feel.
As Laurent already said, shorter reference machine makes more sense intuitively. But appealing to intuition here is clearly not so much better than motivating a cognitive architecture intuitively - it works in practice, but is mathematically unsatisfying. Also, can one really say that one reference machine is shorter/simpler than another? doesn't that also depend on the language in which one describes the reference machines?
Yes absolutely.
*If* we choose a "reference class of models" (e.g., Turing machines), then we can choose the smallest reference machine.
I thought about another option though, just a while back:
Instead (but still considering the TMs class), why not choose the reference UTM that orders TMs in the exact same order as the "natural" order, which is to grow TMs depending on the number of states, of transitions, and some lexicographical order?
(my first problem then was that there was too many TMs of same complexity)
Now, there still also remains the problem of the reference class of models. Why should we go for TMs? Why not something else? How to decide?
TMs have a nice, very simple formulation given our human knowledge. In the end, does it all boils down to our own world? Should we choose the simplest formulation given the axioms of our world? Is that even possible?
Hutter pointed me to a failed attempt by Müller, IIRC. I don't have the exact reference on this computer.This bothers me a lot. Does anyone know of any attempts to resolve this issue? It must at least have been attempted I feel.
The idea was to simulate TMs on other TMs, and do that in loop, to try to find some fixed point, or something in the genre, but that did not work.
Also, Hutter has some discussions on these matters in his 2005 book and in :
"A Philosophical Treatise of Universal Induction"
http://www.hutter1.net/official/bib.htm#uiphil
On last resort, we can set for a consensus on a "intuitive" best choice, if we can prove that finding the best model is not feasible. But if we can avoid that, I'd be happier.
Now, there still also remains the problem of the reference class of models. Why should we go for TMs? Why not something else? How to decide?
TMs have a nice, very simple formulation given our human knowledge. In the end, does it all boils down to our own world? Should we choose the simplest formulation given the axioms of our world? Is that even possible?
This is good news. I hadn't actually realized that the structure of the TM is objective, and that some structures are arguably simpler than other. The smallest UTM found seems to have 22 states, but it doesn't seem to be proven minimal. http://en.wikipedia.org/wiki/Universal_Turing_machine#Smallest_machines
Hutter pointed me to a failed attempt by Müller, IIRC. I don't have the exact reference on this computer.This bothers me a lot. Does anyone know of any attempts to resolve this issue? It must at least have been attempted I feel.
The idea was to simulate TMs on other TMs, and do that in loop, to try to find some fixed point, or something in the genre, but that did not work.
This is probably the article you mean http://arxiv.org/abs/cs/0608095. It sounds interesting, I will definitely read it.
Unfortunately it seems that they conclude it is not really possible to find a simplest, universal computer:
"Moreover, we show that the reason for failure has a clear and interesting physical interpretation, suggesting that every other conceivable attempt to get rid of those additive constants must fail in principle, too."
Yes, as I see it now, there are two "possible" paths to finding the ultimate reference machine. One is to try to order the UTM's "mathematically", but perhaps in a different way than Muller tried. The other, more philosophical approach, would be to find some structure for "turing-complete computing structures" in general, and apply a similar argument to the one you gave above on Turing machines.
On last resort, we can set for a consensus on a "intuitive" best choice, if we can prove that finding the best model is not feasible. But if we can avoid that, I'd be happier.
Hehehe, I hope you're right :) Seriously though, I totally agree about that the philosophical solution would be the nicest by far. The ultimate insight on what computation is... I wonder if it's possible.
I think we can say that, in principle, the best UTM would be the one
which best predicts the environment of the agent. In other words, the
UTM with the most built-in knowledge. The update process is a descent
towards minimal error; in other words, inductive learning can be seen
as a search for a better prior. The posterior of a Bayesian update can
always be treated as a new prior for the next Bayesian update.
The update to a universal distribution remains universal, so we can
even regard the result as a new universal Turing machine, although it
is not explicitly represented as such.
So, I strongly sympathize with the idea that there's no single best
prior. However, that doesn't mean we can't find nice desirable
properties in the directions you guys are discussing. The equivalence
class "Turing Complete" is a very weak one, which doesn't require very
many properties to be preserved. Different models of computation can
change the complexity classes of specific problems, for example.
One property I mentioned to Tom is the ratio of always-meaningful
programs (total functions) to sometimes-meaningless ones (partial
functions). IE, we ask how easy it is to get into an infinite loop
which produces no output. This number has been called "Omega".
It's not possible to have a language which is always-meaningful while
being Turing complete, so we can work at this from "both
directions"... augmenting always-meaningful languages to increase
expressiveness, and partially restricting Turing-complete languages to
be more meaningful. However, there is no perfect middle point-- we can
only get closer in each direction.
--Abram
--
Abram Demski
http://lo-tho.blogspot.com/
http://groups.google.com/group/one-logic
FYI, Tim Tyler is also interested in this question.
I think we can say that, in principle, the best UTM would be the one
which best predicts the environment of the agent. In other words, the
UTM with the most built-in knowledge. The update process is a descent
towards minimal error; in other words, inductive learning can be seen
as a search for a better prior. The posterior of a Bayesian update can
always be treated as a new prior for the next Bayesian update.
The update to a universal distribution remains universal, so we can
even regard the result as a new universal Turing machine, although it
is not explicitly represented as such.
Hi, yes - I do have a page about the issue: http://matchingpennies.com/the_one_true_razor/
I also think it is not a huge deal. The "just use FORTRAN-77" is
reasonable. If you want to give your
machine a head start in the real world, you tell it all the things you
know - e.g. audio/video codecs.
Maybe the smallest possible Turing-complete (save for infinite memory) computers in our universe?
The problem is that they don't look so "simple"... But maybe that's because of our biased macro-view human perspective.
--
Before posting, please read this: https://groups.google.com/forum/#!topic/magic-list/_nC7PGmCAE4
To post to this group, send email to magic...@googlegroups.com
To unsubscribe from this group, send email to
magic-list+...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/magic-list?hl=en?hl=en
Hi Tom and Laurent,
I agree with Tom, their solution is not quite satisfying. I believe Marcus is well aware of this, and I personally don't think Marcus accepts their argument as well, as whenever he mentions (has to mention) this issue in his papers, he uses "universally agreed-upon enumeration" to "get rid of" arbitrariness.
As far as I know, there isn't a completely satisfying solution yet (I don't believe any of my colleague knows one). Nevertheless, the arbitrariness in choosing universal Turing machine only matters when we examine the complexity of a particular object, which we seldom do.
Hello Guys,
I just finished reading Shane’s paper on the negative results on making Solomonoff induction practical, and a question pop out:
What is the precise linkage between the theory of Solomonoff induction and the reality?
Take probability theory for example: The only linkage between the theory of probability and the reality is ‘the fundamental interpretative hypothesis’, which states ‘that events with zero or low probability are unlikely to occur’ (Shafer and Vovk, ‘Probability and finance: It’s only a game’, pp.5) This makes clear that conclusions drawn from probabilistic inference about the reality make sense only if we accept this hypothesis, and as a indirect consequence, provides theoretical basis for the generalization results in the statistical learning context. (E.g., A is better than B on a set of i.i.d. examples => A is likely to be better than B on all cases otherwise something with small probability will happen.)
When looking at the theory of Solomonoff induction (or the RL extension AIXI), I found such clearly stated linkage is somehow missing. Shane’s negative results show that we cannot say much mathematically about the performance of a complex predictor. However, in the original paper, it is suggested that the creation of AGI would be more or less an experimental science. However, I also found this point problematic. One of the reason is that almost all experimental study requires the comparison between two agents A and B. And if A and B are complex algorithmic objects, and if we adopt Shane’s definition that a predictor is universal if it ‘eventually’ correctly predict a sequence, then there is practically no way to draw any conclusion by looking at the performance of the agents on finite data. As a result, to make an experimental study useful, we have to answer precisely the following question:
What is the fundamental hypothesis we must accept to say that ‘Predictor A is better than B on the given set of finite strings => A is likely to be better than B on all sequences’?
In other word, we must state very clearly under what hypothesis should we belief that the experimental result will generalize, before we start to do any experiment. I don’t have an answer to this problem, yet I think the answer is necessary if we want to make the universal theory practically relevant.
Regards,
Sun Yi
--
Hello Guys,
I just finished reading Shane’s paper on the negative results on making Solomonoff induction practical,
and a question pop out:
What is the precise linkage between the theory of Solomonoff induction and the reality?
Take probability theory for example: The only linkage between the theory of probability and the reality is ‘the fundamental interpretative hypothesis’, which states ‘that events with zero or low probability are unlikely to occur’ (Shafer and Vovk, ‘Probability and finance: It’s only a game’, pp.5) This makes clear that conclusions drawn from probabilistic inference about the reality make sense only if we accept this hypothesis, and as a indirect consequence, provides theoretical basis for the generalization results in the statistical learning context. (E.g., A is better than B on a set of i.i.d. examples => A is likely to be better than B on all cases otherwise something with small probability will happen.)
When looking at the theory of Solomonoff induction (or the RL extension AIXI), I found such clearly stated linkage is somehow missing. Shane’s negative results show that we cannot say much mathematically about the performance of a complex predictor. However, in the original paper, it is suggested that the creation of AGI would be more or less an experimental science.
However, I also found this point problematic. One of the reason is that almost all experimental study requires the comparison between two agents A and B. And if A and B are complex algorithmic objects, and if we adopt Shane’s definition that a predictor is universal if it ‘eventually’ correctly predict a sequence, then there is practically no way to draw any conclusion by looking at the performance of the agents on finite data. As a result, to make an experimental study useful, we have to answer precisely the following question:
What is the fundamental hypothesis we must accept to say that ‘Predictor A is better than B on the given set of finite strings => A is likely to be better than B on all sequences’?
In other word, we must state very clearly under what hypothesis should we belief that the experimental result will generalize, before we start to do any experiment. I don’t have an answer to this problem, yet I think the answer is necessary if we want to make the universal theory practically relevant.
--
Before posting, please read this: https://groups.google.com/forum/#!topic/magic-list/_nC7PGmCAE4
To post to this group, send email to magic...@googlegroups.com
To unsubscribe from this group, send email to
magic-list+...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/magic-list?hl=en?hl=en
> It is also clear that computable agents will perform much poorer than AIXI in the real world (if that makes sense to consider AIXI in the real world).
I'll try to contradict it, and you can tell me if I'm wrong.
My understanding is that AIXI is provably optimal (given that we can agree on the Turing machine used to express plans, accept the specific formulation of the Solomonoff predictor as optimal, and agree on a horizon) across all possible RL problems. But the real world, by which I assume you mean the physical world, will not present a uniform sampling of all possible RL problems to an agent. In fact, it is quite possible that the real world will only present an infinitessimal subset of all possible RL problems to the agent. In this case, an agent may be able to far outperform AIXI on physical world RL tasks, yet still may be considered "general" in the sense that it can do anything that we ourselves are likely to do or need done.
Hello,
My name is SUN Yi, Yi is the first name: )
From my point of view:
- There is no problem about the linkage between the ideal Solomonoff induction and the reality, except it is not computable : )
If we make the assumption that the environment is computable (which is a religious thing since we cannot mathematically reason about the computability of the world in which we live, accept it or not...), then we can make the precise prediction with bounded regret.
- My problem is with the computable agent. Shane’s result shows that even if we make the assumption that the world is simple in the Kolmogorov sense, we still cannot guarantee the performance of the agent. Moreover, beyond certain point we cannot even mathematically reason about the computable agent.
- As for the comparison of two agents, I have the intuition that it is not decidable (no proof yet). Anyway, I wrote the following Lemma, basically restate Shane’s result. However, it shows that deciding if a sequence is predicted by a given predictor is very hard.
Regards,
Sun Yi
From: magic...@googlegroups.com [mailto:magic...@googlegroups.com] On Behalf Of Laurent
Sent: Tuesday, August 23, 2011 21:51
To: magic...@googlegroups.com
--
Laurent,
Thanks for your response. I've added a couple of notes below.____________________- Our real world looks Turing-complete (save for some huuuge memory bound), since we can create computers. Thus virtually all environments can be expressed in this framework, expect the veeery complex ones maybe, but note that the prior of these environments is small.So it's probably not infinitesimal.
Your argument might be better in a simpler environment than the real world
>>True. If you meant to include the computers we create when you cited the "real world" then my comment isn't relevant.
- AIXI discards all environments that are not consistent with its experience. So it (probably) suffices for AIXI to throw one single die to "infer" many things about classical mechanics, and maybe quantum dynamics and Newton's laws of gravitation. With a few more throws it could generalize the laws very quickly. It does not need to observe the sand and the stars to infer that the world might be composed of particles (or that it leads to a good model). Learning is amazingly fast.
The knowledge it gains about the world grows so quickly that even if you add some additional initial knowledge to a computable agent, AIXI should catch up in no time.
>>You seem to have an intuition about how an AIXI would perform in certain situations. I don't have this, or I should say, my intuition is different. But since my intuition based on no empirical evidence, I don't give it much weight. My impression was that AIXI never discards an environment, but just considers them less likely when they are contradicted. The notion that a single observation could be used to definitively rule out certain possibilities seems strange to me. Why would that be so? Does Solomonoff Induction make no allowance for inaccuracies in observations? If so, that seems like another major barrier to implementing a variant of AIXI.
>>It would be great to be able to make such a comparison in practice.
- last but not least, I was considering *real-time* real-world agents, those which can only do a very limited number of computations per interaction step, whereas AIXI can do an infinite number of computations in the same time. For such real agents, learning can only be extremely slow, since this requires time.
>>Hardly a fair comparison! :)
So I still think that AIXI would be vastly more intelligent in the real world than the real agents we will be able to create.
Brandon
--
Before posting, please read this: https://groups.google.com/forum/#!topic/magic-list/_nC7PGmCAE4
To post to this group, send email to magic...@googlegroups.com
To unsubscribe from this group, send email to
magic-list+...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/magic-list?hl=en?hl=en
Brandon,One point below.On Wed, Aug 24, 2011 at 2:24 AM, Rohrer, Brandon R <brr...@sandia.gov> wrote:
Laurent,
Thanks for your response. I've added a couple of notes below.____________________- Our real world looks Turing-complete (save for some huuuge memory bound), since we can create computers. Thus virtually all environments can be expressed in this framework, expect the veeery complex ones maybe, but note that the prior of these environments is small.So it's probably not infinitesimal.
Your argument might be better in a simpler environment than the real world
>>True. If you meant to include the computers we create when you cited the "real world" then my comment isn't relevant.
- AIXI discards all environments that are not consistent with its experience. So it (probably) suffices for AIXI to throw one single die to "infer" many things about classical mechanics, and maybe quantum dynamics and Newton's laws of gravitation. With a few more throws it could generalize the laws very quickly. It does not need to observe the sand and the stars to infer that the world might be composed of particles (or that it leads to a good model). Learning is amazingly fast.
The knowledge it gains about the world grows so quickly that even if you add some additional initial knowledge to a computable agent, AIXI should catch up in no time.
>>You seem to have an intuition about how an AIXI would perform in certain situations. I don't have this, or I should say, my intuition is different. But since my intuition based on no empirical evidence, I don't give it much weight. My impression was that AIXI never discards an environment, but just considers them less likely when they are contradicted. The notion that a single observation could be used to definitively rule out certain possibilities seems strange to me. Why would that be so? Does Solomonoff Induction make no allowance for inaccuracies in observations? If so, that seems like another major barrier to implementing a variant of AIXI.My guess is that this confusion comes from the existence of two different formulations of solomonoff induction.The simpler formulation is a mixture model of all computable deterministic models. In this case, an individual model makes hard predictions about the future, and we can simply discard it if it turns out to be wrong. Bad models will initially be eliminated very very quickly.:The mixture model can also be formulated as a combination of all computable probability distributions. In this case, models make soft predictions, so we are adjusting their relative probabilities rather than discarding them.(I should provide a reference for the equivalence, but I'm not sure where it is proven...)
My guess is that this confusion comes from the existence of two different formulations of solomonoff induction.The simpler formulation is a mixture model of all computable deterministic models. In this case, an individual model makes hard predictions about the future, and we can simply discard it if it turns out to be wrong. Bad models will initially be eliminated very very quickly.:The mixture model can also be formulated as a combination of all computable probability distributions. In this case, models make soft predictions, so we are adjusting their relative probabilities rather than discarding them.(I should provide a reference for the equivalence, but I'm not sure where it is proven...)
--
Before posting, please read this: https://groups.google.com/forum/#!topic/magic-list/_nC7PGmCAE4
To post to this group, send email to magic...@googlegroups.com
To unsubscribe from this group, send email to
magic-list+...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/magic-list?hl=en?hl=en
- There is no problem about the linkage between the ideal Solomonoff induction and the reality, except it is not computable : )If we make the assumption that the environment is computable (which is a religious thing since we cannot mathematically reason about the computability of the world in which we live, accept it or not...), then we can make the precise prediction with bounded regret.
--
Hello,
The assumption that the environment is computable or sampled from a computable distribution is not verifiable by any agent inside that environment. In that sense accepting the computable assumption or refuting it are equally unfounded.
Of course this does not mean that we cannot design some environment for the agent that is totally under our control.
Regards,
Sun Yi
From: magic...@googlegroups.com [mailto:magic...@googlegroups.com] On Behalf Of Eray Ozkural
Sent: Wednesday, August 24, 2011 23:29
To: magic...@googlegroups.com
Subject: Re: [MAGIC-list] Link between AGI theory and reality
On Wed, Aug 24, 2011 at 12:11 PM, Sun Yi <y...@idsia.ch> wrote:
--
From my point of view:
- There is no problem about the linkage between the ideal Solomonoff induction and the reality, except it is not computable : )
If we make the assumption that the environment is computable (which is a religious thing since we cannot mathematically reason about the computability of the world in which we live, accept it or not...), then we can make the precise prediction with bounded regret.
- My problem is with the computable agent. Shane’s result shows that even if we make the assumption that the world is simple in the Kolmogorov sense, we still cannot guarantee the performance of the agent. Moreover, beyond certain point we cannot even mathematically reason about the computable agent.
- As for the comparison of two agents, I have the intuition that it is not decidable (no proof yet).
Anyway, I wrote the following Lemma, basically restate Shane’s result. However, it shows that deciding if a sequence is predicted by a given predictor is very hard.
>>It would be great to be able to make such a comparison in practice.So I still think that AIXI would be vastly more intelligent in the real world than the real agents we will be able to create.
In the book, the constant c is actually never "fixed". For two
complexities C_1 and C_2 to be equivalent it is sufficient that there
exist *some* constant c such that for all x |C_1(x) - C_2(x)| <= c.
Then, all universal Turing machines are indeed in the smallest
equivalence class.
I think what the authors really mean by "clean solution" is "justification".
Vaibhav
On 10/24/11, Wén Shào <90b5...@gmail.com> wrote:
> Hi Tom,
>
> Sorry to bring this old thread up. You said that, by the argument made in
> LV08, all universal Turing machines will end up in the smallest equivalence
> class. I'm not so sure about this, once you fixed a constant c, why
> necessarily do all universal Turing machines end up in the smallest
> equivalence class?
>
> Let's put whether this argument solves the arbitrariness in choosing
> reference machine aside, the mathematical solution i.e. the whole idea of
> equivalence class and the partial order on this set is well defined, isn't
> it? Moreover, to me it's not the case that all universal Turing machines
> will end up in the smallest equivalence class.
>
> Cheers,
>
> Wén
>
>
>
> On Mon, Aug 22, 2011 at 11:33 PM, Tom Everitt <tom4e...@gmail.com> wrote:
>
>> Hi again, I finally had time to read Li & Vitanyi a bit more closely.
>>
>> They start out promising:
>>
>> "The complexity C(x) is invariant only up to a constant depending on the
>> reference function *phi_0*. Thus, one may object, for *every* string
>> *x*there is an additively optimal recursive function
>> *psi_0* such that C_psi_0(x) = 0 [psi_0 assigns complexity 0 to x]. So how
>> can one claim that C(x) is an objective notion?"
>>
>> They then set about giving a "mathematically clean solution" by defining
>> some equivalence classes, claiming these equivalence classes had a single
>> smallest element. This is all well, but the problem is that according to
>> their definition of equivalence classes, all universal Turing machines end
>> up in the smallest equivalence class. And, clearly, for every string
>> *x*there is a universal Turing machine assigning complexity 0 to
>> *x*. (Just take any universal Turing machine, and redefine it so it prints
>> *x* on no input.)
This is what confused me, it looks like from the book even within an equivalence class the "constant" c can vary depending on the complexities you're looking at and there is no fixed c for an equivalence class. How does this help to justify invariance?
On the other hand, if we force a fixed c on equivalence relations, then we won't end up with all UTM being in one equivalence class and the partial ordering is still well defined. There might be some other problems with it, but it makes more sense than having multiple c's? Or not? Not quite sure, just some random thoughts.
Vaibhav
In practical work, the invariance theorem is mostly irrelevant, since the constants involved are quite high. A lot of problems have complexity beneath those c's, so what's the point?
Solomonoff argues that subjectivity is a desired feature of algorithmic probability,
so according to him, at least, searching for an objective measure, the ultimate language, is moot. Memory brings subjectivity, obviously, so we use actually very complex machines, not minimal machines, in practice.Philosophically, I think this has been solved partially inand more completely in my proposal in 2007:So, the idea is that, you can use our universe as the universal machine. That's the most low-level machine imaginable, so it's the most neutral model that is free of any bias whatsoever.
Theoretically, I think you could use a universal quantum computer:Which would really mean that Deutsch solved that problem in 1985.The caveat I made was that, a slight but important detail: ultimately it is the universal model of computation that the correct Grand Unified Theory is going to provide. So, it could be some string theory contraption, or a RUCA, whatever you like, as well. The point is that such machines necessarily do not contain any information about particular world-states, but contain only universal information.
On Mon, Oct 24, 2011 at 12:37 PM, Eray Ozkural <exama...@gmail.com> wrote:In practical work, the invariance theorem is mostly irrelevant, since the constants involved are quite high. A lot of problems have complexity beneath those c's, so what's the point?
The point is purely theoretical I suppose. It would feel nice with an objective measure of complexity. In fact, this seems to be the main advertising point of Kolmogorov Complexity: that it entails that complexity is a property of the object rather than the language/subject. But never trust a salesman, right? :)
(Of course, it still works for infinite sequences.)
Solomonoff argues that subjectivity is a desired feature of algorithmic probability,
How could subjectivity possibly be a desirable property?
so according to him, at least, searching for an objective measure, the ultimate language, is moot. Memory brings subjectivity, obviously, so we use actually very complex machines, not minimal machines, in practice.Philosophically, I think this has been solved partially inand more completely in my proposal in 2007:So, the idea is that, you can use our universe as the universal machine. That's the most low-level machine imaginable, so it's the most neutral model that is free of any bias whatsoever.
Okay, so given that the choice of reference machine is necessarily subjective, this would probably be the _most_ objective we can get. But it really wouldn't be objective.
Theoretically, I think you could use a universal quantum computer:Which would really mean that Deutsch solved that problem in 1985.The caveat I made was that, a slight but important detail: ultimately it is the universal model of computation that the correct Grand Unified Theory is going to provide. So, it could be some string theory contraption, or a RUCA, whatever you like, as well. The point is that such machines necessarily do not contain any information about particular world-states, but contain only universal information.
One of the most desirable features of an objective AIT would be that it would make induction objective - so once we would have found this GUT, no one could ever question it (given he had the same data). Now, since we can't find an objective basis of induction, we have to rely on a questionable GUT, which in turn means that once we've found the semi-objective GUT-reference machine, it will also be forever questionable. Are you with me?
So we will never have a solid foundation of science *crying*. (This may be to push my argument a bit far, I'm not sure. Let's hope.)
This is probably only philosophically annoying, but still. It serves to show why I find it incomprehensible to desire a subjective AIT.
On Mon, Oct 24, 2011 at 5:37 PM, Tom Everitt <tom4e...@gmail.com> wrote:How could subjectivity possibly be a desirable property?Well, it's a another way of saying that a well-educated man has a better perspective. :)
Okay, so given that the choice of reference machine is necessarily subjective, this would probably be the _most_ objective we can get. But it really wouldn't be objective.
It would because from a properly positivist viewpoint, "other possible universes with different physical law" are metaphysical fantasies only. If you try to answer the following question you can see that for yourself: in what sense would it not be objective?
--
Before posting, please read this: https://groups.google.com/forum/#!topic/magic-list/_nC7PGmCAE4
To post to this group, send email to magic...@googlegroups.com
To unsubscribe from this group, send email to
magic-list+...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/magic-list?hl=en?hl=en
On Mon, Oct 24, 2011 at 4:54 PM, Eray Ozkural <exama...@gmail.com> wrote:On Mon, Oct 24, 2011 at 5:37 PM, Tom Everitt <tom4e...@gmail.com> wrote:How could subjectivity possibly be a desirable property?Well, it's a another way of saying that a well-educated man has a better perspective. :)
Hehehe, I know, Solomonoff could definitely have used a proper logic education ;)
Okay, so given that the choice of reference machine is necessarily subjective, this would probably be the _most_ objective we can get. But it really wouldn't be objective.
It would because from a properly positivist viewpoint, "other possible universes with different physical law" are metaphysical fantasies only. If you try to answer the following question you can see that for yourself: in what sense would it not be objective?
Okay, it might appear like that from a positivist standpoint, but I'm far from convinced that the positivists are right - even though their views are appealing in many ways (I don't think I'm very up to date on positivism though, we basically just discussed Carnap/Vienna group in philosophy class).
Anyway, it seems problematic to disqualify metaphysical possibility. For example, the foundation of AIXI is "the cybernetic model", with some agent interacting with some "world". This world is essentially any metaphysically possible world, and we make no reference, have no need, of the contingencies of the natural sciences to do this. The result is a nice theory of intelligence with minimal requirements as to what world it is put in - and nice theoretical insights on intelligence.
To do something similar, only allowing nomological possible worlds, would be quite hard, no? Especially since we don't even know the laws of the universe; in contrast we do know what's metaphysically possible.
So that's why metaphysical possibility is relevant, and a universe-based prior is non-objective.
I'm sure you have a bunch of counter arguments. Bring them on :)
On Mon, Oct 24, 2011 at 8:22 PM, Tom Everitt <tom4e...@gmail.com> wrote:
Well, it's a another way of saying that a well-educated man has a better perspective. :)
Hehehe, I know, Solomonoff could definitely have used a proper logic education ;)
That's not correct, because it's obvious from his writing that he does understand the incompleteness theorems in logic very well. Somebody who understands meta-mathematics is likely to understand predicate calculus, etc. very well. He also worked with Carnap, who knew a bit of logic :) What do you base this on? We positivists are all logic freaks :)
However, note that the extension does not really increase the predictive accuracy or the applicability of the method in any way. In particular, it does not create a set of more interesting problems of AI, since RL-problems are already among the kinds of problems that an AI system must be able to solve. It's just another kind of optimization problem. Although, obviously, it's a *general* optimization problem, and it is AI-complete, since we can cast any computational problem(P and NP problems etc.) and prediction/probability problems as RL-problems. But you don't have to. They can still be solved with a general problem solver that uses Sol. induction. So, I don't think that agent models are special in any way. They are not the foundation of intelligence. They are, as you say, part of cybernetic animal models, i.e. animats, that *use* intelligence.They are applications. It's artificial life, rather than intelligence. Intelligence is prediction. Nothing less, nothing more. Of course, I think behaviorism is a philosophical farce, that's why I badly want to avoid behaviorist sounding definitions. But that just makes my conviction stronger :)
In the final analysis, Solomonoff induction works because it is a *complete* formalization of the scientific process, and has been shown to apply to any problem in the observable universe. It is based on a *correct philosophy of science*. That's what's important. What else could be significant here? If this is based on scientific process, talk of metaphysics is completely irrelevant.
On Mon, Oct 24, 2011 at 8:24 PM, Eray Ozkural <exama...@gmail.com> wrote:However, note that the extension does not really increase the predictive accuracy or the applicability of the method in any way. In particular, it does not create a set of more interesting problems of AI, since RL-problems are already among the kinds of problems that an AI system must be able to solve. It's just another kind of optimization problem. Although, obviously, it's a *general* optimization problem, and it is AI-complete, since we can cast any computational problem(P and NP problems etc.) and prediction/probability problems as RL-problems. But you don't have to. They can still be solved with a general problem solver that uses Sol. induction. So, I don't think that agent models are special in any way. They are not the foundation of intelligence. They are, as you say, part of cybernetic animal models, i.e. animats, that *use* intelligence.They are applications. It's artificial life, rather than intelligence. Intelligence is prediction. Nothing less, nothing more. Of course, I think behaviorism is a philosophical farce, that's why I badly want to avoid behaviorist sounding definitions. But that just makes my conviction stronger :)
Okay, I get the point about artificial life contra intelligence. But my argument works for Sol. induction/prediction as well.
In order to know that the prediction works in this world - before knowing anything about this world - we need to figure out exactly what worlds are possible. And this is essentially all the metaphysically possible worlds!
Note though that with worlds I mean programs, and in particular their output. So the world with ghosts is essentially the same world as this world - they have the same output. (Only slightly more unlikely.)
From Wikipedia:
"Positivism is a philosophical approach.... [in which] ....sense experiences and their logical and mathematical treatment are the exclusive source of all worthwhile information."
Would this mean that I cannot a priori deduce what worlds I could possibly encounter? Strictly speaking, that would be doing math that's not the treatment of some sense datum. But that must be permissible, no?
In the final analysis, Solomonoff induction works because it is a *complete* formalization of the scientific process, and has been shown to apply to any problem in the observable universe. It is based on a *correct philosophy of science*. That's what's important. What else could be significant here? If this is based on scientific process, talk of metaphysics is completely irrelevant.
I agree that's pretty grand - it's a pretty good summation of why I like Sol. induction. Even so, I would have wished that the ultimate theory of science would have dealt with the problem I mentioned to Abram in my previous mail:
The interpretation of data should not be completely subjective. Consider this:
We are in a world, and have 100 times made output 1 and received positive feedback, and a 100 times made output 0 and received negative feedback.
Now, since we can't say which prior is the best, we can not say that in order to maximize reward for the next output, we should choose 1.
This is highly unintuitive. Perhaps it really is this way, but it's not very satisfying (to my uneducated mind, at least :) ).
--
Before posting, please read this: https://groups.google.com/forum/#!topic/magic-list/_nC7PGmCAE4
To post to this group, send email to magic...@googlegroups.com
To unsubscribe from this group, send email to
magic-list+...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/magic-list?hl=en?hl=en
Eray,Tom is not saying anything about induction not working, or Laplace's rule failing. He is only pointing out that we need to choose a prior to determine how much evidence it takes to converge to correct predictions, and there will always be a different universal prior that we could have chosen which would tell us "not yet!" and find some different alternative more probable.
In other words, there may always be some rogue scientists who have examined the same evidence as everyone else but have a sufficiently different taste when it comes to "elegance" of a theory that they prefer a different explanation, and make different predictions. Furthermore, these rogue scientists may be correct. (The best we can do is judge their theories with our own prior, but we have no claim to "the best prior".
--
Before posting, please read this: https://groups.google.com/forum/#!topic/magic-list/_nC7PGmCAE4
To post to this group, send email to magic...@googlegroups.com
To unsubscribe from this group, send email to
magic-list+...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/magic-list?hl=en?hl=en
On Tue, Oct 25, 2011 at 7:45 PM, Abram Demski <abram...@gmail.com> wrote:Eray,Tom is not saying anything about induction not working, or Laplace's rule failing. He is only pointing out that we need to choose a prior to determine how much evidence it takes to converge to correct predictions, and there will always be a different universal prior that we could have chosen which would tell us "not yet!" and find some different alternative more probable.Then, that's a moot point, because no universal prior tells you how close you are to the real solution. They can never do that.
Eray,
I agree, it's not going to be much of a concern in practice. Indeed, it seems like we would have to know the data ahead of time in order to specifically go and construct priors that would significantly disagree after more than 1,000 observed bits. The concern is purely theoretical.The question arises, what would we even do with such a "best UTM" if we had it? Presumably it would have to have some fairly special powers, enough for us to prefer it over other UTMs. It is very similar to trying to design the best programming language. In fact, it's the same problem.
Perhaps there is a very real sense in which some languages are more powerful; ie, it's easier to write an interpreter for any other language you like. (Lisp would do well by that metric, for example.) This pushes down the constant penalties to get to other languages. However, it *must* push something up when it pushes something down, because we only have finite probability mass to shove around. So, taking this to its conclusion, we would get a programming language that can easily write programming languages, but in which it is very difficult to write anything else...
Well, you know what I will say, there should be a minimal energy consuming universal quantum computer, which you can use for these purposes. If that's not good enough, I don't know what is :)
I think that should solve the problem of subjectivity once and for all.And if you use that, you will, in fact, have 1. Don't worry about that :)
Sure, give me till tomorrow to formulate it!
To backtrack a bit, I believe our differences stem from fundamentally different core philosophies. Following is an attempt to spell them both out:
Either one can view mathematics/logic as something a priori given, by which we try to figure out the world we happen to live. In this view the laws of the world are mere accidents; there is nothing objective about them. (I believe this is the view the cybernetic model tries to capture.)
The alternative is to take the world as the base, we learn about it through induction, and learn to generalize more and more - this process leads to mathematics. This model fits well with how human knowledge has evolved through the ages, I suppose.
The latter view, together with the assumption that mathematics has no "transcendent existence" (is confined to our universe), entails a position much like yours.
(Is this a fair description? I'm only grateful for improvements on these.)
I think I subscribe to the first view, not because I can disprove the positivist position, but because the first (I don't know what to call it) feels slightly more elegant. So perhaps this is a genuine conflict of priors :) (I don't think so though, we would probably converge would we discuss it long enough.)
No, cybernetic view is just applied control theory, in construction of artificial animals and the like. It's basically agent models. It does not make a philosophical commitment in itself. What do you mean by cybernetic model?
Nowadays, I think one of the most popular philosophy among hard-core scientists is instrumentalism:As it's smart enough not to make dubious metaphysical posits. Just to give a flavor of modern philosophy of science, as opposed to Plato's false theory.
It depends on whether you think mathematics has a privilege. But why should it? From a proper scientific point of view, as an anthropologist would say: at some point we invented numbers to make better calculations, this linguistic and mental invention increased our intelligence. Do you really disagree with the scientific fact that mathematics itself evolved in human culture?
Now, the strangest thing would be that, somehow, mathematics did transcend all physical reality. Which is an extraordinary claim, and requires extraordinary evidence. However, it would be very difficult to confirm or deny, right? That is to say, such a statement does not seem scientific at all, because it would be impossible to falsify.
--
Before posting, please read this: https://groups.google.com/forum/#!topic/magic-list/_nC7PGmCAE4
To post to this group, send email to magic...@googlegroups.com
To unsubscribe from this group, send email to
magic-list+...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/magic-list?hl=en?hl=en
In my view, you're only trading ontological simplicity for another kind of complexity, namely what mathematics really is. When studying mathematics I find it convenient to think of it as uncovering properties of (existing!) structures. Thinking about it in an instrumentalistic way, for instance, would be sufficiently more complicated to make up for the benefit of a simpler ontology. (I have faint hope of convincing you of this :) but I'm happy enough if I can convince you that this is a "consistent world-view." :) )
Eray,Solomonoff induction relies on falsifiability, but Bayesian learning theory is more general. I feel falsifiability is a bit too narrow, and has been elevated to a high place in science by accident (partly through the prevalence of "statistical significance" and testing against the "null hypothesis"). In general, scientific hypotheses can be either falsifiable or verifiable or both. (It's true that universally quantified statements, which are particularly important, tend to be only falsifiable.) It's also inevitable that some meaningful questions are neither falsifiable nor verifiable, since we don't have access to the entire universe at will.
Hi all,
For the record, my first email was:
Up to now, here are my main concerns with AIXI:Other comments below:
- Reference Turing machine
- There is currently no best choice, but it seems intuitive that small TMs are better than any random TM
- Reinforcement Learning
- RL as a utility function is very convenient for simplicity, but prone to self-delusion. Knowledge-seeking is better but it prefers true random number generators (not generated by itself). I'm currently thinking about a utility function that is like knowledge-seeking but, roughly, does not like things that are too complex, or more precisely, it doesn't like things that are predictably random.
- Incomputability
- Time dimension not taken into account. That is the current most important problem. If we can solve that in an optimal way, we will have real AGI. I'm also working on things in this corner, but right now I'd prefer to keep my ideas for myself :)
- Things I don't understand yet...
On Wed, Aug 10, 2011 at 17:22, Tom Everitt <tom4e...@gmail.com> wrote:Hutter pointed me to a failed attempt by Müller, IIRC. I don't have the exact reference on this computer.
Regarding limitations of the AIXI, I'd like to second Laurent on the point that the completely free choice of reference machine is a bit unsatisfying.
If I haven't missed anything, this means, among other things, that if we have two strings:
a = 111 b = 100110101010010111010101010101 001111010101
we can not objectively say that a is simpler than b, since according to some obscure languages (reference machines) b will actually be simpler than a.
Exactly.
But intuitively, a being simpler than b makes a lot more sense. Why?
In most programming languages, a would be far simpler than b, although both may turn out to be coded like "print 'aaa'"...
Now consider:
a = 111
b = 10101010
Which is simpler?
The question is more difficult, but it still seem plausible that a is simpler.
Even harder :
a = 111111...
b = 101010...
a still looks simpler, but if I had to bet my arm on it, I wouldn't risk it for less than ten million euros (not sure I would even risk it, my arm is quite useful).
As Laurent already said, shorter reference machine makes more sense intuitively. But appealing to intuition here is clearly not so much better than motivating a cognitive architecture intuitively - it works in practice, but is mathematically unsatisfying. Also, can one really say that one reference machine is shorter/simpler than another? doesn't that also depend on the language in which one describes the reference machines?
Yes absolutely.
*If* we choose a "reference class of models" (e.g., Turing machines), then we can choose the smallest reference machine.
I thought about another option though, just a while back:
Instead (but still considering the TMs class), why not choose the reference UTM that orders TMs in the exact same order as the "natural" order, which is to grow TMs depending on the number of states, of transitions, and some lexicographical order?
(my first problem then was that there was too many TMs of same complexity)
Now, there still also remains the problem of the reference class of models. Why should we go for TMs? Why not something else? How to decide?
TMs have a nice, very simple formulation given our human knowledge. In the end, does it all boils down to our own world? Should we choose the simplest formulation given the axioms of our world? Is that even possible?
This bothers me a lot. Does anyone know of any attempts to resolve this issue? It must at least have been attempted I feel.
The idea was to simulate TMs on other TMs, and do that in loop, to try to find some fixed point, or something in the genre, but that did not work.
Also, Hutter has some discussions on these matters in his 2005 book and in :
"A Philosophical Treatise of Universal Induction"
http://www.hutter1.net/official/bib.htm#uiphil
On last resort, we can set for a consensus on a "intuitive" best choice, if we can prove that finding the best model is not feasible. But if we can avoid that, I'd be happier.
Laurent