Re: [MAGIC-list] Choice of the reference model [was: AIXI limitations]


Laurent

unread,
Aug 10, 2011, 11:48:33 AM8/10/11
to magic...@googlegroups.com
Hi all,

For the record, my first email was:
Up to now, here are my main concerns with AIXI:
  • Reference Turing machine
    • There is currently no best choice, but it seems intuitive that small TMs are better than any random TM
  • Reinforcement Learning
    •  RL as a utility function is very convenient for simplicity, but prone to self-delusion. Knowledge-seeking is better but it prefers true random number generators (not generated by itself). I'm currently thinking about a utility function that is like knowledge-seeking but, roughly, does not like things that are too complex, or more precisely, it doesn't like things that are predictably random.
  • Incomputability
    • The time dimension is not taken into account. That is currently the most important problem. If we can solve that in an optimal way, we will have real AGI. I'm also working on things in this corner, but right now I'd prefer to keep my ideas for myself :)
  • Things I don't understand yet...
Other comments below:

On Wed, Aug 10, 2011 at 17:22, Tom Everitt <tom4e...@gmail.com> wrote:

Regarding limitations of the AIXI, I'd like to second Laurent on the point that the completely free choice of reference machine is a bit unsatisfying.

If I haven't missed anything, this means, among other things, that if we have two strings:

a = 111
b = 100110101010010111010101010101001111010101

we cannot objectively say that a is simpler than b, since according to some obscure languages (reference machines), b will actually be simpler than a.


Exactly.
But intuitively, a being simpler than b makes a lot more sense. Why?
In most programming languages, a would be far simpler than b, although both may turn out to be coded simply as a literal, like "print '111'"...

Now consider:
a = 111
b = 10101010
 
Which is simpler?
The question is more difficult, but it still seems plausible that a is simpler.
Even harder:
a = 111111...
b = 101010...

a still looks simpler, but if I had to bet my arm on it, I wouldn't risk it for less than ten million euros (not sure I would even risk it, my arm is quite useful).

As Laurent already said, a shorter reference machine makes more sense intuitively. But appealing to intuition here is clearly not much better than motivating a cognitive architecture intuitively - it works in practice, but is mathematically unsatisfying. Also, can one really say that one reference machine is shorter/simpler than another? Doesn't that also depend on the language in which one describes the reference machines?

Yes absolutely.
*If* we choose a "reference class of models" (e.g., Turing machines), then we can choose the smallest reference machine.

I thought about another option though, just a while back:
Instead (but still considering the class of TMs), why not choose the reference UTM that orders TMs in the same order as the "natural" order, i.e. ranking TMs by number of states, then number of transitions, then some lexicographical order?
(my first problem then was that there were too many TMs of the same complexity)


Now, there still also remains the problem of the reference class of models. Why should we go for TMs? Why not something else? How to decide?
TMs have a nice, very simple formulation given our human knowledge. In the end, does it all boil down to our own world? Should we choose the simplest formulation given the axioms of our world? Is that even possible?
 

This bothers me a lot. Does anyone know of any attempts to resolve this issue? It must at least have been attempted I feel.

Hutter pointed me to a failed attempt by Müller, IIRC. I don't have the exact reference on this computer.
The idea was to simulate TMs on other TMs, and do that in a loop, to try to find some fixed point, or something of the sort, but that did not work.

Also, Hutter has some discussions on these matters in his 2005 book and in:
"A Philosophical Treatise of Universal Induction"
http://www.hutter1.net/official/bib.htm#uiphil

As a last resort, we could settle for a consensus on an "intuitive" best choice, if we can prove that finding the best model is not feasible. But if we can avoid that, I'd be happier.

Laurent


Tom Everitt

unread,
Aug 10, 2011, 1:37:14 PM8/10/11
to magic...@googlegroups.com
Hi, thanks a lot for your input!


On Wed, Aug 10, 2011 at 5:48 PM, Laurent <laurent...@gmail.com> wrote:

As Laurent already said, a shorter reference machine makes more sense intuitively. But appealing to intuition here is clearly not much better than motivating a cognitive architecture intuitively - it works in practice, but is mathematically unsatisfying. Also, can one really say that one reference machine is shorter/simpler than another? Doesn't that also depend on the language in which one describes the reference machines?

Yes absolutely.
*If* we choose a "reference class of models" (e.g., Turing machines), then we can choose the smallest reference machine.

I thought about another option though, just a while back:
Instead (but still considering the class of TMs), why not choose the reference UTM that orders TMs in the same order as the "natural" order, i.e. ranking TMs by number of states, then number of transitions, then some lexicographical order?
(my first problem then was that there were too many TMs of the same complexity)


Now, there still also remains the problem of the reference class of models. Why should we go for TMs? Why not something else? How to decide?
TMs have a nice, very simple formulation given our human knowledge. In the end, does it all boil down to our own world? Should we choose the simplest formulation given the axioms of our world? Is that even possible?

This is good news. I hadn't actually realized that the structure of the TM is objective, and that some structures are arguably simpler than others. The smallest UTM found seems to have 22 states, but it doesn't seem to be proven minimal. http://en.wikipedia.org/wiki/Universal_Turing_machine#Smallest_machines

 
 

This bothers me a lot. Does anyone know of any attempts to resolve this issue? It must at least have been attempted I feel.

Hutter pointed me to a failed attempt by Müller, IIRC. I don't have the exact reference on this computer.
The idea was to simulate TMs on other TMs, and do that in a loop, to try to find some fixed point, or something of the sort, but that did not work.


This is probably the article you mean: http://arxiv.org/abs/cs/0608095. It sounds interesting; I will definitely read it. Unfortunately it seems that they conclude it is not really possible to find a simplest, universal computer:

"Moreover, we show that the reason for failure has a clear and interesting physical interpretation, suggesting that every other conceivable attempt to get rid of those additive constants must fail in principle, too."

Also, Hutter has some discussions on these matters in his 2005 book and in:
"A Philosophical Treatise of Universal Induction"
http://www.hutter1.net/official/bib.htm#uiphil

Looks interesting, too.
 

As a last resort, we could settle for a consensus on an "intuitive" best choice, if we can prove that finding the best model is not feasible. But if we can avoid that, I'd be happier.


Yes, as I see it now, there are two "possible" paths to finding the ultimate reference machine. One is to try to order the UTMs "mathematically", but perhaps in a different way than Müller tried. The other, more philosophical, approach would be to find some structure for "Turing-complete computing structures" in general, and apply an argument similar to the one you gave above on Turing machines.

But I'm definitely going to read Müller's argument that this whole question is "hopeless"; if he's convincing, I guess we'll have to resort to that intuitive best choice of yours.


Tom

Laurent

unread,
Aug 10, 2011, 2:03:36 PM8/10/11
to magic...@googlegroups.com
Now, there still also remains the problem of the reference class of models. Why should we go for TMs? Why not something else? How to decide?
TMs have a nice, very simple formulation given our human knowledge. In the end, does it all boil down to our own world? Should we choose the simplest formulation given the axioms of our world? Is that even possible?

This is good news. I hadn't actually realized that the structure of the TM is objective, and that some structures are arguably simpler than others. The smallest UTM found seems to have 22 states, but it doesn't seem to be proven minimal. http://en.wikipedia.org/wiki/Universal_Turing_machine#Smallest_machines

Wolfram's 2,3 TM is even smaller, but the computation model is a bit limited, it seems.

This bothers me a lot. Does anyone know of any attempts to resolve this issue? It must at least have been attempted I feel.

Hutter pointed me to a failed attempt by Müller, IIRC. I don't have the exact reference on this computer.
The idea was to simulate TMs on other TMs, and do that in a loop, to try to find some fixed point, or something of the sort, but that did not work.


This is probably the article you mean: http://arxiv.org/abs/cs/0608095. It sounds interesting; I will definitely read it.

This is exactly the one, I'm impressed you found it!
 
Unfortunately it seems that they conclude it is not really possible to find a simplest, universal computer:

"Moreover, we show that the reason for failure has a clear and interesting physical interpretation, suggesting that every other conceivable attempt to get rid of those additive constants must fail in principle, too."

I would not be so categorical myself, but who knows.
 

As a last resort, we could settle for a consensus on an "intuitive" best choice, if we can prove that finding the best model is not feasible. But if we can avoid that, I'd be happier.


Yes, as I see it now, there are two "possible" paths to finding the ultimate reference machine. One is to try to order the UTMs "mathematically", but perhaps in a different way than Müller tried. The other, more philosophical, approach would be to find some structure for "Turing-complete computing structures" in general, and apply an argument similar to the one you gave above on Turing machines.

Yes. Actually, I'd very much like a computation model that has many symmetries. I'd call that the "spherical model of computation". Now that we have the name, we just need to define the model; that should be easy ;)

Laurent

Tom Everitt

unread,
Aug 10, 2011, 2:27:16 PM8/10/11
to magic...@googlegroups.com


Hehehe, I hope you're right :) Seriously though, I totally agree that the philosophical solution would be the nicest by far. The ultimate insight into what computation is... I wonder if it's possible. I'll let you know if I get anywhere near an idea of what that would look like, and I don't think I need to tell you that I'm all ears if you come up with something :)

Laurent

unread,
Aug 10, 2011, 2:37:48 PM8/10/11
to magic...@googlegroups.com
Hehehe, I hope you're right :) Seriously though, I totally agree that the philosophical solution would be the nicest by far. The ultimate insight into what computation is... I wonder if it's possible.

Maybe the smallest possible Turing-complete (save for infinite memory) computers in our universe?
The problem is that they don't look so "simple"...  But maybe that's because of our biased macro-view human perspective.

Laurent

Abram Demski

unread,
Aug 10, 2011, 2:41:16 PM8/10/11
to magic...@googlegroups.com
FYI, Tim Tyler is also interested in this question.

I think we can say that, in principle, the best UTM would be the one
which best predicts the environment of the agent. In other words, the
UTM with the most built-in knowledge. The update process is a descent
towards minimal error; in other words, inductive learning can be seen
as a search for a better prior. The posterior of a Bayesian update can
always be treated as a new prior for the next Bayesian update.

The update to a universal distribution remains universal, so we can
even regard the result as a new universal Turing machine, although it
is not explicitly represented as such.
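
In symbols (just a sketch, to make the "posterior as new prior" point concrete): write the universal mixture as M(x) = sum_nu w_nu * nu(x), with prior weights w_nu = 2^{-K(nu)} over the enumerable (semi)measures nu. After observing x, the predictive distribution is M(y | x) = sum_nu w'_nu * nu(y | x), where w'_nu = w_nu * nu(x) / M(x). So the update just reweights the same class of models, and the result is again a universal mixture, only with a different "prior".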

So, I strongly sympathize with the idea that there's no single best
prior. However, that doesn't mean we can't find nice desirable
properties in the directions you guys are discussing. The equivalence
class "Turing Complete" is a very weak one, which doesn't require very
many properties to be preserved. Different models of computation can
change the complexity classes of specific problems, for example.

One property I mentioned to Tom is the ratio of always-meaningful
programs (total functions) to sometimes-meaningless ones (partial
functions). I.e., we ask how easy it is to get into an infinite loop
which produces no output. This number has been called "Omega".
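
For reference (assuming the "Omega" meant here is Chaitin's halting probability): Omega_U = sum over all programs p that halt on U of 2^{-|p|}, i.e. the probability that a randomly drawn prefix-free program halts under the reference machine U. Like everything else in this discussion, it is machine-dependent.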

It's not possible to have a language which is always-meaningful while
being Turing complete, so we can work at this from "both
directions"... augmenting always-meaningful languages to increase
expressiveness, and partially restricting Turing-complete languages to
be more meaningful. However, there is no perfect middle point-- we can
only get closer in each direction.

--Abram

--
Abram Demski
http://lo-tho.blogspot.com/
http://groups.google.com/group/one-logic

Laurent

unread,
Aug 10, 2011, 5:08:49 PM8/10/11
to magic...@googlegroups.com
On Wed, Aug 10, 2011 at 20:41, Abram Demski <abram...@gmail.com> wrote:
FYI, Tim Tyler is also interested in this question.

He's just been invited.
 

I think we can say that, in principle, the best UTM would be the one
which best predicts the environment of the agent. In other words, the
UTM with the most built-in knowledge. The update process is a descent
towards minimal error; in other words, inductive learning can be seen
as a search for a better prior. The posterior of a Bayesian update can
always be treated as a new prior for the next Bayesian update.

The update to a universal distribution remains universal, so we can
even regard the result as a new universal Turing machine, although it
is not explicitly represented as such.

You're right that if we go down the road of quantum physics, we should simply put in as much knowledge as possible, to tailor the agent as much as possible toward the real world.

Let's assume for a moment that we don't have knowledge about the real-world.
The question of the ultimate prior is then: Is there a UTM that is better than any other?
I.e.: Is there a universal notion of simplicity?

This looks somewhat equivalent to: "Is the definition of Turing machines simple because they are tailored to our own world, or is there an elegant mathematical truth behind that?"

Laurent

Tim Tyler

unread,
Aug 10, 2011, 6:14:54 PM8/10/11
to MAGIC
Hi, yes - I do have a page about the issue: http://matchingpennies.com/the_one_true_razor/

I also think it is not a huge deal. The "just use FORTRAN-77" answer is reasonable. If you want to give your machine a head start in the real world, you tell it all the things you know - e.g. audio/video codecs.
--
__________
|im |yler http://timtyler.org/ t...@tt1lock.org Remove lock to
reply.

Tom Everitt

unread,
Aug 11, 2011, 11:19:28 AM8/11/11
to magic...@googlegroups.com

On Thu, Aug 11, 2011 at 12:14 AM, Tim Tyler <t...@tt1.org> wrote:
Hi, yes - I do have a page about the issue: http://matchingpennies.com/the_one_true_razor/

I also think it is not a huge deal. The "just use FORTRAN-77" answer is reasonable. If you want to give your machine a head start in the real world, you tell it all the things you know - e.g. audio/video codecs.


So a question to you "no one true razor/prior" proponents:

If we have

a = 101,
b = 101101001000001010100101111111111010

do you not share the intuition that a is, in some sense, objectively simpler than b?


Another thing is that priors do matter, since your first prior will always affect your later posteriors... And choosing it so that it fits the world well is not much of a solution either, since we want a general solution, no? But I do agree with Abram's point that it makes sense to look for relatively "meaningful" languages.


On Wed, Aug 10, 2011 at 8:37 PM, Laurent <laurent...@gmail.com> wrote:

Maybe the smallest possible Turing-complete (save for infinite memory) computers in our universe?
The problem is that they don't look so "simple"...  But maybe that's because of our biased macro-view human perspective.


I don't want to say you're wrong, but I guess it would take a pretty convincing argument for preferring them to anything else. As Abram points out, the class of Turing-complete structures is quite diverse, and even though we can understand any member of it as a set of states and rules (like a Turing machine), I guess we can also understand any Turing machine through any of these other structures. But hopefully either simplicity is preserved between structures, or there is one "true" structure, the "spherical model of computation". I like the name :)

Tim Tyler

unread,
Aug 11, 2011, 4:08:58 PM8/11/11
to MAGIC
On Aug 11, 4:19 pm, Tom Everitt <tom4ever...@gmail.com> wrote:

> So a question to you "no one true razor/prior" proponents:
>
> If we have
>
> a = 101,
> b = 101101001000001010100101111111111010
>
> do you not share the intuition that a is, in some sense,
> objectively simpler than b?

Sure, but if you consider:
a = 01010010101010010011010110101101011
...and...
b = 101101001000001010100101111111111010
...then that intuition is not so clear.

Tom Everitt

unread,
Aug 11, 2011, 4:45:07 PM8/11/11
to magic...@googlegroups.com
Sure, but my point is:

If the choice of reference machine can be made completely arbitrarily, then in some choices a will be simpler than b, and in other choices b will be simpler than a, no matter what a and b happen to be (as long as they are finite strings).

So saying that there is no "right" choice of reference machine (or class of reference machines) must therefore imply that a = 1 is not (cannot be!) objectively simpler than b = 10101100111111100100100000110101.

Do you see my point?
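
To make the machine-dependence explicit (the standard invariance theorem, stated loosely): for any two universal machines U and V there is a constant c_{U,V} such that C_U(x) <= C_V(x) + c_{U,V} for all x. So complexity is only pinned down up to an additive constant; and for any fixed finite string b one can pick a universal machine with b wired in, making C(b) essentially 0 under that choice, which is exactly how the a-vs-b comparison gets flipped by the choice of reference machine.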

Wén Shào

unread,
Aug 11, 2011, 7:04:53 PM8/11/11
to magic...@googlegroups.com
I agree with Tom Everitt, and I had the same problem before, namely: for every (finite-length) string $x$ there is an additively optimal recursive function $f$ such that the Kolmogorov complexity of $x$ with respect to that particular $f$ is 0. I think this problem stems from the arbitrariness in the enumeration of Turing machines; the example given by Tom illustrates it best. Consider an arbitrary enumeration of Turing machines in which a Turing machine TMb with a very complex string (like b = 10101100111111100100100000110101) "built in" happens to appear among the first few; then the complexity of the string b with respect to that enumeration, though intuitively large, is low.

This arbitrariness doesn't always matter: for example, when we consider the randomness of an infinite sequence, the additive constant can never be so overwhelming. Nevertheless, when we consider finite strings, it is indeed an annoying issue. There are basically two solutions to this problem as far as I know:

(1) Use some predetermined and universally agreed-upon reference machine, that is, a "natural" machine.
(2) A more mathematically formal solution is given in LV08, ~p. 111.

Just my thought. 
--

Wén

Tom Everitt

unread,
Aug 11, 2011, 7:36:41 PM8/11/11
to magic...@googlegroups.com
Thank you Wén Shào, that was most helpful! It took me a little while to realize that LV08 referred to Li & Vitanyi's An Introduction to Kolmogorov Complexity, but then I found it. Now I just need some time to digest the argument, but it looks really promising.

Tom Everitt

unread,
Aug 22, 2011, 9:33:40 AM8/22/11
to magic...@googlegroups.com
Hi again, I finally had time to read Li & Vitanyi a bit more closely.

They start out promising:

"The complexity C(x) is invariant only up to a constant depending on the reference function phi_0. Thus, one may object, for every string x there is an additively optimal recursive function psi_0 such that C_psi_0(x) = 0 [psi_0 assigns complexity 0 to x]. So how can one claim that C(x) is an objective notion?"

They then set about giving a "mathematically clean solution" by defining some equivalence classes, claiming these equivalence classes had a single smallest element. This is all well and good, but the problem is that, according to their definition of the equivalence classes, all universal Turing machines end up in the smallest equivalence class. And, clearly, for every string x there is a universal Turing machine assigning complexity 0 to x. (Just take any universal Turing machine, and redefine it so it prints x on no input.)
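
To spell out that parenthetical construction (my notation, one way to do it): given a universal machine U and the target string x, define U' by U'(empty input) = x and U'(1p) = U(p) for every program p. Then C_{U'}(x) = 0, while C_{U'}(y) <= C_U(y) + 1 for every string y, so U' is still additively optimal; under their definition it seems it should sit in the same (smallest) equivalence class as U.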

So in my opinion their argument is not making C(x) any more objective.

Any thoughts?


PS. If anyone wants to see the argument for themselves but doesn't have the book at hand, I can just scan the page (it's only one page, so it's no trouble). Just let me know.

Wén Shào

unread,
Aug 23, 2011, 12:07:43 AM8/23/11
to magic...@googlegroups.com
Hi Tom, 

I don't quite see why all universal Turing machines have to necessarily end up in the smallest equivalence class. An apt example would be: given a universal Turing machine T, I can effectively construct another UTM T' such that the complexity with respect to T' is uniformly larger than the complexity with respect to T by some large number C. Then T' won't be in the equivalence class of T if I choose c sufficiently small.

Just my thought, 

Cheers, 

Wén




Laurent

unread,
Aug 23, 2011, 3:09:25 AM8/23/11
to magic...@googlegroups.com
Wén, do you feel the proposed solution is appealing, or satisfying? (and why, if possible)
It's strange that this solution is so rarely cited, though I suppose Marcus (for example) is well aware of it?
I'll need to have a closer look sometime soon.

Laurent

Wén Shào

unread,
Aug 23, 2011, 5:32:09 AM8/23/11
to magic...@googlegroups.com
Hi Tom and Laurent, 

Please ignore the comment I made before; obviously it didn't make any sense. I should think more carefully before I speak... :P

I will spend some time looking at their solution more closely, and let you know what I feel then. 

Cheers, 

Wén

Wén Shào

unread,
Aug 23, 2011, 7:44:18 AM8/23/11
to magic...@googlegroups.com
Hi Tom and Laurent, 

I agree with Tom, their solution is not quite satisfying. I believe Marcus is well aware of this, and I personally don't think Marcus accepts their argument either, as whenever he mentions (has to mention) this issue in his papers, he uses a "universally agreed-upon enumeration" to "get rid of" the arbitrariness.

As far as I know, there isn't a completely satisfying solution yet (I don't believe any of my colleagues knows one). Nevertheless, the arbitrariness in choosing the universal Turing machine only matters when we examine the complexity of a particular object, which we seldom do.

Just my thought, 

Wén

Laurent

unread,
Aug 23, 2011, 8:13:18 AM8/23/11
to magic...@googlegroups.com
On Tue, Aug 23, 2011 at 13:44, Wén Shào <90b5...@gmail.com> wrote:
Hi Tom and Laurent, 

I agree with Tom, their solution is not quite satisfying. I believe Marcus is well aware of this, and I personally don't think Marcus accepts their argument either, as whenever he mentions (has to mention) this issue in his papers, he uses a "universally agreed-upon enumeration" to "get rid of" the arbitrariness.

Thanks Wén, good to know.
 

As far as I know, there isn't a completely satisfying solution yet (I don't believe any of my colleagues knows one). Nevertheless, the arbitrariness in choosing the universal Turing machine only matters when we examine the complexity of a particular object, which we seldom do.

Hmm, I'm not so sure.
The reference UTM is also of importance when considering small strings, which we often do.
One could, however, argue that our experience in the world (genetic and cultural) has made "our reference machine" less arbitrary, and that it would go similarly for an AGI more than 20 years old.

Laurent

Sun Yi

unread,
Aug 23, 2011, 2:50:56 PM8/23/11
to magic...@googlegroups.com

Hello Guys,

 

I just finished reading Shane’s paper on the negative results on making Solomonoff induction practical, and a question popped up:

 

What is the precise linkage between the theory of Solomonoff induction and the reality?

 

Take probability theory for example: the only linkage between the theory of probability and reality is ‘the fundamental interpretative hypothesis’, which states ‘that events with zero or low probability are unlikely to occur’ (Shafer and Vovk, ‘Probability and Finance: It’s Only a Game’, p. 5). This makes clear that conclusions drawn from probabilistic inference about reality make sense only if we accept this hypothesis, and, as an indirect consequence, it provides the theoretical basis for the generalization results in the statistical learning context. (E.g., A is better than B on a set of i.i.d. examples => A is likely to be better than B on all cases, otherwise something with small probability has happened.)

 

When looking at the theory of Solomonoff induction (or the RL extension AIXI), I found such a clearly stated linkage is somehow missing. Shane’s negative results show that we cannot say much mathematically about the performance of a complex predictor. However, in the original paper, it is suggested that the creation of AGI would be more or less an experimental science. I also find this point problematic. One of the reasons is that almost every experimental study requires the comparison of two agents A and B. And if A and B are complex algorithmic objects, and if we adopt Shane’s definition that a predictor is universal if it ‘eventually’ correctly predicts a sequence, then there is practically no way to draw any conclusion by looking at the performance of the agents on finite data. As a result, to make an experimental study useful, we have to answer precisely the following question:

 

What is the fundamental hypothesis we must accept to say that ‘Predictor A is better than B on the given set of finite strings => A is likely to be better than B on all sequences’?

 

In other words, we must state very clearly under what hypothesis we should believe that the experimental results will generalize, before we start to do any experiment. I don’t have an answer to this problem, yet I think an answer is necessary if we want to make the universal theory practically relevant.

 

Regards,

 

Sun Yi


Laurent

unread,
Aug 23, 2011, 3:50:59 PM8/23/11
to magic...@googlegroups.com
Hi (btw, what's your first name, Yi or Sun?)

That's a very interesting question you raise.

On Tue, Aug 23, 2011 at 20:50, Sun Yi <y...@idsia.ch> wrote:

Hello Guys,

 

I just finished reading Shane’s paper on the negative results on making Solomonoff induction practical,


Btw, note the negative result holds for agents too, not only for prediction:
http://www.hutter1.net/official/bib.htm#asyoptag

(And I will discuss Shane's results in another thread.)
 

and a question popped up:

 

What is the precise linkage between the theory of Solomonoff induction and the reality?


You may also want to read "A philosophical treatise of Universal induction":
http://www.hutter1.net/official/bib.htm#uiphil

It does not make a direct, irrefutable link with the real world, but it shows how Solomonoff's Induction solves some long-standing philosophical problems about induction, like the raven paradox.
 

 

Take probability theory for example: the only linkage between the theory of probability and reality is ‘the fundamental interpretative hypothesis’, which states ‘that events with zero or low probability are unlikely to occur’ (Shafer and Vovk, ‘Probability and Finance: It’s Only a Game’, p. 5). This makes clear that conclusions drawn from probabilistic inference about reality make sense only if we accept this hypothesis, and, as an indirect consequence, it provides the theoretical basis for the generalization results in the statistical learning context. (E.g., A is better than B on a set of i.i.d. examples => A is likely to be better than B on all cases, otherwise something with small probability has happened.)


If you go this far, you can also question the validity of mathematical axioms, and Leo Pape would probably love to discuss that with you ;)

However, regarding probability theory, I thought that (IIRC) Cox's axioms were pretty solid, i.e. don't they (he, Jaynes, and others; see Jaynes 2003) show that probability theory is properly entailed (within some parameter set) by desirable principles, and that if those principles are not used, we can get into trouble?
 

When looking at the theory of Solomonoff induction (or the RL extension AIXI), I found such a clearly stated linkage is somehow missing. Shane’s negative results show that we cannot say much mathematically about the performance of a complex predictor. However, in the original paper, it is suggested that the creation of AGI would be more or less an experimental science.


(Shane, if you're reading, is that the reason why you went to neuroscience?)
Personally, I don't think there is a need to consider AGI only as an experimental science.
There are still many questions to answer. Maybe performance is not the right criterion, or maybe it is but we need a different framework where we have different properties.
It is also clear that computable agents will perform much worse than AIXI in the real world (if it makes sense to consider AIXI in the real world).
Legg's results are a start, but I believe this is far from the end of the story. We need more results, in different situations, etc.
 

I also find this point problematic. One of the reasons is that almost every experimental study requires the comparison of two agents A and B. And if A and B are complex algorithmic objects, and if we adopt Shane’s definition that a predictor is universal if it ‘eventually’ correctly predicts a sequence, then there is practically no way to draw any conclusion by looking at the performance of the agents on finite data. As a result, to make an experimental study useful, we have to answer precisely the following question:

 

What is the fundamental hypothesis we must accept to say that ‘Predictor A is better than B on the given set of finite strings => A is likely to be better than B on all sequences’?


Why couldn't we compare agents theoretically?
Also, infinite sequences are of interest, even if we end up saying, for example, that some agent cannot learn any infinite sequence for some reason.

So an agent can perform very poorly on some sequences/strings, yet perform well on some others.
A computable agent cannot perform well against some kinds of adversarial environments? Well, maybe these kinds of environments are too specific (maybe we can measure this specificity with the mutual information between the agent and the environment: if they are too much alike, the agent might perform poorly), but the agent can still perform very well in highly complex environments that are not adversarial.
 

 

In other words, we must state very clearly under what hypothesis we should believe that the experimental results will generalize, before we start to do any experiment. I don’t have an answer to this problem, yet I think an answer is necessary if we want to make the universal theory practically relevant.


I very much agree that we need to state clearly what hypothesis we should believe in.
Peter Sunehag is trying to axiomatize rational RL: http://www.hutter1.net/official/bib.htm#aixiaxiom
That's a nice start.

What would be marvelous is a set of hypotheses that entails the AGI theory (potentially based on Solomonoff Induction), along with proofs that outside these boundaries, some good properties cannot hold.
One such result is Hutter's "only agents based on Bayes mixtures can be Pareto optimal":
Hutter's book 2005, and, I think, http://www.hutter1.net/ai/optisp.htm

Laurent

Rohrer, Brandon R

unread,
Aug 23, 2011, 5:15:44 PM8/23/11
to magic...@googlegroups.com
Hi Laurent,

I'm enjoying the discussion here. I want to draw attention to a statement you made in your last post:

It is also clear that computable agents will perform much worse than AIXI in the real world (if it makes sense to consider AIXI in the real world).

I'll try to contradict it, and you can tell me if I'm wrong.

My understanding is that AIXI is provably optimal (given that we can agree on the Turing machine used to express plans, accept the specific formulation of the Solomonoff predictor as optimal, and agree on a horizon) across all possible RL problems. But the real world, by which I assume you mean the physical world, will not present a uniform sampling of all possible RL problems to an agent. In fact, it is quite possible that the real world will only present an infinitesimal subset of all possible RL problems to the agent. In this case, an agent may be able to far outperform AIXI on physical-world RL tasks, yet still may be considered "general" in the sense that it can do anything that we ourselves are likely to do or need done.

Are there some holes in this that I'm not seeing?

Brandon   

Abram Demski

unread,
Aug 23, 2011, 8:45:55 PM8/23/11
to magic...@googlegroups.com
Rohrer,

An answer (partly joking, but true): that's just a matter of reference machine choice. We can choose a reference machine which generates RL problems in a way that is distributed very much like our world, and talk about how well *that* formulation of AIXI would do in comparison to a finite agent.

--Abram


Abram Demski

unread,
Aug 23, 2011, 8:50:44 PM8/23/11
to magic...@googlegroups.com
Hi,

I think one answer to this is: elegant theories usually win. That's the fundamental connection between Solomonoff induction and real-world experience, and it is actually quite related to the "fundamental interpretive hypothesis" you mention for probability theory. Whereas that hypothesis is that low-probability events will rarely be encountered, this one is that elegant things will often be encountered.

--Abram

Laurent

unread,
Aug 24, 2011, 4:27:13 AM8/24/11
to magic...@googlegroups.com
Hi Brandon,

> It is also clear that computable agents will perform much worse than AIXI in the real world (if it makes sense to consider AIXI in the real world).

I'll try to contradict it, and you can tell me if I'm wrong.

My understanding is that AIXI is provably optimal (given that we can agree on the Turing machine used to express plans, accept the specific formulation of the Solomonoff predictor as optimal, and agree on a horizon) across all possible RL problems. But the real world, by which I assume you mean the physical world, will not present a uniform sampling of all possible RL problems to an agent. In fact, it is quite possible that the real world will only present an infinitesimal subset of all possible RL problems to the agent. In this case, an agent may be able to far outperform AIXI on physical-world RL tasks, yet still may be considered "general" in the sense that it can do anything that we ourselves are likely to do or need done.

Your argument has some appeal.
So, yes, in a sense, we might do better than AIXI in our real world, that's right.
If you consider one single task, like adding 32-bit numbers, then certainly we can do better than AIXI, since we might even be able to prove that we have the best algorithm. But that is limited to very simple tasks, and the world is much more complex.

And:

- Today, I tend to think that our world exposes us often to (some) simple "sub-environments" (all goes as expected, everything is predictable), and seldom to more complex sub-environments. This distribution reflects Occam's razor, except that instead of considering a diversity of environments, we consider one single environment composed of multiple sub-environments. Apparently, you seem to refer to that too, if I'm not mistaken.

- Our real world looks Turing-complete (save for some huuuge memory bound), since we can create computers. Thus virtually all environments can be expressed in this framework, except the veeery complex ones maybe, but note that the prior of these environments is small.
So it's probably not infinitesimal.
Your argument might be better in a simpler environment than the real world.

- AIXI does not need the world to be a composition of environments. It works great in any environment. Compared to Solomonoff Induction, a computable agent can only save a finite number of prediction mistakes in any environment (a standard bound is sketched after this list).
(This is not completely true for AIXI, see http://www.springerlink.com/content/p2780778k054411x/ and http://www.hutter1.net/official/bib.htm#asyoptag but that problem may arise in computable agents as well, as it is a problem of exploration, not of exploitation.)
I.e., even if you choose the environment, and the best agent for that environment, Solomonoff Induction will learn to predict like that agent while making fewer additional errors than the size of the smallest program that computes the environment. (For the real world, "all we need" for the smallest program are the laws of physics and the initial state at the Big Bang, for example. Much less is required in fact, as many things do not directly depend on distant stars.)

- AIXI discards all environments that are not consistent with its experience. So it (probably) suffices for AIXI to throw one single die to "infer" many things about classical mechanics, and maybe quantum dynamics and Newton's laws of gravitation. With a few more throws it could generalize the laws very quickly. It does not need to observe the sand and the stars to infer that the world might be composed of particles (or that it leads to a good model). Learning is amazingly fast.
The knowledge it gains about the world grows so quickly that even if you add some additional initial knowledge to a computable agent, AIXI should catch up in no time.

- last but not least, I was considering *real-time* real-world agents, those which can only do a very limited number of computations per interaction step, whereas AIXI can do an infinite number of computations in the same time. For such real agents, learning can only be extremely slow, since this requires time.
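
The bound I have in mind (from Solomonoff/Hutter; I'm hedging on the exact constant factor) is, for any computable environment mu:

    sum_{t >= 1} E_mu[ (M(1 | x_<t) - mu(1 | x_<t))^2 ]  <=  K(mu) * ln 2

where M is Solomonoff's universal predictor and K(mu) is the length of the shortest program computing mu. The total squared prediction error is finite and bounded by the environment's program length, which is the sense in which a computable predictor can only "save" a bounded number of mistakes compared to M.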

So I still think that AIXI would be vastly more intelligent in the real world than the real agents we will be able to create.


Is this an accurate answer?


Laurent

Sun Yi

unread,
Aug 24, 2011, 5:11:26 AM8/24/11
to magic...@googlegroups.com

Hello,

 

My name is SUN Yi; Yi is the first name :)

 

From my point of view:

 

- There is no problem about the linkage between ideal Solomonoff induction and reality, except that it is not computable :)

If we make the assumption that the environment is computable (which is a religious thing since we cannot mathematically reason about the computability of the world in which we live, accept it or not...), then we can make precise predictions with bounded regret.

 

- My problem is with the computable agent. Shane’s result shows that even if we make the assumption that the world is simple in the Kolmogorov sense, we still cannot guarantee the performance of the agent. Moreover, beyond a certain point we cannot even mathematically reason about the computable agent.

 

- As for the comparison of two agents, I have the intuition that it is not decidable (no proof yet). Anyway, I wrote the following Lemma, which basically restates Shane’s result. However, it shows that deciding if a sequence is predicted by a given predictor is very hard.

 

Regards,

 

Sun Yi

 

 



Rohrer, Brandon R

unread,
Aug 24, 2011, 5:24:43 AM8/24/11
to magic...@googlegroups.com
Laurent,

Thanks for your response. I've added a couple of notes below.
____________________
- Our real world looks Turing-complete (save for some huuuge memory bound), since we can create computers. Thus virtually all environments can be expressed in this framework, except the veeery complex ones maybe, but note that the prior of these environments is small.
So it's probably not infinitesimal.
Your argument might be better in a simpler environment than the real world

>>True. If you meant to include the computers we create when you cited the "real world" then my comment isn't  relevant.


- AIXI discards all environments that are not consistent with its experience. So it (probably) suffices for AIXI to throw one single die to "infer" many things about classical mechanics, and maybe quantum dynamics and Newton's laws of gravitation. With a few more throws it could generalize the laws very quickly. It does not need to observe the sand and the stars to infer that the world might be composed of particles (or that it leads to a good model). Learning is amazingly fast.
The knowledge it gains about the world grows so quickly that even if you add some additional initial knowledge to a computable agent, AIXI should catch up in no time.

>>You seem to have an intuition about how an AIXI would perform in certain situations. I don't have this, or I should say, my intuition is different. But since my intuition is based on no empirical evidence, I don't give it much weight. My impression was that AIXI never discards environments, but just considers them less likely when they are contradicted. The notion that a single observation could be used to definitively rule out certain possibilities seems strange to me. Why would that be so? Does Solomonoff Induction make no allowance for inaccuracies in observations? If so, that seems like another major barrier to implementing a variant of AIXI.

- last but not least, I was considering *real-time* real-world agents, those which can only do a very limited number of computations per interaction step, whereas AIXI can do an infinite number of computations in the same time. For such real agents, learning can only be extremely slow, since this requires time.

>>Hardly a fair comparison! :)

So I still think that AIXI would be vastly more intelligent in the real world than the real agents we will be able to create.

>>It would be great to be able to make such a comparison in practice.

Brandon

Abram Demski

unread,
Aug 24, 2011, 2:30:31 PM8/24/11
to magic...@googlegroups.com
Brandon,

One point below.

On Wed, Aug 24, 2011 at 2:24 AM, Rohrer, Brandon R <brr...@sandia.gov> wrote:
Laurent,

Thanks for your response. I've added a couple of notes below.
____________________
- Our real world looks Turing-complete (save for some huuuge memory bound), since we can create computers. Thus virtually all environments can be expressed in this framework, except the veeery complex ones maybe, but note that the prior of these environments is small.
So it's probably not infinitesimal.
Your argument might be better in a simpler environment than the real world

>>True. If you meant to include the computers we create when you cited the "real world" then my comment isn't  relevant.


- AIXI discards all environments that are not consistent with its experience. So it (probably) suffices for AIXI to throw one single die to "infer" many things about classical mechanics, and maybe quantum dynamics and Newton's laws of gravitation. With a few more throws it could generalize the laws very quickly. It does not need to observe the sand and the stars to infer that the world might be composed of particles (or that it leads to a good model). Learning is amazingly fast.
The knowledge it gains about the world grows so quickly that even if you add some additional initial knowledge to a computable agent, AIXI should catch up in no time.

>>You seem to have an intuition about how an AIXI would perform in certain situations. I don't have this, or I should say, my intuition is different. But since my intuition is based on no empirical evidence, I don't give it much weight. My impression was that AIXI never discards environments, but just considers them less likely when they are contradicted. The notion that a single observation could be used to definitively rule out certain possibilities seems strange to me. Why would that be so? Does Solomonoff Induction make no allowance for inaccuracies in observations? If so, that seems like another major barrier to implementing a variant of AIXI.

My guess is that this confusion comes from the existence of two different formulations of solomonoff induction.

The simpler formulation is a mixture model of all computable deterministic models. In this case, an individual model makes hard predictions about the future, and we can simply discard it if it turns out to be wrong. Bad models will initially be eliminated very, very quickly.

The mixture model can also be formulated as a combination of all computable probability distributions. In this case, models make soft predictions, so we are adjusting their relative probabilities rather than discarding them. 

(I should provide a reference for the equivalence, but I'm not sure where it is proven...)
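
For concreteness, the two forms I mean (stated loosely, and hedging on the exact technical conditions): the deterministic version is

    M(x) = sum over minimal programs p such that U(p) outputs something starting with x of 2^{-|p|}

and the stochastic version is the mixture

    xi(x) = sum over enumerable semimeasures nu of 2^{-K(nu)} * nu(x).

The equivalence is that M and xi dominate each other within multiplicative constants, so they predict the same things up to O(1) extra bits; I believe this is the result discussed in Li & Vitanyi (and in Hutter 2005).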
 

- last but not least, I was considering *real-time* real-world agents, those which can only do a very limited number of computations per interaction step, whereas AIXI can do an infinite number of computations in the same time. For such real agents, learning can only be extremely slow, since this requires time.

>>Hardly a fair comparison! :)

So I still think that AIXI would be vastly more intelligent in the real world than the real agents we will be able to create.

>>It would be great to be able to make such a comparison in practice.

Brandon


Laurent

unread,
Aug 24, 2011, 2:49:16 PM8/24/11
to magic...@googlegroups.com
A quick addition:


On Wed, Aug 24, 2011 at 20:30, Abram Demski <abram...@gmail.com> wrote:
Brandon,

One point below.

On Wed, Aug 24, 2011 at 2:24 AM, Rohrer, Brandon R <brr...@sandia.gov> wrote:
Laurent,

Thanks for your response. I've added a couple of notes below.
____________________
- Our real world looks Turing-complete (save for some huuuge memory bound), since we can create computers. Thus virtually all environments can be expressed in this framework, except the veeery complex ones maybe, but note that the prior of these environments is small.
So it's probably not infinitesimal.
Your argument might be better in a simpler environment than the real world

>>True. If you meant to include the computers we create when you cited the "real world" then my comment isn't  relevant.


- AIXI discards all environments that are not consistent with its experience. So it (probably) suffices for AIXI to throw one single die to "infer" many things about classical mechanics, and maybe quantum dynamics and Newton's laws of gravitation. With a few more throws it could generalize the laws very quickly. It does not need to observe the sand and the stars to infer that the world might be composed of particles (or that it leads to a good model). Learning is amazingly fast.
The knowledge it gains about the world grows so quickly that even if you add some additional initial knowledge to a computable agent, AIXI should catch up in no time.

>>You seem to have an intuition about how an AIXI would perform in certain situations. I don't have this, or I should say, my intuition is different. But since my intuition is based on no empirical evidence, I don't give it much weight. My impression was that AIXI never discards environments, but just considers them less likely when they are contradicted. The notion that a single observation could be used to definitively rule out certain possibilities seems strange to me. Why would that be so? Does Solomonoff Induction make no allowance for inaccuracies in observations? If so, that seems like another major barrier to implementing a variant of AIXI.

My guess is that this confusion comes from the existence of two different formulations of solomonoff induction.

The simpler formulation is a mixture model of all computable deterministic models. In this case, an individual model makes hard predictions about the future, and we can simply discard it if it turns out to be wrong. Bad models will initially be eliminated very, very quickly.

The mixture model can also be formulated as a combination of all computable probability distributions. In this case, models make soft predictions, so we are adjusting their relative probabilities rather than discarding them. 

(I should provide a reference for the equivalence, but I'm not sure where it is proven...)


You can find that in Hutter 2005 at least, probably in Li&Vitanyi too.

Brandon, when I say "discard an environment", I'm talking about an environment as a program p, so that on input y_{1:t} (the history of the agent's actions from time 1 to t), p(y_{1:t}) produces the output x_{1:t} (observations of the agent).
If the output produced by this program is different from the true output as received by the agent, then this particular environment/program is no longer consistent and gets discarded.
However, it is easy to construct a new program p' based on p that remains consistent with the history. In general, though, p' is more complex than p.
This means that as the agent interacts with the environment, there is *always* an infinite variety of environments that are consistent with the history, and there are always environments that predict whatever you want (i.e. any future is *possible*). However, Occam's razor makes some futures less and less likely, making the program representing the true environment more and more likely.
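
To make that concrete, here is a toy illustration of the "discard inconsistent deterministic models" view (hedged: a tiny hand-picked hypothesis class stands in for "all programs", uniform weights stand in for 2^{-|p|}, and all names are mine):

    def h_const0(bits):       # always predicts the next bit is 0
        return 0

    def h_const1(bits):       # always predicts the next bit is 1
        return 1

    def h_alternate(bits):    # predicts 0, 1, 0, 1, ...
        return len(bits) % 2

    def h_repeat_last(bits):  # predicts the previous bit repeats (0 on empty history)
        return bits[-1] if bits else 0

    # prior weights; a stand-in for 2**-|program|
    hypotheses = {h_const0: 0.25, h_const1: 0.25, h_alternate: 0.25, h_repeat_last: 0.25}

    def predict_prob_of_1(hyps, history):
        """Weighted vote of the surviving deterministic models."""
        total = sum(hyps.values())
        if total == 0:
            return 0.5
        return sum(w for h, w in hyps.items() if h(history) == 1) / total

    def update(hyps, history, observed_bit):
        """Discard every model whose hard prediction was contradicted."""
        return {h: w for h, w in hyps.items() if h(history) == observed_bit}

    history = []
    for bit in [0, 1, 0, 1, 0, 1]:   # the "true" environment: alternating bits
        p1 = predict_prob_of_1(hypotheses, history)
        hypotheses = update(hypotheses, history, bit)
        history.append(bit)
        print(f"P(next=1) was {p1:.2f}, observed {bit}, {len(hypotheses)} model(s) left")

Wrong models get eliminated after their first contradicted prediction, while the true program is never eliminated; the real thing does the same over all programs, with the Occam weighting doing the rest.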

Laurent


Tom Everitt

unread,
Aug 24, 2011, 3:35:57 PM8/24/11
to magic...@googlegroups.com
On Wed, Aug 24, 2011 at 8:30 PM, Abram Demski <abram...@gmail.com> wrote:

My guess is that this confusion comes from the existence of two different formulations of solomonoff induction.

The simpler formulation is a mixture model of all computable deterministic models. In this case, an individual model makes hard predictions about the future, and we can simply discard it if it turns out to be wrong. Bad models will initially be eliminated very, very quickly.

The mixture model can also be formulated as a combination of all computable probability distributions. In this case, models make soft predictions, so we are adjusting their relative probabilities rather than discarding them. 

(I should provide a reference for the equivalence, but I'm not sure where it is proven...)


Are they really equivalent as induction principles?

From Li & Vitanyi, p. 273, they are equivalent in the sense that:

"there is a constant c such that for every string x, log(1/m(x)) will differ at most c from C(x) "

So they do have a close connection. But does that mean they will perform equally well (bad) in probabilistic worlds?

Consider the world that outputs 1 with 80% probability. Since it is still pretty random, I imagine you will not manage much compression with a Turing machine, and since it is not possible to express probabilities with TMs, will it even predict that the next input will be 1 with higher probability than 0? Not sure.

The measure m on the other hand should soon converge on the (computable) distribution 80-20, and make adequate predictions.

At least this is the way I've always thought about it, but I have no proof this is right in any sense.
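
One relevant fact (hedging on the exact statements): the universal semimeasure multiplicatively dominates every computable measure mu, i.e. m(x) >= 2^{-K(mu)} * mu(x) for all x. Applied to mu = Bernoulli(0.8), Solomonoff's convergence theorem then gives that m's conditional predictions converge to 0.8 with mu-probability 1. So the mixture does learn the bias, even though each individual deterministic program inside it makes hard 0/1 predictions.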

Abram Demski

unread,
Aug 24, 2011, 4:17:41 PM8/24/11
to magic...@googlegroups.com
Tom,

Yea, it at least seems like the probabilistic version will be much more practical, because the search process will be easier! The deterministic version will have to keep searching for new models all the time in order to get the same result. So, for a resource-limited approximation, it seems like a much better idea.

--Abram


Eray Ozkural

unread,
Aug 24, 2011, 5:28:56 PM8/24/11
to magic...@googlegroups.com
On Wed, Aug 24, 2011 at 12:11 PM, Sun Yi <y...@idsia.ch> wrote:
- There is no problem about the linkage between ideal Solomonoff induction and reality, except that it is not computable :)

If we make the assumption that the environment is computable (which is a religious thing since we cannot mathematically reason about the computability of the world in which we live, accept it or not...), then we can make precise predictions with bounded regret.


Yet, this is a misunderstanding. There is no assumption that the world itself is a computation.

On the other hand, very rigorous mathematical results show that QM and GR are formulated precisely in computable mathematics (which is a separate statement).

What Solomonoff induction assumes is that the probability distribution we're working with is computable, which is something else entirely (much weaker than the already-proven results mentioned above) and not religious at all.

Best,
 

--

Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy

Sun Yi

unread,
Aug 24, 2011, 6:07:27 PM8/24/11
to magic...@googlegroups.com

Hello,

 

The assumption that the environment is computable, or sampled from a computable distribution, is not verifiable by any agent inside that environment. In that sense, accepting the computability assumption and refuting it are equally unfounded.

 

Of course this does not mean that we cannot design some environment for the agent that is totally under our control.

Regards,

 

Sun Yi

 


Abram Demski

unread,
Aug 24, 2011, 7:21:31 PM8/24/11
to magic...@googlegroups.com
Hi,

We don't necessarily need to assume that the environment is sampled from a computable distribution, either. This is the Bayesian/frequentist debate. A Bayesian will instead say that we are looking for a computable personal belief function. To a Bayesian, probabilities exist in the mind, as a tool for dealing with uncertainty, rather than in the environment as physical random processes ("sampling").

--Abram

Eray Ozkural

unread,
Aug 24, 2011, 9:23:48 PM8/24/11
to magic...@googlegroups.com
Hi,

Where is the proof for that claim? It seems to be a conviction rather than a (physics) theorem.

It is possible for instance that in the future digital physics can be proven.

As I've said, if GR+QM were to give us a complete picture, the universe is already describable by computable mathematics, there is absolutely no question about that. (I think I should provide references for that in a paper :)

However, very good arguments can be put forward to support the notion that the distribution is computable: the negation would simply mean that the distribution has infinite information content, which would require a separate physical proof, and no observation supports such an abundance of noise, especially in fundamental laws. Instead, almost all physicists would expect a finite rule for the universe, I think (even if they disagree with digital physics).

Let me emphasize: our best theories, quantum mechanics for instance, fall squarely into the category of "computable distribution", *even if* the non-determinism in QM turned out to be real.

You might be referring to an obnoxious position in philosophy which claims that "metaphysical materialism cannot be proven", but, as you may have guessed, that too is false because materialism is not a metaphysical claim. That was just an excuse for superstitious thinking, therefore positivists should not favor it. :)

Best,

Laurent

unread,
Aug 25, 2011, 2:10:07 AM8/25/11
to magic...@googlegroups.com
Hi Yi then :)

From my point of view:

 

- There is no problem with the link between ideal Solomonoff induction and reality, except that it is not computable : )

If we make the assumption that the environment is computable (which is a religious assumption, since we cannot mathematically reason about the computability of the world in which we live, accept it or not...), then we can make precise predictions with bounded regret.

 

- My problem is with the computable agent. Shane’s result shows that even if we make the assumption that the world is simple in the Kolmogorov sense, we still cannot guarantee the performance of the agent. Moreover, beyond a certain point we cannot even mathematically reason about the computable agent.


I think this statement goes way beyond Shane's results. What he says, IIRC, is that you can't find it, except by mere chance, but that does not mean you cannot say anything about such an agent.
And as I said before, optimality is only one criterion.
 

 

- As for the comparison of two agents, I have the intuition that it is not decidable (no proof yet).


Maybe this is true in general when choosing two particular agents, but maybe we can define classes of agents that can be compared.
 

Anyway, I wrote the following lemma, which basically restates Shane’s result. However, it shows that deciding whether a sequence is predicted by a given predictor is very hard.


Let me first restate the lemma, to be sure I understand correctly:
p \in P(w) means that p is a predictor of w.
D(p) is the set of sequences that p learns to predict correctly (i.e. there is some time t after which p makes no mistake on w).
You're saying that we cannot decide, for a given p and a given w, if p learns to predict w correctly?

While this is probably true in general (Rice's theorem, pigeonhole argument) when considering p as any program and w as any (even computable) sequence (this is similar to testing semantic equality of two given programs, and thus to finding the smallest equivalent program), it is not true for particular subclasses.
Just as you can find the smallest canonical finite state automaton, for example, you can choose p and w in the regular-language class, and there you can always decide whether p predicts w, I think.

I mean, just because you cannot say something in general doesn't mean you can't say something about more specific classes.
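A minimal sketch of such a restricted case (the representation and all names below are illustrative assumptions, not part of the lemma): take w to be an eventually periodic binary sequence, given as a prefix plus a repeating period, and take p to be a deterministic finite-state machine that reads the observed bits and outputs a prediction for the next one. Then "does p eventually predict w without mistakes?" is decidable, since the pair (machine state, position inside the period) must eventually repeat, and from that point on the pattern of mistakes repeats too.

```python
def eventually_predicts(prefix, period, transition, output, start_state):
    """transition[state][bit] -> next state; output[state] -> predicted next bit."""
    def w(i):
        return prefix[i] if i < len(prefix) else period[(i - len(prefix)) % len(period)]

    state = start_state
    seen = {}        # (state, position in period) -> step at which first seen
    mistakes = []    # mistakes[t] is True iff the prediction at step t was wrong
    t = 0
    while True:
        if t >= len(prefix):
            key = (state, (t - len(prefix)) % len(period))
            if key in seen:
                first = seen[key]
                # From step `first` on, behaviour repeats with period t - first,
                # so p eventually predicts w iff the repeating cycle is mistake-free.
                return not any(mistakes[first:t])
            seen[key] = t
        bit = w(t)
        mistakes.append(output[state] != bit)
        state = transition[state][bit]
        t += 1

# Example: w = 1 1 1 0 0 0 1 1 1 0 0 0 ...; p predicts "same as the last bit seen".
prefix, period = [1], [1, 1, 0, 0, 0, 1]
transition = {0: {0: 0, 1: 1}, 1: {0: 0, 1: 1}}   # state = last observed bit
output = {0: 0, 1: 1}
print(eventually_predicts(prefix, period, transition, output, start_state=1))  # False
```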

Am I missing something?

Laurent

Laurent

unread,
Aug 25, 2011, 2:19:03 AM8/25/11
to magic...@googlegroups.com

So I still think that AIXI would be vastly more intelligent in the real world than the real agents we will be able to create.

>> It would be great to be able to make such a comparison in practice.



There is a computable approximation of AIXI:
http://www.hutter1.net/official/bib.htm#aixictwx

Note that this is an *extreme approximation* of AIXI; the latter has infinitely more computational power.
Recently I stumbled upon a comparison with traditional RL, and that showed that AIXI was much better, but I can't remember where I saw that...

Anyway, just to give you a sense, a "fairer" comparison of an approximation of AIXI would be to run Q-learning on an 80286 and MC-AIXI on a grid of fast computers. Probably you can guess the result if they are allocated the same time ;)
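(For readers who want to see what the "traditional RL" side of such a comparison looks like, here is a minimal tabular Q-learning sketch on a toy two-armed problem. The toy environment and all names are assumptions made purely for illustration; they are not taken from the MC-AIXI paper.)

```python
import random

# Toy one-state, two-action environment: arm 1 pays reward 1 with probability 0.8,
# arm 0 with probability 0.2.
def pull(action):
    return 1.0 if random.random() < (0.8 if action == 1 else 0.2) else 0.0

Q = [0.0, 0.0]
alpha, epsilon = 0.1, 0.1
for step in range(10_000):
    # epsilon-greedy action selection
    a = random.randrange(2) if random.random() < epsilon else max((0, 1), key=lambda x: Q[x])
    r = pull(a)
    # One-state problem, so the usual successor-state term drops out of the update.
    Q[a] += alpha * (r - Q[a])

print(Q)  # Q[1] should end up near 0.8 and Q[0] near 0.2
```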

Laurent

Wén Shào

unread,
Oct 24, 2011, 12:24:57 AM10/24/11
to magic...@googlegroups.com
Hi Tom, 

Sorry to bring this old thread up. You said that, by the argument made in LV08, all universal Turing machines will end up in the smallest equivalence class. I'm not so sure about this: once you fix a constant c, why do all universal Turing machines necessarily end up in the smallest equivalence class?

Let's put aside whether this argument solves the arbitrariness in choosing the reference machine; the mathematical solution, i.e. the whole idea of equivalence classes and the partial order on this set, is well defined, isn't it? Moreover, to me it's not the case that all universal Turing machines will end up in the smallest equivalence class.

Cheers, 

Wén



Vaibhav Gavane

unread,
Oct 24, 2011, 2:08:07 AM10/24/11
to magic...@googlegroups.com
Hi Wen,

In the book, the constant c is actually never "fixed". For two
complexities C_1 and C_2 to be equivalent it is sufficient that there
exist *some* constant c such that for all x |C_1(x) - C_2(x)| <= c.
Then, all universal Turing machines are indeed in the smallest
equivalence class.
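
Spelled out (as I read the book's definition): C_1 ~ C_2 iff there exists some constant c such that |C_1(x) - C_2(x)| <= c for all x. The invariance theorem supplies, for any two universal machines U and V, a constant c_{U,V} with |C_U(x) - C_V(x)| <= c_{U,V} for all x, so all universal Turing machines do indeed land in one equivalence class; and since C_U(x) <= C_psi(x) + c_psi for every other (partial recursive) psi, that class is the least element of the induced partial order.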

I think what the authors really mean by "clean solution" is "justification".

Vaibhav

> On Mon, Aug 22, 2011 at 11:33 PM, Tom Everitt <tom4e...@gmail.com> wrote:
>
>> Hi again, I finally had time to read Li & Vitanyi a bit more closely.
>>
>> They start out promising:
>>
>> "The complexity C(x) is invariant only up to a constant depending on the
>> reference function *phi_0*. Thus, one may object, for *every* string *x*
>> there is an additively optimal recursive function *psi_0* such that
>> C_psi_0(x) = 0 [psi_0 assigns complexity 0 to x]. So how can one claim
>> that C(x) is an objective notion?"
>>
>> They then set about giving a "mathematically clean solution" by defining
>> some equivalence classes, claiming these equivalence classes had a single
>> smallest element. This is all well, but the problem is that according to
>> their definition of equivalence classes, all universal Turing machines end
>> up in the smallest equivalence class. And, clearly, for every string *x*
>> there is a universal Turing machine assigning complexity 0 to *x*. (Just
>> take any universal Turing machine, and redefine it so it prints *x* on no
>> input.)

Wén Shào

unread,
Oct 24, 2011, 2:28:41 AM10/24/11
to magic...@googlegroups.com
Hi Vaibhav, 

This is what confused me: it looks like, from the book, that even within an equivalence class the "constant" c can vary depending on the complexities you're looking at, and there is no fixed c for an equivalence class. How does this help to justify invariance?

On the other hand, if we force a fixed c on equivalence relations, then we won't end up with all UTMs being in one equivalence class, and the partial ordering is still well defined. There might be some other problems with it, but it makes more sense than having multiple c's? Or not? Not quite sure, just some random thoughts.

Cheers, 

Wén

Tom Everitt

unread,
Oct 24, 2011, 4:27:42 AM10/24/11
to magic...@googlegroups.com
Hi Wen,

I'm glad you brought this one up again.

On Mon, Oct 24, 2011 at 8:28 AM, Wén Shào <90b5...@gmail.com> wrote:
This is what confused me, it looks like from the book even within an equivalence class the "constant" c can vary depending on the complexities you're looking at and there is no fixed c for an equivalence class. How does this help to justify invariance? 

Exactly, it doesn't seem to justify invariance at all. I'm not sure what you mean by c varying with the complexities; c varies only with psi and phi (and is really a constant with respect to x, at least), yet even so I definitely agree with you that it's quite unsatisfying.
 

On the other hand, if we force a fixed c on equivalence relations, then we won't end up with all  UTM being in one equivalence class and the partial ordering is still well defined. There might be some other problems with it, but it makes more sense than having multiple c's? Or not? Not quite sure, just some random thoughts. 


Keeping c fixed will remove some of the other UTMs from the phi_0 equivalence class, yes. Yet which UTMs it would remove would depend on phi_0 (and phi_0 is more or less arbitrarily chosen) - so in terms of objectivity I think we would be back to square one.

I've noticed that right after this they have a long discussion about the objectivity of the numbering of the recursive functions; I'm not sure if it's meant to support this objectivity argument or if the point is something else. Either way, after reading it quickly I don't see how it could help them.

So far, the best I've read on this reference machine topic is Mueller's Stationary Algorithmic Probability, but the conclusion there is pretty much that there isn't, and cannot be, an objective reference machine.


Regards

Tom

Vaibhav Gavane

unread,
Oct 24, 2011, 5:06:45 AM10/24/11
to magic...@googlegroups.com
Actually, I'm not sure if keeping c fixed will result in an
equivalence relation at all. I can see that reflexivity and symmetry
are satisfied, but I can't see how to improve the upper bound of 2c in
the case of transitivity.
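
To spell out the obstruction: from |C_1(x) - C_2(x)| <= c and |C_2(x) - C_3(x)| <= c, the triangle inequality only gives |C_1(x) - C_3(x)| <= 2c, so the fixed-c relation need not be transitive. With the existentially quantified c of the book's definition, 2c is again "some constant", which is exactly why that version is an equivalence relation.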

Vaibhav

Tom Everitt

unread,
Oct 24, 2011, 6:33:36 AM10/24/11
to magic...@googlegroups.com
Yes, good point!

Tom

Eray Ozkural

unread,
Oct 24, 2011, 6:37:08 AM10/24/11
to magic...@googlegroups.com
In practical work, the invariance theorem is mostly irrelevant, since the constants involved are quite high. A lot of problems have complexity beneath those c's, so what's the point?

Solomonoff argues that subjectivity is a desired feature of algorithmic probability, so according to him, at least, searching for an objective measure, the ultimate language, is moot. Memory brings subjectivity, obviously, so in practice we actually use very complex machines, not minimal machines.

Philosophically, I think this has been solved partially in 
and more completely in my proposal in 2007:


So, the idea is that, you can use our universe as the universal machine. That's the most low-level machine imaginable, so it's the most neutral model that is free of any bias whatsoever.

Theoretically, I think you could use a universal quantum computer:

Which would really mean that Deutsch solved that problem in 1985.

The caveat I made was a slight but important detail: ultimately it is the universal model of computation that the correct Grand Unified Theory is going to provide. So, it could be some string theory contraption, or a RUCA, whatever you like, as well. The point is that such machines necessarily do not contain any information about particular world-states, but contain only universal information.

Regards,

Tom Everitt

unread,
Oct 24, 2011, 10:37:27 AM10/24/11
to magic...@googlegroups.com
On Mon, Oct 24, 2011 at 12:37 PM, Eray Ozkural <exama...@gmail.com> wrote:
In practical work, the invariance theorem is mostly irrelevant, since the constants involved are quite high. A lot of problems have complexity beneath those c's, so what's the point?

The point is purely theoretical I suppose. It would feel nice to have an objective measure of complexity. In fact, this seems to be the main advertising point of Kolmogorov complexity: that it entails that complexity is a property of the object rather than the language/subject. But never trust a salesman, right? :)

(Of course, it still works for infinite sequences.)
 

Solomonoff argues that subjectivity is a desired feature of algorithmic probability,

How could subjectivity possibly be a desirable property?
 
so according to him, at least, searching for an objective measure, the ultimate language, is moot. Memory brings subjectivity, obviously, so we use actually very complex machines, not minimal machines, in practice.

Philosophically, I think this has been solved partially in 

Okay, so given that the choice of reference machine is necessarily subjective, this would probably be the _most_ objective we can get. But it really wouldn't be objective.
 

Theoretically, I think you could use a universal quantum computer:

Which would really mean that Deutsch solved that problem in 1985.

The caveat I made was that, a slight but important detail: ultimately it is the universal model of computation that the correct Grand Unified Theory is going to provide. So, it could be some string theory contraption, or a RUCA, whatever you like, as well. The point is that such machines necessarily do not contain any information about particular world-states, but contain only universal information.


One of the most desirable features of an objective AIT would be that it would make induction objective - so once we had found this GUT, no one could ever question it (given they had the same data). Now, since we can't find an objective basis of induction, we have to rely on a questionable GUT, which in turn means that once we've found the semi-objective GUT-reference machine, it will also be forever questionable. Are you with me?

So we will never have a solid foundation of science *crying*. (This may be to push my argument a bit far, I'm not sure. Let's hope.)

This is probably only philosophically annoying, but still. It serves to show why I find it incomprehensible to desire a subjective AIT.


Tom

Eray Ozkural

unread,
Oct 24, 2011, 10:54:58 AM10/24/11
to magic...@googlegroups.com
On Mon, Oct 24, 2011 at 5:37 PM, Tom Everitt <tom4e...@gmail.com> wrote:
On Mon, Oct 24, 2011 at 12:37 PM, Eray Ozkural <exama...@gmail.com> wrote:
In practical work, the invariance theorem is mostly irrelevant, since the constants involved are quite high. A lot of problems have complexity beneath those c's, so what's the point?

The point is purely theoretical I suppose. It would feel nice with an objective measure of complexity. In fact, this seems to be the main advertising point of Kolmogorov Complexity: that it entails that complexity is a property of the object rather than the language/subject. But never trust a salesman, right? :)

(Of course, it still works for infinite sequences.)
 

Solomonoff argues that subjectivity is a desired feature of algorithmic probability,

How could subjectivity possibly be a desirable property?

Well, it's another way of saying that a well-educated man has a better perspective. :)
 
 
so according to him, at least, searching for an objective measure, the ultimate language, is moot. Memory brings subjectivity, obviously, so we use actually very complex machines, not minimal machines, in practice.

Philosophically, I think this has been solved partially in 

Okay, so given that the choice of reference machine is necessarily subjective, this would probably be the _most_ objective we can get. But it really wouldn't be objective.
 

It would be, because from a properly positivist viewpoint, "other possible universes with different physical law" are metaphysical fantasies only. If you try to answer the following question you can see that for yourself: in what sense would it not be objective?

Instead the empiricist must stick to nomological possibility (rather than metaphysical possibility as above), in which case the most objective model becomes the universal physical law. Or, failing that (as is currently the case), the best approximation to it.

Recall that in positivism, metaphysical statements have no cognitive significance. A positivist, like me, may therefore regard many constructions in mathematics as meaningless (such as actual infinity).
 
Theoretically, I think you could use a universal quantum computer:

Which would really mean that Deutsch solved that problem in 1985.

The caveat I made was that, a slight but important detail: ultimately it is the universal model of computation that the correct Grand Unified Theory is going to provide. So, it could be some string theory contraption, or a RUCA, whatever you like, as well. The point is that such machines necessarily do not contain any information about particular world-states, but contain only universal information.


One of the most desirable features of an objective AIT would be that it would make induction objective - so once we would have found this GUT, no one could ever question it (given he had the same data). Now, since we can't find an objective basis of induction, we have to rely on a questionable GUT, which in turn means that once we've found the semi-objective GUT-reference machine, it will also be forever questionable. Are you with me?

But that is also unavoidable anyhow, at least until we're reasonably sure we have it, because of incompleteness. And that is exactly what I tried to say.

At any rate, that's just a philosophical thought experiment that wouldn't make much sense in practice :)

So we will never have a solid foundation of science *crying*. (This may be to push my argument a bit far, I'm not sure. Let's hope.)

This is probably only philosophically annoying, but still. It serves to show why I find it incomprehensible to desire a subjective AIT.

Umm, well, that's going to be a little difficult to explain, but it also has to do with the prediction accuracy, of course. You don't want very low-level machines because for many problems that would unnecessarily blow up the error rate.

So, what this means is that, in the end, only performance matters. If the performance of an AI matches human performance and exceeds it, there is no question of objectivity, it is moot :) And obviously, constructing an AI is the most important thing, AIT is fundamentally constructivist in that regard. It forces you to think in terms of a system's evolution, not in Platonic forms that do not exist anyway.

Regards, 

Tom Everitt

unread,
Oct 24, 2011, 1:22:50 PM10/24/11
to magic...@googlegroups.com
On Mon, Oct 24, 2011 at 4:54 PM, Eray Ozkural <exama...@gmail.com> wrote:
On Mon, Oct 24, 2011 at 5:37 PM, Tom Everitt <tom4e...@gmail.com> wrote:

How could subjectivity possibly be a desirable property?


Well, it's a another way of saying that a well-educated man has a better perspective. :)

Hehehe, I know, Solomonoff could definitely have used a proper logic education ;)


Okay, so given that the choice of reference machine is necessarily subjective, this would probably be the _most_ objective we can get. But it really wouldn't be objective.
 

It would because from a properly positivist viewpoint, "other possible universes with different physical law" are metaphysical fantasies only. If you try to answer the following question you can see that for yourself: in what sense would it not be objective?

Okay, it might appear like that from a positivist standpoint, but I'm far from convinced that the positivists are right - even though their views are appealing in many ways (I don't think I'm very up to date on positivism though, we basically just discussed Carnap/Vienna group in philosophy class).

Anyway, it seems problematic to disqualify metaphysical possibility. For example, the foundation of AIXI is "the cybernetic model", with some agent interacting with some "world". This world is essentially any metaphysically possible world, and we make no reference to, and have no need of, the contingencies of the natural sciences to do this. The result is a nice theory of intelligence with minimal requirements as to what world it is put in - and nice theoretical insights on intelligence.

To do something similar, only allowing nomologically possible worlds, would be quite hard, no? Especially since we don't even know the laws of the universe; in contrast we do know what's metaphysically possible.

So that's why metaphysical possibility is relevant, and a universe-based prior is non-objective.

I'm sure you have a bunch of counter arguments. Bring them on :)


Abram Demski

unread,
Oct 24, 2011, 2:13:43 PM10/24/11
to magic...@googlegroups.com
Tom,

We still get the good convergent properties of Solomonoff induction. Evidence quickly overcomes the effect of the prior. So, science is still OK! We can still find the correct GUT (or, perhaps Eray meant TOE?).
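
(If I recall the standard bound correctly, the quantitative form of this convergence, for binary prediction with the universal mixture M and a computable true measure mu, is \sum_{t=1}^{\infty} E_mu[(M(1|x_{<t}) - mu(1|x_{<t}))^2] <= (ln 2 / 2) K(mu); the total expected squared prediction error is bounded by a constant depending only on the complexity of the true environment, and switching the reference machine shifts K(mu) by at most an additive constant.)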

As I mentioned before, updating a universal distribution on evidence can be seen as modifying the UTM. Induction is a process of seeking the correct machine.

It's interesting to note that, although using physics as the UTM would provide the best results for an agent with unlimited computing power, it would be severely impractical for limited computing power. Practical intelligence needs to explicitly take computing power into account as well.

--Abram


Eray Ozkural

unread,
Oct 24, 2011, 2:24:46 PM10/24/11
to magic...@googlegroups.com
On Mon, Oct 24, 2011 at 8:22 PM, Tom Everitt <tom4e...@gmail.com> wrote:


On Mon, Oct 24, 2011 at 4:54 PM, Eray Ozkural <exama...@gmail.com> wrote:
On Mon, Oct 24, 2011 at 5:37 PM, Tom Everitt <tom4e...@gmail.com> wrote:

How could subjectivity possibly be a desirable property?


Well, it's a another way of saying that a well-educated man has a better perspective. :)

Hehehe, I know, Solomonoff could definitely have used a proper logic education ;)


That's not correct, because it's obvious from his writing that he does understand the incompleteness theorems in logic very well. Somebody who understands meta-mathematics is likely to understand predicate calculus, etc. very well. He also worked with Carnap, who knew a bit of logic :) What do you base this on? We positivists are all logic freaks :)

Okay, so given that the choice of reference machine is necessarily subjective, this would probably be the _most_ objective we can get. But it really wouldn't be objective.
 

It would because from a properly positivist viewpoint, "other possible universes with different physical law" are metaphysical fantasies only. If you try to answer the following question you can see that for yourself: in what sense would it not be objective?

Okay, it might appear like that from a positivist standpoint, but I'm far from convinced that the positivists are right - even though their views are appealing in many ways (I don't think I'm very up to date on positivism though, we basically just discussed Carnap/Vienna group in philosophy class).

There have been notably silly objections to positivism, but I don't think much stands except abandoning the analytic/synthetic distinction. I don't think it was very important; mathematics is not special. But apart from that, I don't think there is any strong argument against positivism. There has been some retreat into superstitious ideas lately, but it's time to pay proper attention to philosophy that works. Solomonoff induction is, actually, a complete formalization of a main thesis of positivism: inductive inference. In case you didn't discuss that in class.
 
Anyway, it seems problematic to disqualify metaphysical possibility. For example, the foundation of AIXI is "the cybernetic model", with some agent interacting with some "world". This world is essentially any metaphysically possible world, and we make no reference, have no need, of the contingencies of the natural sciences to do this. The result is a nice theory of intelligence with minimal requirements as to what world it is put in - and nice theoretical insights on intelligence.


No, it is not problematic. Real metaphysics is for fools; a good example of a metaphysical "possibility" would be Chalmers's imaginary world full of philosophical zombies. Or Plato's forms. That's the kind of stupidity we want to avoid.

And I think you're confusing things: the process of inductive inference can work in any universe that supports computation, but we are in this universe, so I don't think that AIXI theory really references metaphysically possible worlds. Quite the contrary, it can only work on real or virtual worlds, all of which are part of this universe. I hope that sounds confusing. :) I'm just saying that a knowledge process does not posit a different universe for each model; those are just hypotheses.

So, this really should work only with scientific hypotheses, not other kinds; that's what I suggest. The formal toy worlds that RL researchers like to work on are likely quite scientific, as they depend on well-defined logical rules. What's important is that they are known to be toy worlds. Imagine an AI that confuses reality with a toy world; would you think it is smart?

To do something similar, only allowing nomological possible worlds, would be quite hard, no? Especially since we don't even know the laws of the universe; in contrast we do know what's metaphysically possible.

Not really. Inductive inference is in fact trying to allow only nomologically possible worlds, otherwise it would dabble in nonsense, such as "are there ghosts in her bosom? are they caressing tom? did his lucky charm work today?", and so forth. :)
 
So that's why metaphysical possibility is relevant, and a universe-based prior is non-objective.

That's not true, since we live in this universe, and this is all that we can predict.

If you have data from the multiverse, then they are admissible too, but it's nothing like you say. It depends on where the data came from.

 
I'm sure you have a bunch of counter arguments. Bring them on :) 


Well, a problem above is assuming that the AIXI model has some central relevance to AI. It is a mostly benign extension of Solomonoff induction to reinforcement learning. The extension to an arbitrary alphabet is trivial in information theory. Even kids know you can encode 256 letters with an 8-bit alphabet. So, that's irrelevant. The one about an arbitrary loss function is significant.

However, note that the extension does not really increase the predictive accuracy or the applicability of the method in any way. In particular, it does not create a set of more interesting problems of AI, since RL-problems are already among the kinds of problems that an AI system must be able to solve. It's just another kind of optimization problem. Although, obviously, it's a *general* optimization problem, and it is AI-complete, since we can cast any computational problem (P and NP problems etc.) and prediction/probability problems as RL-problems. But you don't have to. They can still be solved with a general problem solver that uses Sol. induction. So, I don't think that agent models are special in any way. They are not the foundation of intelligence. They are, as you say, part of cybernetic animal models, i.e. animats, that *use* intelligence. They are applications. It's artificial life, rather than intelligence. Intelligence is prediction. Nothing less, nothing more. Of course, I think behaviorism is a philosophical farce; that's why I badly want to avoid behaviorist-sounding definitions. But that just makes my conviction stronger :)

So, maybe, it is useful to revisit what positivism really is to appreciate what I just said.

What is the main tenet of empiricism?

What does Solomonoff induction solve?

In the final analysis, Solomonoff induction works because it is a *complete* formalization of the scientific process, and has been shown to apply to any problem in the observable universe. It is based on a *correct philosophy of science*. That's what's important. What else could be significant here? If this is based on scientific process, talk of metaphysics is completely irrelevant.

Tom Everitt

unread,
Oct 24, 2011, 3:23:06 PM10/24/11
to magic...@googlegroups.com
This turned out to be a fun debate :)

Abram:
I agree to some extent - it is comforting that just starting with any universal prior, we will eventually converge to the true hypothesis. However, since we will always be stuck in finite time, we will always have the case:

Prior1 predicts TOE1 with high probability, and (absurd) Prior2 predicts (absurd) TOE2 with high probability, and we have no means to say who's right. Apart from some, apparently ill-based, intuition.

And this will always be the case, forever (until we reach time=infinity).

Thus my obsession with an objective prior. Having one seems to be the only way out of this problem.

Of course, in practice, this is never a problem. We seem to have our built-in, fairly good, reference machine, and it's working out well enough. But I maintain that it's not philosophically satisfying. (Well, in a sense. I suppose it would also be of some value to understand why we can never achieve an objective science, and be able to move on with good conscience.)


Eray
I need to think about your points for a bit before I reply. Let me just make the one remark below, though :)


On Mon, Oct 24, 2011 at 8:24 PM, Eray Ozkural <exama...@gmail.com> wrote:
On Mon, Oct 24, 2011 at 8:22 PM, Tom Everitt <tom4e...@gmail.com> wrote:

Well, it's a another way of saying that a well-educated man has a better perspective. :)

Hehehe, I know, Solomonoff could definitely have used a proper logic education ;)


That's not correct, because it's obvious from his writing that he does understand the incompleteness theorems in logic very well. Somebody who understands meta-mathematics is likely to understand predicate calculus, etc. very well. He also worked with Carnap, who knew a bit of logic :) What do you base this on? We positivists are all logic freaks :)


Irony :D  (I have great respect for Sol. actually, perhaps it doesn't always show.) But it was interesting extra information you provided for your argument; I didn't know he worked with Carnap, for instance.

Tom Everitt

unread,
Oct 25, 2011, 3:00:58 AM10/25/11
to magic...@googlegroups.com
On Mon, Oct 24, 2011 at 8:24 PM, Eray Ozkural <exama...@gmail.com> wrote:
However, note that the extension does not really increase the predictive accuracy or the applicability of the method in any way. In particular, it does not create a set of more interesting problems of AI, since RL-problems are already among the kinds of problems that an AI system must be able to solve. It's just another kind of optimization problem. Although, obviously, it's a *general* optimization problem, and it is AI-complete, since we can cast any computational problem(P and NP problems etc.) and prediction/probability problems as RL-problems. But you don't have to. They can still be solved with a general problem solver that uses Sol. induction. So, I don't think that agent models are special in any way. They are not the foundation of intelligence. They are, as you say, part of cybernetic animal models, i.e. animats, that *use* intelligence.They are applications. It's artificial life, rather than intelligence. Intelligence is prediction. Nothing less, nothing more. Of course, I think behaviorism is a philosophical farce, that's why I badly want to avoid behaviorist sounding definitions. But that just makes my conviction stronger :)


Okay, I get the point about artificial life contra intelligence. But my argument works for Sol. induction/prediction as well.

In order to know that the prediction works in this world - before knowing anything about this world - we need to figure out exactly what worlds are possible. And this is essentially all the metaphysically possible worlds!

Note though that with worlds I mean programs, and in particular their output. So the world with ghosts is essentially the same world as this world - they have the same output. (Only slightly more unlikely.)


From Wikipedia:
"Positivism is a philosophical approach.... [in which] ....sense experiences and their logical and mathematical treatment are the exclusive source of all worthwhile information."

Would this mean that I cannot a priori deduce what worlds I could possibly encounter? Strictly speaking, that would be doing math that's not the treatment of some sense datum. But that must be permissible, no?
 

In the final analysis, Solomonoff induction works because it is a *complete* formalization of the scientific process, and has been shown to apply to any problem in the observable universe. It is based on a *correct philosophy of science*. That's what's important. What else could be significant here? If this is based on scientific process, talk of metaphysics is completely irrelevant.

I agree that's pretty grand - it's a pretty good summation of why I like Sol. induction. Even so, I would have wished that the ultimate theory of science would have dealt with the problem I mentioned to Abram in my previous mail:

The interpretation of data should not be completely subjective. Consider this:

We are in a world, and have 100 times made output 1 and received positive feedback, and a 100 times made output 0 and received negative feedback.

Now, since we can't say which prior is the best, we can not say that in order to maximize reward for the next output, we should choose 1.

This is highly unintuitive. Perhaps it really is this way, but it's not very satisfying (to my uneducated mind, at least :) ).


Tom

Eray Ozkural

unread,
Oct 25, 2011, 6:38:49 AM10/25/11
to magic...@googlegroups.com
Hello Tom,

On Tue, Oct 25, 2011 at 10:00 AM, Tom Everitt <tom4e...@gmail.com> wrote:


On Mon, Oct 24, 2011 at 8:24 PM, Eray Ozkural <exama...@gmail.com> wrote:
However, note that the extension does not really increase the predictive accuracy or the applicability of the method in any way. In particular, it does not create a set of more interesting problems of AI, since RL-problems are already among the kinds of problems that an AI system must be able to solve. It's just another kind of optimization problem. Although, obviously, it's a *general* optimization problem, and it is AI-complete, since we can cast any computational problem(P and NP problems etc.) and prediction/probability problems as RL-problems. But you don't have to. They can still be solved with a general problem solver that uses Sol. induction. So, I don't think that agent models are special in any way. They are not the foundation of intelligence. They are, as you say, part of cybernetic animal models, i.e. animats, that *use* intelligence.They are applications. It's artificial life, rather than intelligence. Intelligence is prediction. Nothing less, nothing more. Of course, I think behaviorism is a philosophical farce, that's why I badly want to avoid behaviorist sounding definitions. But that just makes my conviction stronger :)


Okay, I get the point about artificial life contra intelligence. But my argument works for Sol. induction/prediction as well.

In order to know that the prediction works in this world - before knowing anything about this world - we need to figure out exactly what worlds are possible. And this is essentially all the metaphysically possible worlds!

Umm, not really. We are merely engaging in a general theory of probability about our world. This does not really imply anything about other "universes".

So, it's not like this theory was formulated in another universe and then we came here. :)

Note though that with worlds I mean programs, and in particular their output. So the world with ghosts is essentially the same world as this world - they have the same output. (Only slightly more unlikely.)


Well. Not really. As I said, an intelligent person does not waste any time thinking about ghosts, at least in this world. I think we should also avoid thinking about what an AI would do in a world with ghosts. It's irrelevant in my opinion.

However, if you do wish to consider what sort of metaphysical possibility algorithmic probability constitutes, it would seem to correspond to all possible universes with a computational and stochastic substrate, in which, basically, the laws of probability make sense. So, you see, not just any world. The "miraculous" worlds would probably not be covered by it :)
 

From Wikipedia:
"Positivism is a philosophical approach.... [in which] ....sense experiences and their logical and mathematical treatment are the exclusive source of all worthwhile information."

Would this mean that I cannot a priori deduce what worlds I could possibly encounter? Strictly speaking, that would be doing math that's not the treatment of some sense datum. But that must be permissible, no?

You cannot possibly encounter another universe, therefore it's a logical impossibility that you should reach any conclusions about them. That's one of the consequences of positivism, as I see it.

That is, unless we have observations and adequate theory of multiverse, for instance MWI of QM could be true...

In the final analysis, Solomonoff induction works because it is a *complete* formalization of the scientific process, and has been shown to apply to any problem in the observable universe. It is based on a *correct philosophy of science*. That's what's important. What else could be significant here? If this is based on scientific process, talk of metaphysics is completely irrelevant.

I agree that's pretty grand - it's a pretty good summation of why I like Sol. induction. Even so, I would have wished that the ultimate theory of science would have dealt with the problem I mentioned to Abram in my previous mail:  

The interpretation of data should not be completely subjective. Consider this:

We are in a world, and have 100 times made output 1 and received positive feedback, and a 100 times made output 0 and received negative feedback.

Now, since we can't say which prior is the best, we can not say that in order to maximize reward for the next output, we should choose 1.

This is highly unintuitive. Perhaps it really is this way, but it's not very satisfying (to my uneducated mind, at least :) ).
 

First, this "world" cannot be another world, there is no such thing. It could be part of our physical universe. So, we can imagine that it is a video game, or one of those toy worlds we use in AI :)

Second, if you use Sol. induction, it will eventually discover any underlying stochastic regularity.

Therefore, in this case, the success of your agents crucially depends on whether the laws of probability are at work in this virtual world, i.e. whether it has a consistent physical law that lends itself to stochastic analysis.

That is to say, why do you think that, in particular, Laplace's rule will not work in a video game?

Or, failing that, that an agent using general induction cannot ascertain the underlying probability distribution through interaction with the environment?

I'm trying to understand the point. Since it is not meaningful to ask questions about extra-physical universes, why do you think induction would not work here?
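
(For concreteness, a minimal sketch of what Laplace's rule gives in Tom's 100-trials example; the function name is made up for this illustration. After n trials with k successes, the rule predicts the next success with probability (k + 1) / (n + 2).)

```python
def laplace_rule(successes, trials):
    """Laplace's rule of succession: probability that the next outcome is a success."""
    return (successes + 1) / (trials + 2)

# Tom's example: output 1 was rewarded 100 times out of 100, output 0 never was.
print(laplace_rule(100, 100))  # ~0.990 -> choose output 1
print(laplace_rule(0, 100))    # ~0.010 -> avoid output 0
```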

Best, 

Abram Demski

unread,
Oct 25, 2011, 12:45:29 PM10/25/11
to magic...@googlegroups.com
Eray,

Tom is not saying anything about induction not working, or Laplace's rule failing. He is only pointing out that we need to choose a prior to determine how much evidence it takes to converge to correct predictions, and there will always be a different universal prior that we could have chosen which would tell us "not yet!" and find some different alternative more probable.

In other words, there may always be some rogue scientists who have examined the same evidence as everyone else but have a sufficiently different taste when it comes to "elegance" of a theory that they prefer a different explanation, and make different predictions. Furthermore, these rogue scientists may be correct. (The best we can do is judge their theories with our own prior, but we have no claim to "the best prior".)
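
(One quantitative remark on how far such rogue priors can drift: any two universal priors dominate each other, M_1(x) >= 2^{-c_{12}} M_2(x) for all x, where c_{12} is roughly the length of a program that simulates one reference machine on the other. So their log-losses on any data differ by at most an additive constant - but, as Tom points out, for any fixed finite amount of data that constant can be chosen large enough to flip which explanation is preferred.)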

--Abram


Eray Ozkural

unread,
Oct 25, 2011, 1:55:07 PM10/25/11
to magic...@googlegroups.com
On Tue, Oct 25, 2011 at 7:45 PM, Abram Demski <abram...@gmail.com> wrote:
Eray,

Tom is not saying anything about induction not working, or Laplace's rule failing. He is only pointing out that we need to choose a prior to determine how much evidence it takes to converge to correct predictions, and there will always be a different universal prior that we could have chosen which would tell us "not yet!" and find some different alternative more probable.


Then, that's a moot point, because no universal prior tells you how close you are to the real solution. They can never do that.
 
In other words, there may always be some rogue scientists who have examined the same evidence as everyone else but have a sufficiently different taste when it comes to "elegance" of a theory that they prefer a different explanation, and make different predictions. Furthermore, these rogue scientists may be correct. (The best we can do is judge their theories with our own prior, but we have no claim to "the best prior".

That doesn't depend on the universal prior, but on the experience, which is something else, I think.

I don't think much of the subjectivity is supplied by the initial universal prior. After all, it does not contain much information.

The universal prior, in the case of humans, has evolved, and is likely the same in all humans. This is explained satisfactorily in "The Discovery of Algorithmic Probability" paper of Solomonoff.

I don't think that any formal training really replaces the universal prior that we use for basic induction, if that matters. And it would be a great find, but ummm, even then, the reference mathematical systems are virtually analogous, so even when they use formal methods, they use more or less the same universal model.

So I think, by and large, that's unnecessary paranoia :)

That is to say, I don't think there will be so much _fundamental_ dissonance between well-educated scientists.

But if a scientist has been taught some negative education (like a special interest in theology), then it will show, of course.

What the invariance theorem really says is that, initially the choice of a reference machine might be more significant, but as we progress to harder problems that influence would decrease.
 
Regards,

Eray Ozkural

unread,
Oct 25, 2011, 2:04:08 PM10/25/11
to magic...@googlegroups.com
The importance of education was briefly touched upon as "frustrating training sequences" in a later Solomonoff paper.

If you teach a man many false things, eventually it will be much more difficult for him to solve future problems. Through a malicious training sequence, like theology and most philosophy, one can insert negative information into the system. I won't discuss what's possible in the autonomous setting, but I remember you having discussed that; it's easy to see that most people do that in ways that are not immediately noticeable to the humans who are being conned into buying the information (an example in politics: WMDs in Iraq). AIs aren't immune from such bad training, either.

Take the case of scientists. For instance, most scientists wouldn't really understand difficult theories like General Relativity, but have some formal acquaintance with them and solve some problems. Not all scientists are equally educated, or have worked equally on the same material. Einstein's brain bulged because he learnt to visualize 4d geometry. Now of course you could also say that one scientist understands a mathematical method, while another does not. So for instance, one scientist could successfully apply a statistical method, and a second one who does not understand it could object. The reverse could happen too: too much indoctrination into a theory (like the Big Bang) could prevent a scientist from seeing the validity of an alternative theory. These cases do not require the invocation of different universal priors, as in having different models of computation, but rather different training.

Best,

Eray Ozkural

unread,
Oct 25, 2011, 2:20:47 PM10/25/11
to magic...@googlegroups.com
I forgot one important exception: programming. I think programming knowledge might change the prior in a significant way, since there is much variety in programming languages. At least in that domain, having command of high-level mathematical notation could provide a true advantage. The expressive power of a natural language might also slightly change the speed of thinking. So perhaps the Japanese are faster than the rest, who knows?

Cheers,

Abram Demski

unread,
Oct 25, 2011, 2:33:36 PM10/25/11
to magic...@googlegroups.com
Eray,

I agree, it's not going to be much of a concern in practice. Indeed, it seems like we would have to know the data ahead of time in order to specifically go and construct priors that would significantly disagree after more than 1,000 observed bits. The concern is purely theoretical.

The question arises, what would we even do with such a "best UTM" if we had it? Presumably it would have to have some fairly special powers, enough for us to prefer it over other UTMs. It is very similar to trying to design the best programming language. In fact, it's the same problem.

Perhaps there is a very real sense in which some languages are more powerful; i.e., it's easier to write an interpreter for any other language you like. (Lisp would do well by that metric, for example.) This pushes down the constant penalties to get to other languages. However, it *must* push something up when it pushes something down, because we only have finite probability mass to shove around. So, taking this to its conclusion, we would get a programming language that can easily write programming languages, but in which it is very difficult to write anything else...
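
(The "finite probability mass" remark is essentially the Kraft inequality: for a prefix-free machine U the induced prior m_U(x) = \sum_{p : U(p) = x} 2^{-l(p)} satisfies \sum_x m_U(x) <= \sum_p 2^{-l(p)} <= 1, so giving some strings shorter descriptions, and hence more prior mass, necessarily gives others longer ones.)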

--Abram


Tom Everitt

unread,
Oct 25, 2011, 2:37:08 PM10/25/11
to magic...@googlegroups.com
On Tue, Oct 25, 2011 at 7:55 PM, Eray Ozkural <exama...@gmail.com> wrote:
On Tue, Oct 25, 2011 at 7:45 PM, Abram Demski <abram...@gmail.com> wrote:
Eray,

Tom is not saying anything about induction not working, or Laplace's rule failing. He is only pointing out that we need to choose a prior to determine how much evidence it takes to converge to correct predictions, and there will always be a different universal prior that we could have chosen which would tell us "not yet!" and find some different alternative more probable.


Then, that's a moot point, because no universal prior tells you how close you are to the real solution. They can never do that.

It is actually not entirely moot (I wish it was).

Consider the case where what we've seen is

s = 111111111111111111111111111111111111111111

it seems obvious that the best prediction of the next bit is 1, given the data we've seen.

But if we don't have an objective prior, we have no means to say that 1 is the best prediction, because some priors will say that 0 is the most probable continuation. (Even a measure on priors would do, then we could say that most priors predict 1.)

(This is really just Goodman's "grue" example in an AIT context.)
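
A toy sketch with just two hypotheses makes the role of the prior explicit (the hypotheses and names below are illustrative assumptions, not a universal mixture): hypothesis A says "all ones", hypothesis B says "ones up to position 42, then zeros". After 42 observed ones both hypotheses have likelihood 1, so the posterior equals the prior, and the predicted next bit is whatever the prior favoured.

```python
def h_all_ones(i):
    return 1

def h_ones_then_zeros(i):
    return 1 if i < 42 else 0

def predict_next(data, prior):
    """P(next bit = 1) under a Bayesian mixture of the two deterministic hypotheses.
    prior = (weight of h_all_ones, weight of h_ones_then_zeros)."""
    hyps = (h_all_ones, h_ones_then_zeros)
    # Likelihood of the observed data under each deterministic hypothesis: 0 or 1.
    like = [1.0 if all(h(i) == b for i, b in enumerate(data)) else 0.0 for h in hyps]
    post = [w * l for w, l in zip(prior, like)]
    z = sum(post)
    return sum(p / z * h(len(data)) for p, h in zip(post, hyps))

data = [1] * 42
print(predict_next(data, prior=(0.9, 0.1)))  # 0.9 -> predicts 1
print(predict_next(data, prior=(0.1, 0.9)))  # 0.1 -> predicts 0
```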

So what I'm after is not really a clue as to how close we are to the true model; it's a deep fact that we can never have that.

What I'm after is an objective way to interpret data! Because intuitively, 1 is the objectively best prediction of the continuation of s above. But as of now, AIT does not provide any support for this intuition (maybe rightly so).

(In practice, I agree, this is not a problem. This is an entirely theoretical issue.)

Eray Ozkural

unread,
Oct 25, 2011, 2:46:02 PM10/25/11
to magic...@googlegroups.com
On Tue, Oct 25, 2011 at 9:33 PM, Abram Demski <abram...@gmail.com> wrote:
Eray,

I agree, it's not going to be much of a concern in practice. Indeed, it seems like we would have to know the data ahead of time in order to specifically go and construct priors that would significantly disagree after more than 1,000 observed bits. The concern is purely theoretical.

The question arises, what would we even do with such a "best UTM" if we had it? Presumably it would have to have some fairly special powers, enough for us to prefer it over other UTMs. It is very similar to trying to design the best programming language. In fact, it's the same problem.

Yes, the practical uselessness of the end result of this quest seems to have convinced Solomonoff that the search was futile.

However, it's still something to have defined it. So, my position seems to be in favor of Deutsch's paper, it's a more natural measure of complexity.

We might not have much to do with it, but a very advanced AI might be able to find some uses for it, it might actually have some useful properties for studying physical phenomena, don't you think?

Perhaps there is a very real sense in which some languages are more powerful; ie, it's easier to write an interpreter for any other language you like. (Lisp would do well by that metric, for example.) This pushes down the constant penalties to get to other languages. However, it *must* push something up when it pushes something down, because we only have finite probability mass to shove around. So, taking this to its conclusion, we would get a programming language that can easily write programming languages, but in which it is very difficult to write anything else...

Ok, one metric was how easy it is to write an interpreter for itself, i.e. reflective simplicity. In LISP you have eval, but even if you didn't have it, or disallowed it, it's still pretty neat in that regard.

Many other similar definitions of essential simplicity may be a reason to prefer such a language (e.g. sub-additivity of algorithmic info). How easy is it to write certain useful kinds of programs? That might be an indication.

So, yes, as you say, it should be a "well-rounded" programming language. That can only be what is called a general purpose programming language. Another reason why I chose Scheme. Is there another technical term for it? I don't think I could explain this well enough; some reviewers seemed to be puzzled over why I chose Scheme. I thought it was pretty obvious. It would be much more natural to define problems with Scheme, and it covers a lot of technical ground because it's a general purpose PL.

Another reason why John McCarthy was such a genius :) The power of LISP is through low constants in AIT and the ability to add semantics easily, though it's difficult for an AI to use those macros well for now :)

Regards, 
 

Eray Ozkural

unread,
Oct 25, 2011, 2:49:21 PM10/25/11
to magic...@googlegroups.com
Well, you know what I will say: there should be a minimal-energy-consuming universal quantum computer, which you can use for these purposes. If that's not good enough, I don't know what is :)

I think that should solve the problem of subjectivity once and for all.

And if you use that, you will, in fact, have 1. Don't worry about that :)

Cheers, 

Tom Everitt

unread,
Oct 25, 2011, 3:17:34 PM10/25/11
to magic...@googlegroups.com
On Tue, Oct 25, 2011 at 8:49 PM, Eray Ozkural <exama...@gmail.com> wrote:
Well, you know what I will say, there should be a minimal energy consuming universal quantum computer, which you can use for these purposes. If that's not good enough, I don't know what is :)

I think that should solve the problem of subjectivity once and for all.

And if you use that, you will, in fact, have 1. Don't worry about that :)
 

Okay, if that's objective, you win. It fits poorly with my intuition of objectivity though :)

Eray Ozkural

unread,
Oct 25, 2011, 3:19:04 PM10/25/11
to magic...@googlegroups.com
Well since you said that, I need to hear an argument, because that's not one :)

Regards,

Tom Everitt

unread,
Oct 25, 2011, 3:54:22 PM10/25/11
to magic...@googlegroups.com

Sure, give me till tomorrow to formulate it!

Tom Everitt

unread,
Oct 26, 2011, 1:04:40 PM10/26/11
to magic...@googlegroups.com
To backtrack a bit, I believe our differences stem from fundamentally different core philosophies. Following is an attempt to spell them both out:

Either one can view mathematics/logic as something a priori given, by which we try to figure out the world we happen to live in. In this view the laws of the world are mere accidents; there is nothing objective about them. (I believe this is the view the cybernetic model tries to capture.)

The alternative is to take the world as the base: we learn about it through induction, and learn to generalize more and more - this process leads to mathematics. This model fits well with how human knowledge has evolved through the ages, I suppose.

The latter view, together with the assumption that mathematics has no "transcendent existence" (is confined to our universe), entails a position much like yours.

(Is this a fair description? I'm only grateful for improvements on these.)

I think I subscribe to the first view, not because I can disprove the positivist position, but because the first (I don't know what to call it) feels slightly more elegant. So perhaps this is a genuine conflict of priors :) (I don't think so though; we would probably converge if we discussed it long enough.)

Eray Ozkural

unread,
Oct 26, 2011, 1:53:09 PM10/26/11
to magic...@googlegroups.com
On Wed, Oct 26, 2011 at 8:04 PM, Tom Everitt <tom4e...@gmail.com> wrote:
To backtrack a bit, I believe our differences stem from fundamentally different core philosophies. Following is an attempt to spell them both out:

Either one can view mathematics/logic as something a priori given, by which we try to figure out the world we happen to live. In this view the laws of the world are mere accidents; there is nothing objective about them. (I believe this is the view the cybernetic model tries to capture.)


No, the cybernetic view is just applied control theory, used in the construction of artificial animals and the like. It's basically agent models. It does not make a philosophical commitment in itself. What do you mean by the cybernetic model?
 
The alternative is to take the world as the base, we learn about it through induction, and learn to generalize more and more - this process leads to mathematics. This model fits well with how human knowledge has evolved through the ages, I suppose.

The latter view, together with the assumption that mathematics has no "transcendent existence" (is confined to our universe), entails a position much like yours.
 
(Is this a fair description? I'm only grateful for improvements on these.)
 
Kind of. And the first view is Platonism, which is essentially creationist - in case you did not notice, I should emphasize it. Some bigwigs like Martin Davis, whose work I truly respect, are under the illusion that there is some evidence for the "a priori" character of mathematics, which I still long to see. If it's personal conviction, it's just appeal to intuition, which is not held in high regard by analytic philosophers such as Tim Williamson, so why should I think of it as evidence? After all, believing that the axioms of ZFC set theory are true in some eternal, non-physical, secondary, hidden realm sounds a lot like believing in UFO abductions.

Solomonoff did subscribe to the correct, scientific worldview, which is the second. It's also called "empiricism" in general. He explains this in his "The Discovery of Algorithmic Probability" paper in sufficient detail, i.e. how universal induction evolved.

Nowadays, I think one of the most popular philosophies among hard-core scientists is instrumentalism:

As it's smart enough not to make dubious metaphysical posits. Just to give a flavor of modern philosophy of science, as opposed to Plato's false theory.

I think I subscribe to the first view, not because I can disprove the positivist position, but because the first (I don't know what to call it) feels slightly more elegant. So perhaps this is a genuine conflict of priors :) (I don't think so though, we would probably converge would we discuss it long enough.)

It depends on whether you think mathematics has a privilege. But why should it? From a proper scientific point of view, as an anthropologist would say: at some point we invented numbers to make better calculations, this linguistic and mental invention increased our intelligence. Do you really disagree with the scientific fact that mathematics itself evolved in human culture?

Now, the strangest thing would be that, somehow, mathematics did transcend all physical reality. Which is an extraordinary claim, and requires extraordinary evidence. However, it would be very difficult to confirm or deny, right? That is to say, such a statement does not seem scientific at all, because it would be impossible to falsify. 

This view is also highly regarded by working scientists, and I mean real scientists like physicists, not economists. They expect that there ought to be a way to falsify a scientific hypothesis. If a claim is, by definition, unassailable, then it is most likely an unscientific and meaningless claim.

That is to say, mathematicians must seriously consider the philosophy of science, because a consistent world-view is important. One shouldn't be a crypto-creationist (Platonist) and a scientist at the same time; that is a contradiction.

My opinion undoubtedly sounds different from that of most mathematicians; however, some pretty big names think along the same lines. Chaitin, for instance, holds a similar position, which he calls quasi-empiricism.

I complete the circle and deny any kind of non-empiricism; there is simply no need for it. A lot of confusion stems from the disconnected and highly disharmonious views put forward by most philosophers, especially in the 20th century. I have made the effort to distill them all, in the hope of finding useful material, but alas, I have obtained very little!

So, let me translate what you say to formal philosophical language:

  Do you subscribe to mathematical Platonism?

My answer is always a resounding no :) I am a genuine physicalist, and I have no ontological commitment to such intangible, extra-physical entities. In my opinion, everything that exists is physical. This is the strongest formulation of physicalism, as opposed to flawed definitions like supervenience physicalism.

However, if you re-formulated your question as: do you think that the many-worlds interpretation may be true, or do you think that we might be in a multiverse, then I would answer affirmatively: those are admissible hypotheses, they are possible, and one needs scientific work to prove or disprove them.

Regards,

Tom Everitt

unread,
Oct 26, 2011, 2:35:58 PM10/26/11
to magic...@googlegroups.com
On Wed, Oct 26, 2011 at 7:53 PM, Eray Ozkural <exama...@gmail.com> wrote:
No, the cybernetic view is just control theory applied to the construction of artificial animals and the like. It's basically a family of agent models; it does not make any philosophical commitment in itself. What do you mean by the cybernetic model?

I didn't mean that it forced any philosophical commitments, only that it's natural to come up with a model like that if you believe you are a logically thinking subject, one day presented with some arbitrary world. Even so, I think you may be right; it probably fits well with other ways of thinking as well.

Nowadays, I think one of the most popular philosophies among hard-core scientists is instrumentalism, as it's smart enough not to make dubious metaphysical posits. I mention it just to give a flavor of modern philosophy of science, as opposed to Plato's false theory.

Okay, this is an interesting view, thanks for reminding me about it.

It depends on whether you think mathematics has a privileged status. But why should it? From a proper scientific point of view, as an anthropologist would say: at some point we invented numbers to make better calculations, and this linguistic and mental invention increased our intelligence. Do you really disagree with the scientific fact that mathematics itself evolved in human culture?

Well, yes, as I say, this empiricist view fits well with how we came to use numbers. But this does not mean that numbers do not exist - it may be that we discovered them, much like we discover things about the universe.


Now, the strangest thing would be that, somehow, mathematics did transcend all physical reality. That is an extraordinary claim, and it requires extraordinary evidence. However, it would be very difficult to confirm or deny, right? That is to say, such a statement does not seem scientific at all, because it would be impossible to falsify.

Why would this be so strange, again? As for falsifiability, the claim that mathematics is restricted to our universe seems just as unfalsifiable.

But this is close to a non-issue; perhaps the distinction is not even meaningful, as there is no measurable difference.

So as I see it, we have two theories of mathematics, and it has no practical consequences which one we subscribe to. I feel that the Platonistic one is slightly more elegant - hence I choose that one. And even though it makes no _practical_ difference, there is a difference in which theoretical questions become interesting.

In my view, you're only trading ontological simplicity for another kind of complexity, namely in what mathematics really is. When studying mathematics I find it convenient to think of it as uncovering properties of (existing!) structures. Thinking about it in an instrumentalist way, for instance, would be sufficiently more complicated to cancel out the benefit of a simpler ontology. (I have faint hope of convincing you of this :) but I'm happy enough if I can convince you that this is a "consistent world-view." :) )

Abram Demski

unread,
Oct 26, 2011, 2:36:36 PM10/26/11
to magic...@googlegroups.com
Eray,

Solomonoff induction relies on falsifiability, but Bayesian learning theory is more general. I feel falsifiability is a bit too narrow, and has been elevated to a high place in science by accident (partly through the prevalence of "statistical significance" and testing against the "null hypothesis"). In general, scientific hypotheses can be either falsifiable or verifiable or both. (It's true that universally quantified statements, which are particularly important, tend to be only falsifiable.) It's also inevitable that some meaningful questions are neither falsifiable nor verifiable, since we don't have access to the entire universe at will.
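As a toy numeric illustration of that last distinction (my own sketch, not an example given in the thread): take the universally quantified hypothesis H = "all ravens are black" with prior 1/2, and as the alternative assume, arbitrarily, that each raven is independently black with probability 0.9. Black ravens push the posterior toward 1 without ever reaching it, so H is not verifiable from any finite sample, while a single non-black raven drives the posterior to 0, so H is falsifiable.

def posterior_h(observations, prior_h=0.5, p_black_alt=0.9):
    """Posterior of H = 'all ravens are black' after a list of observations
    (True = black raven, False = non-black raven)."""
    p_h, p_alt = prior_h, 1.0 - prior_h
    for is_black in observations:
        p_h *= 1.0 if is_black else 0.0            # H forbids non-black ravens
        p_alt *= p_black_alt if is_black else 1.0 - p_black_alt
        total = p_h + p_alt
        p_h, p_alt = p_h / total, p_alt / total    # renormalize
    return p_h

print(posterior_h([True] * 50))                    # ~0.995: high, but never 1.0
print(posterior_h([True] * 50 + [False]))          # 0.0: one counterexample falsifies H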
 
--Abram


Eray Ozkural

unread,
Oct 26, 2011, 2:44:58 PM10/26/11
to magic...@googlegroups.com
On Wed, Oct 26, 2011 at 9:35 PM, Tom Everitt <tom4e...@gmail.com> wrote:
In my view, you're only trading ontological simplicity for another kind of complexity, namely in what mathematics really is. When studying mathematics I find it convenient to think of it as uncovering properties of (existing!) structures. Thinking about it in an instrumentalist way, for instance, would be sufficiently more complicated to cancel out the benefit of a simpler ontology. (I have faint hope of convincing you of this :) but I'm happy enough if I can convince you that this is a "consistent world-view." :) )

That doesn't explain equally well-formed and wildly differing foundations for mathematics, for instance set theory vs. category theory (or variants of set theory). Which is more elegant? Category theory is actually harder to understand and more complex, so it's not as elegant in some sense, but the statements it makes are extremely concise and informative, and proofs in category theory are much more elegant. So, will you say that the more elegant theory is uncovering the hidden structure, while the other was a failed attempt? I disagree with that. These are all useful fictions; when we say "X exists" in mathematics, that's a figure of speech. Only the marks on paper, in our brains, on computers, etc. exist. There is absolutely no reality to the referents. However, they are indeed meaningful, because they describe useful mental procedures.

Also, you are ignoring that such a thing as constructivism exists :)

And it is also incorrect that Platonism has a simpler ontology. It doesn't, as it posits entities (various Forms) in addition to everything that an empiricist posits. Therefore, by Occam's razor, we must eliminate it.

Eray Ozkural

unread,
Oct 26, 2011, 2:48:15 PM10/26/11
to magic...@googlegroups.com
On Wed, Oct 26, 2011 at 9:36 PM, Abram Demski <abram...@gmail.com> wrote:
Eray,

Solomonoff induction relies on falsifiability, but Bayesian learning theory is more general. I feel falsifiability is a bit too narrow, and has been elevated to a high place in science by accident (partly through the prevalence of "statistical significance" and testing against the "null hypothesis"). In general, scientific hypotheses can be either falsifiable or verifiable or both. (It's true that universally quantified statements, which are particularly important, tend to be only falsifiable.) It's also inevitable that some meaningful questions are neither falsifiable nor verifiable, since we don't have access to the entire universe at will.

Yet, that last claim is not a good defense of metaphysics, is it? :)

Can you give a significant example of a meaningful question that is neither falsifiable nor verifiable?

Regards,
 

Gabriel Leuenberger

unread,
Sep 8, 2017, 11:46:31 AM9/8/17
to MAGIC
I'm going to argue that the pure lambda calculus should be the reference model, instead of a machine like the Turing machine:
A Turing machine works by moving information around on tapes. What is the simplest way to move information around? I would say it is to copy/cut & paste, which is exactly what the lambda calculus does. In a Turing machine the steps of computation are totally ordered in time, which is an unnecessary assumption; in the lambda calculus the steps are only partially ordered. Since special relativity we know that events in physics are also only partially ordered in time. Another thing I don't like about Turing machines is that the output is a meaningless bit string that requires a sophisticated interpretation. If it should, for instance, represent a graph, then we need to assume a system for naming the nodes by representing integers with bits, whereas in the lambda calculus this would not be necessary because there are variable names. And finally, it seems unclear what the best/simplest universal Turing machine is, whereas the pure lambda calculus is uniquely simple.
Instead of placing random coin flips on the input tape of a TM, we could use a stochastic grammar to sample lambda expressions. As an alternative, or as an extension, one could use a stochastic lambda calculus, the "psi-lambda calculus" that was briefly mentioned in this talk: https://youtu.be/fclvsoaUI-U?t=43m41s
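To make the proposal slightly more concrete, here is a small toy sketch of my own (not taken from the talk or from any published construction; the grammar probabilities, fuel limit, and step budget are arbitrary choices): it samples closed lambda terms in de Bruijn notation from a simple stochastic grammar, normalizes each term under a step budget, and tallies the resulting normal forms, giving a crude empirical stand-in for a lambda-calculus-based prior over outputs.

import random
from collections import Counter
from dataclasses import dataclass

# Lambda terms in de Bruijn notation: Var(0) refers to the nearest binder.
@dataclass(frozen=True)
class Var:
    i: int

@dataclass(frozen=True)
class Lam:
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def shift(t, d, cutoff=0):
    """Shift the free indices of t by d."""
    if isinstance(t, Var):
        return Var(t.i + d) if t.i >= cutoff else t
    if isinstance(t, Lam):
        return Lam(shift(t.body, d, cutoff + 1))
    return App(shift(t.fun, d, cutoff), shift(t.arg, d, cutoff))

def subst(t, s, j=0):
    """Replace index j in t by s: beta reduction literally copies and pastes."""
    if isinstance(t, Var):
        return shift(s, j) if t.i == j else t
    if isinstance(t, Lam):
        return Lam(subst(t.body, s, j + 1))
    return App(subst(t.fun, s, j), subst(t.arg, s, j))

def step(t):
    """One normal-order (leftmost-outermost) beta step, or None if t is normal."""
    if isinstance(t, App):
        if isinstance(t.fun, Lam):
            return shift(subst(t.fun.body, shift(t.arg, 1)), -1)
        r = step(t.fun)
        if r is not None:
            return App(r, t.arg)
        r = step(t.arg)
        if r is not None:
            return App(t.fun, r)
    if isinstance(t, Lam):
        r = step(t.body)
        if r is not None:
            return Lam(r)
    return None

def normalize(t, max_steps=100):
    """Reduce to normal form; give up (None) if the budget runs out or the term blows up."""
    try:
        for _ in range(max_steps):
            r = step(t)
            if r is None:
                return t
            t = r
    except RecursionError:
        pass
    return None

def sample_term(binders=0, fuel=20):
    """Stochastic grammar over closed terms; branch probabilities
    (0.3 Var, 0.3 Lam, 0.4 App) and the fuel limit are arbitrary choices."""
    r = random.random()
    if fuel <= 0 or (r < 0.3 and binders > 0):
        if binders == 0:            # nothing in scope: fall back to the identity
            return Lam(Var(0))
        return Var(random.randrange(binders))
    if r < 0.6 or binders == 0:
        return Lam(sample_term(binders + 1, fuel - 1))
    return App(sample_term(binders, fuel - 1), sample_term(binders, fuel - 1))

# Empirical distribution over normal forms (None = step budget exhausted).
counts = Counter(repr(normalize(sample_term())) for _ in range(10000))
for nf, n in counts.most_common(5):
    print(n / 10000.0, nf)

A serious version would of course need a principled grammar (or the psi-lambda calculus mentioned above) and a proper treatment of non-terminating terms, which here are simply lumped into the budget-exhausted bucket.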


On Wednesday, 10 August 2011 17:48:33 UTC+2, Laurent Orseau wrote:
Hi all,

For the record, my first email was:
Up to now, here are my main concerns with AIXI:
  • Reference Turing machine
    • There is currently no best choice, but it seems intuitive that small TMs are better than any random TM
  • Reinforcement Learning
    •  RL as a utility function is very convenient for simplicity, but prone to self-delusion. Knowledge-seeking is better but it prefers true random number generators (not generated by itself). I'm currently thinking about a utility function that is like knowledge-seeking but, roughly, does not like things that are too complex, or more precisely, it doesn't like things that are predictably random.
  • Incomputability
    • Time dimension not taken into account. That is the current most important problem. If we can solve that in an optimal way, we will have real AGI. I'm also working on things in this corner, but right now I'd prefer to keep my ideas for myself :)
  • Things I don't understand yet...
Other comments below:

On Wed, Aug 10, 2011 at 17:22, Tom Everitt <tom4e...@gmail.com> wrote:

Regarding limitations of the AIXI, I'd like to second Laurent on the point that the completely free choice of reference machine is a bit unsatisfying.

If I haven't missed anything, this means, among other things, that if we have two strings:

a = 111 b = 100110101010010111010101010101 001111010101

we can not objectively say that a is simpler than b, since according to some obscure languages (reference machines) b will actually be simpler than a.


Exactly.
But intuitively, a being simpler than b makes a lot more sense. Why?
In most programming languages, a would be far simpler than b, although both may turn out to be coded like "print 'aaa'"...

Now consider:
a = 111
b = 10101010
 
Which is simpler?
The question is more difficult, but it still seem plausible that a is simpler.
Even harder :
a = 111111...
b = 101010...

a still looks simpler, but if I had to bet my arm on it, I wouldn't risk it for less than ten million euros (not sure I would even risk it, my arm is quite useful).

As Laurent already said, shorter reference machine makes more sense intuitively. But appealing to intuition here is clearly not so much better than motivating a cognitive architecture intuitively - it works in practice, but is mathematically unsatisfying. Also, can one really say that one reference machine is shorter/simpler than another? doesn't that also depend on the language in which one describes the reference machines?

Yes absolutely.
*If* we choose a "reference class of models" (e.g., Turing machines), then we can choose the smallest reference machine.

I thought about another option though, just a while back:
Instead (but still considering the TMs class), why not choose the reference UTM that orders TMs in the exact same order as the "natural" order, which is to grow TMs depending on the number of states, of transitions, and some lexicographical order?
(my first problem then was that there was too many TMs of same complexity)


Now, there still also remains the problem of the reference class of models. Why should we go for TMs? Why not something else? How to decide?
TMs have a nice, very simple formulation given our human knowledge. In the end, does it all boils down to our own world? Should we choose the simplest formulation given the axioms of our world? Is that even possible?
 

This bothers me a lot. Does anyone know of any attempts to resolve this issue? It must at least have been attempted, I feel.

Hutter pointed me to a failed attempt by Müller, IIRC. I don't have the exact reference on this computer.
The idea was to simulate TMs on other TMs, and to do that in a loop, to try to find some fixed point, or something along those lines, but that did not work.

Also, Hutter has some discussions of these matters in his 2005 book and in:
"A Philosophical Treatise of Universal Induction"
http://www.hutter1.net/official/bib.htm#uiphil

As a last resort, we can settle for a consensus on an "intuitive" best choice, if we can prove that finding the best model is not feasible. But if we can avoid that, I'd be happier.
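(One way to make the "fixed point" idea concrete, purely as my own sketch of what such an attempt could look like rather than the actual construction Laurent refers to: look for a weighting p over universal machines U_1, U_2, ... that is stationary under mutual simulation,

\[
p(U_j) \;=\; \frac{1}{Z} \sum_i p(U_i)\, 2^{-K_{U_i}(U_j)},
\qquad K_{U_i}(U_j) = \text{length of the shortest } U_i\text{-program simulating } U_j,
\]

with Z a normalizing constant. A reference machine would then be "natural" if it receives high weight under such a stationary p; the difficulty is showing that a fixed point exists, is unique, and is usable.)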

Laurent



Gabriel Leuenberger

unread,
Sep 8, 2017, 11:55:45 AM9/8/17
to MAGIC
It's nice to use both a stochastic grammar and the stochastic lambda calculus together, because together they define a universal mixture over all the (semi)computable probability distributions.
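Spelled out (my paraphrase of the claim, with notation introduced here rather than taken from the thread): if the stochastic grammar assigns probability P_G(e) to each closed expression e, and the (possibly stochastic) evaluator produces output x from e with probability \Pr[e \Downarrow x], then the induced semimeasure is

\[
M(x) \;=\; \sum_{e} P_G(e)\, \Pr[\, e \Downarrow x \,],
\]

and if the expression language is universal and every expression has positive probability, M multiplicatively dominates every lower-semicomputable semimeasure, playing the same role as Solomonoff's mixture does for a universal monotone Turing machine.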