Proof that AIXI is more intelligent than plain induction

Eray Ozkural

Mar 4, 2016, 4:30:57 AM
to magic...@googlegroups.com, ai-phi...@yahoogroups.com
Hi there,

Where is the proof that the AIXI model is actually more intelligent than plain inductive inference?

I remember that Marcus Hutter gave a heuristic argument for it in his famous paper. 

I have been wondering about this; I asked Laurent a few times, but he has not yet pointed it out to me. I searched for it on Google and could not find it. I was hoping to cite it in a paper.

Thanks!

--
Eray Ozkural, PhD. Computer Scientist
Founder, Gok Us Sibernetik Ar&Ge Ltd.
http://groups.yahoo.com/group/ai-philosophy

Vaibhav Gavane

Mar 4, 2016, 5:01:29 AM
to magic...@googlegroups.com
Hi Eray,

Your message was automatically marked as spam by Gmail. I have marked
it as Not Spam, and I am sending this message to let others know, in
case they missed it.

Regarding your question, I believe we have already had a lengthy
discussion on this very list years ago. Personally, I think you are
flogging a dead horse.

Vaibhav
> --
> Before posting, please read this:
> https://groups.google.com/forum/#!topic/magic-list/_nC7PGmCAE4
> ---
> You received this message because you are subscribed to the Google Groups
> "MAGIC" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to magic-list+...@googlegroups.com.
> To post to this group, send email to magic...@googlegroups.com.
> Visit this group at https://groups.google.com/group/magic-list.
>

Eray Özkural

Mar 7, 2016, 6:35:31 PM
to MAGIC
Hello there!

How are you doing? Yes, we did have a discussion, but I do not recall a reference to a specific theorem. IIRC, you made some informal arguments about active agents vs. passive agents which we have all heard before. I am not looking for an informal or a heuristic (philosophical?) argument. I am looking to cite the specific theorem as part of this paper I am working on. I would like to believe whatever you said, and I will take the theorem at face value. However, where is the theorem? 

http://arxiv.org/abs/1504.03303

I am asking because, in the draft paper above, I apparently proposed another measure of intelligence. Naturally, I would like to compare this measure to the existing one and cite the theorem that claims (in some sense) that the AIXI definition is better. I would like to know the particular *mathematical* sense in which this claim is made. And if this claim has not been stated as a theorem, I believe I can easily state and prove it myself, if it is true; that is not a problem for me.

The reason I proposed a new definition is that I argued it is properly reductionist: it does not require a utility function, which seems like an extra posit that does not have to exist in nature for intelligence to exist. That was my observation and motivation. To me, the utility-function approach looks like an attempt to salvage Quine/Davidson style behaviorism, which suffers from similar non-reductionist assumptions -- I used to call those "operantile souls" in philosophical discussions. So, to state what is different: the definition changes.

Now, I am asking for the original mathematical argument that claimed the AIXI model was better than plain induction, so that I can compare the two. Presently they are like apples and oranges, so I need more points of comparison. I do not care so much about agent models here, which you can vary as you like; I care about the mathematical reason why one model is better than the other. You can read the paper to see the context in which I would cite the theorem. I could cite informal arguments, surely, but I do not want to do only that in the course of formal definitions. I am just trying to pay proper tribute to the AIXI definition. I do not have to cite the theorem I asked for, but it would improve my paper and make it easier to understand!

Bottom line: such a theorem either exists in the literature or it does not. I have asked around and have not yet received a tangible answer. If it does exist, where is it?

Regards,

Vaibhav Gavane

Mar 8, 2016, 12:16:52 AM
to magic...@googlegroups.com
Hi Eray,

I have some fundamental remarks and questions regarding your paper
[http://arxiv.org/abs/1504.03303].

First of all, I would like to emphasize that Legg and Hutter's
universal intelligence measure is technically defined for *policies*
(i.e. abstract mathematical functions) and not "agents" (i.e. physical
systems). More precisely and formally, intelligence is defined as a
quantitative property for the set of all "chronological" functions,
including incomputable functions like AIXI.
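For reference, the Legg-Hutter universal intelligence measure being discussed is usually written as follows (sketched here from the published definition as context for the thread, not part of the original message; E is the set of computable environments and K is Kolmogorov complexity):

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

where V^{\pi}_{\mu} denotes the expected total reward of policy \pi in environment \mu, so the measure is indeed a property of the policy (an abstract function), as stated above.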

Now, my question is: what exactly is the set of objects for which your
intelligence measure is defined? How do you define/specify that set of
objects?

Note that your original question - comparison of AIXI and Solomonoff
Induction - can only be answered in a setting where arbitrary
(incomputable) functions are allowed because both of them are
incomputable.

Vaibhav

Eray Ozkural

Mar 8, 2016, 2:12:04 AM
to magic...@googlegroups.com
Greetings Vaibhav,

Alright, so there seems to be no such theorem yet, which means I will have to present it myself. I see what you are going for; do not worry about explaining too much of it, as I happen to know computability theory fairly well.

Before we continue with formalizations, I should note that I might not be comfortable with just any formalization, due to foundational considerations. Perhaps it would be beneficial to state these considerations more precisely in the paper, so that you can see where (and whether) we disagree formally. (You can guess that I belong to Markov's school of recursive constructivism.) Still, I think we can manage an apples-to-apples comparison here somehow, as you wish, which is why I asked my question in the first place: presently it looks like an apples-to-oranges comparison. Recall that my definition tries to eliminate the utility function (and historicity, and environment-related variables), based on a philosophical motivation.

Let me then ask which reference exactly I should cite for the definition of the policies and the set of all chronological functions: the universal intelligence measure paper (which I cited), or another paper? Do you have a paper on the subject that I should cite? Or perhaps there is an authoritative one that I am missing. I understand that I will probably have to take it from there myself, and I do not wish to waste your time with incessant questions that might seem unreasonable or trivial -- I am hoping they are not, after all. I had read the cited papers, but sometimes one misses an important point, or even, admittedly, fails to understand one; it happens.

However, your question also seems to miss the point that this paper series is exclusively about physical intelligence. Therefore, I cannot just use any mathematical formalization in the first place; that would introduce a semantic antinomy which might not be apparent to realist mathematicians, but I have to be aware of it. Therefore, I do have to think about the measures of physical systems and, if we are talking about agents, physical agents in this paper. If you have not yet read the first installment of Ultimate Intelligence, please read it; it should provide much food for thought at any rate -- it is on the arXiv. Otherwise, the claims in the first paper of this sequence would not be valid, defeating the whole purpose of my new research program.

For what it is worth, the measure is defined relative to operator induction; the operators are obviously stochastic maps. I have not read the draft in a while, but that should be there if it is not immediately apparent; IIRC I did not bother to formalize it there. But I can formalize it further in the next draft, again to avoid potential misunderstandings, and again to make the two definitions comparable in the context of computability theory.

[Of course, we can just compare to sequence induction, where you had claimed there must be a significant difference.]

If you do have a mathematically precise formalization, with proper function types, domains, and co-domains, and a proof of some mathematical sense of "embracing and extending" or strict dominance between the inference of an optimal policy -- as defined -- and other, simpler inference tasks, please just show me the exact citations for the definitions and theorems: which paper, which definition or theorem. I am most interested in understanding exactly in which mathematical sense the claim of superiority is made. I would love to see such a theorem, if it exists, before I go forth and write new definitions and theorems, to avoid any errors or misrepresentations, which would be catastrophic for my paper. Neither would I wish to seem naive. There must be some justification for a new definition, though even an alternative definition is usually considered "good" in theory. I am content with proposing an inferior definition; it might have an alternative application, and I am a humble researcher. However, I wish to know under which conditions it is inferior, and why. Or, as chance would have it, will we find out that they are equivalent in some sense after all?

I can guess in which sense you think there is dominance; I could state it myself as well, and I anticipate it would be easy to prove. Would you like to see such a contribution in this paper yourself? Or would you be inclined to reject it out of hand because "it has been done before"?

I would not want to misrepresent your or Hutter's ideas, and I am actually trying to avoid redoing work that has already been done. It would also be welcome if you could tell me to whom I should attribute this particular claim of dominance. It is rather significant to me, more so than you might assume, and I am taking it seriously. There was previously a wrong interpretation of my words, to the effect that I trivialized Hutter's definition; that is not so (I said that using any alphabet is trivial, not the whole definition, and I apologize for the choice of words that led to that misunderstanding). However, I have my own view, which is shared by some other theoreticians, AFAICT. It should be an interesting and stimulating exchange for both parties, illustrating our views precisely, when and if that happens without misunderstandings. After all, that is what mathematical theory is; there is no room for ambiguity.

Thanks a lot for considering my queries.

Best Regards,

Eray

PS: I have an interesting response to Levin's "Forbidden Information" paper, by the way, but it is not on arXiv yet. I will announce it here when I upload it.

 

Vaibhav Gavane

Mar 8, 2016, 3:00:43 AM
to magic...@googlegroups.com
I think the first mention of "chronological" occurs in Hutter's
original paper on AIXI [http://arxiv.org/abs/cs.AI/0004001]. He
initially uses "chronological" for Turing machines that follow the
reinforcement learning "discipline". Naturally, then the policies or
functions computed by these Turing machines can also be called
"chronological". Formally, you just need to define two finite sets in
the beginning - the action set A (set of outputs) and the perception
set P (set of inputs) - and then a chronological function/policy p can
be defined as p : P* -> A* such that (for all n) (for all x \in P^n)
[p(x)_{1:n-1} = p(x_{1:n-1})]. This is the formal definition I have
used in my own JAGI paper which you may see for more clarity
[http://dx.doi.org/10.2478/jagi-2013-0003].

In the reinforcement learning setting, Hutter showed (in the same
paper above) that AIXI works for sequence prediction (albeit with a
looser error bound). You may take things from here.
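The chronological (prefix-consistency) condition above can be checked by brute force for a small alphabet. A toy Python sketch of this, under illustrative assumptions (the policy names and the binary alphabet are mine, not from the thread):

```python
from itertools import product

def is_chronological(policy, perceptions, max_len=4):
    """Brute-force check of the condition p(x)_{1:n-1} = p(x_{1:n-1})
    for all perception strings x up to length max_len."""
    for n in range(1, max_len + 1):
        for x in product(perceptions, repeat=n):
            if policy(x)[:-1] != policy(x[:-1]):
                return False
    return True

# A toy chronological policy over P = A = {0, 1}: the n-th action is
# the running parity of the perceptions seen so far, so each output
# depends only on the inputs received up to that point.
def parity_policy(x):
    out, acc = [], 0
    for s in x:
        acc ^= s
        out.append(acc)
    return tuple(out)

# A non-chronological function: every output "peeks" at the final
# perception, so earlier outputs change when later inputs arrive.
def peeking(x):
    return tuple(x[-1] for _ in x) if x else ()
```

Here `is_chronological(parity_policy, [0, 1])` holds, while `peeking` fails the check on inputs like (0, 1), matching the intuition that a policy must emit each action before seeing later perceptions.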

Eray Ozkural

Mar 8, 2016, 5:15:19 AM
to magic...@googlegroups.com
Hello Vaibhav,

Yes, I do remember the paper with the theorem showing that AIXI has a looser error bound. I just wanted to avoid making simple errors; it turns out that I have made a few in recent drafts, so I am being extra careful. I will acknowledge you as well.

Kind Regards,

Marcus Abundis

Apr 15, 2016, 10:40:07 AM
to MAGIC, ai-phi...@yahoogroups.com
Hi Eray,

    I see that you cross-post your notes to "ai-phi...@yahoogroups.com."  Can you please give me the full address for this group, so that I might check out its doings? Are there other groups one might look at for this topic?
Thanks!