Predictions & duplications


jue...@idsia.ch

Oct 8, 2001, 8:11:48 AM
to everyth...@eskimo.com

Predictions & duplications


> From: Marchal <mar...@ulb.ac.be> Thu Oct 4 11:58:13 2001
> [...]
> You have still not explain to me how you predict your reasonably next
> experience in the simple WM duplication. [...]
> So, how is it that you talk like if you do have an algorithm
> capable of telling you in advance what will be your personal
> experience in a self-duplication experience. [...]
> Where am I wrong?
> Bruno


I will try again: how can we predict what's going to happen?

We need a prior probability distribution on possible histories.
Then, once we have observed a past history, we can use Bayes' rule to
compute the probability of various futures, given the observations. Then
we predict the most probable future.

The algorithmic TOE paper discusses several formally describable priors
computable in the limit. But for the moment let me focus on my favorite,
the Speed Prior.

Assume the Great Programmer has a resource problem, just like any other
programmer. According to the Speed Prior, the harder it is to compute
something, the less likely it gets. More precisely, the cumulative prior
probability of all data whose computation costs at least O(n) time is
at most inversely proportional to n.

To evaluate the plausibility of this, consider your own computer: most
processes just take a few microseconds, some take a few seconds, few
take hours, very few take days, etc...

We do not know the Great Programmer's computer and algorithm for computing
universe histories and other things. Still, we know that he cannot do
it more efficiently than the asymptotically fastest algorithm.

The Speed Prior reflects properties of precisely this optimal algorithm.
It turns out that the best way of predicting future y, given past x,
is to minimize the (computable) Kt-complexity of (x,y). As observation
size grows, the predictions converge to the optimal predictions.
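
To make the flavour of this concrete, here is a minimal toy sketch in Python
(an illustration only, not the algorithm of the paper): it scores each candidate
continuation y of an observed past x by a crude stand-in for Kt(x,y), namely an
assumed description length in bits plus log2 of a one-step-per-output-symbol
runtime, minimized over a tiny hand-made hypothesis class. The generators, bit
counts and runtime model are all invented for the example.

import itertools
import math

past = "0101010101010101"   # observed history x

# Hypothetical toy "programs": (assumed description length in bits, generator).
# The names and bit counts are invented for illustration.
def alternating(n): return ("01" * (n // 2 + 1))[:n]
def all_zeros(n):   return "0" * n

hypotheses = [(16, alternating), (12, all_zeros)]

def kt_cost(target):
    # Crude Kt stand-in: description length + log2(runtime), with runtime
    # modeled as one step per output symbol. A "print the literal string"
    # fallback (~8 bits per character) guarantees every target is producible.
    costs = [bits + math.log2(len(target))
             for bits, gen in hypotheses if gen(len(target)) == target]
    costs.append(8 * len(target) + math.log2(len(target)))
    return min(costs)

def predict(x, horizon=4):
    # Choose the future y minimizing the (toy) Kt of the whole string x+y.
    futures = ("".join(t) for t in itertools.product("01", repeat=horizon))
    return min(futures, key=lambda y: kt_cost(x + y))

print(predict(past))   # '0101': the regular continuation wins under this toy Kt

Under this toy scoring the regular continuation of a regular past wins by a
wide margin, which is all the example is meant to show.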

To address Bruno Marchal's questions: Many of my duplications in parallel
universes with identical past x will experience different futures y.
None of my copies knows a priori which one it is. But each does know:
Those futures y with high Kt(x,y) will be highly unlikely; those with
low Kt(x,y) won't.

Under the Speed Prior it is therefore extremely unlikely that I will
suddenly observe truly random, inexplicable, surprising events, simply
because true randomness is very hard to compute, and thus has very high
Kt complexity.

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/


Russell Standish

Oct 9, 2001, 12:50:43 AM
to jue...@idsia.ch, everyth...@eskimo.com
In summary, it would appear that Juergen is not disputing the use of
the UDA after all, but simply the use of the uniform measure as an
initial prior over the everything of all descriptions. He argues that
we should use the speed prior instead.

Another way of putting it is that he doesn't dispute the role of
indeterminism in generating observable complexity, only that the
amount of complexity that can be generated is necessarily limited.

None of this is in conflict with Occam's razor, or with the observable
evidence to date. In fact it is debatable whether it is experimentally
falsifiable - no sequence can be proven to be truly random - there is
always the possibility of some pseudo-random generator being found to
be the source.

Cheers

----------------------------------------------------------------------------
Dr. Russell Standish Director
High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile)
UNSW SYDNEY 2052 Fax 9385 6965, 0425 253119 (")
Australia R.Sta...@unsw.edu.au
Room 2075, Red Centre http://parallel.hpc.unsw.edu.au/rks
International prefix +612, Interstate prefix 02
----------------------------------------------------------------------------

Marchal

Oct 9, 2001, 10:54:23 AM
to everyth...@eskimo.com
Juergen Schmidhuber wrote:


>We need a prior probability distribution on possible histories.

OK. I agree with that. But of course we differ on the meaning of
"possible histories". And we also tackle the "prior probability"
in quite different ways.


>Then, once we have observed a past history, we can use Bayes' rule to
>compute the probability of various futures, given the observations. Then
>we predict the most probable future.


It seems to me that Bayes' rule is what we *need* to build a plausible
*past*; concerning the future, and even the present, I prefer
a theory, or a philosophy (or religion), hoping it can shed light on
the unknown.


>The algorithmic TOE paper discusses several formally describable priors
>computable in the limit. But for the moment let me focus on my favorite,
>the Speed Prior.


Mmh... You know that there exist classes of infinitely speedable
programs (and theories), and even that those classes correspond to a
superclass of the universal machines. This entails that the UD,
the Universal Dovetailer, which I sometimes like to call
the 'splashed Universal Turing Machine'
(which btw makes UD* a sort of denotational semantics of the UTM), could
generate and execute quicker and quicker versions of itself.
This means you need a non-universal ontology.
To help others understand the point, I
propose to postpone discussing it until I give the solution of the
problem in "diagonalisation 1":
http://www.escribe.com/science/theory/m3079.html


>Assume the Great Programmer has a resource problem, just like any
>other programmer.


You told Wei Dai not to take the Great Programmer too seriously.
I am not sure I can imagine the Great Programmer having a resource
problem. Unless you are a finitist! Knowing that the UTM emulating
the UD will use an unboundable part of its tape ...
How could the GP have a resource problem when he is the one
building the universes in which resource problems can (apparently) happen?
Are you not presupposing some physical structure in advance somewhere?


>According to the Speed Prior, the harder it is to compute
>something, the less likely it gets. More precisely, the cumulative prior
>probability of all data whose computation costs at least O(n) time is
>at most inversely proportional to n.


Very interesting idea. I would say: the harder it is to compute
something FROM here and now, the less likely it gets from here and now.
And I would
add that this is only true if your neighborhood is less complex than
yourself. I really believe your idea is interesting for some low-level
computational physics. But when agents, including yourself, enter the
play, it is another matter, because it is intrinsically harder to compute
things then.


>To evaluate the plausibility of this, consider your own computer: most
>processes just take a few microseconds, some take a few seconds, few
>take hours, very few take days, etc...

?
In the work of the GP, that is in UD*, most processes take billions
of googolplexes of kalpas ...
A lot of processes engaged by the UD simply do not stop.
Some are interesting and perhaps Turing universal, like the process
producing better and better approximations of the Mandelbrot
set.


>We do not know the Great Programmer's computer and algorithm for computing
>universe histories and other things. Still, we know that he cannot do
>it more efficiently than the asymptotically fastest algorithm.


Are you really sure about that? If the GP is really *the* GP,
existing by Church's Thesis, I think it follows from Blum's Speed-up
Theorem that it has no asymptotically fastest algorithm.
But perhaps you are talking of "our little program" generated and executed
by the UD.
If this one has an asymptotically fastest algorithm, are you sure it
can still emulate a UTM? I bet the "universe" is universal.
Also: how could we know and test whether we are in a quick or a slow
version of that little program? Is that not (absolutely) undecidable?
Should that program really be run? If yes, are you not presupposing
again a physical universe?

>The Speed Prior reflects properties of precisely this optimal algorithm.
>It turns out that the best way of predicting future y, given past x,
>is to minimize the (computable) Kt-complexity of (x,y). As observation
>size grows, the predictions converge to the optimal predictions.


As observation grows, knowledge grows, and we grow, making everything
more complex than before. I guess you mean "optimal predictions" at
some level. With comp, the more knowledge grows, the more ignorance
grows.


>To address Bruno Marchal's questions: Many of my duplications in parallel
>universes with identical past x will experience different futures y.
>None of my copies knows a priori which one it is. But each does know:
>Those futures y with high Kt(x,y) will be highly unlikely; those with
>low Kt(x,y) won't.


It is indeed hard to write a little program which generates a long
Kolmogorov-Chaitin incompressible 01 string.
But, as I told you earlier, it is quite easy to write a little
program which generates them all. (It is the essence of the
everything idea imo).

Take for example the iterated duplication experience/experiment.
It can be automated by a very simple program. After 64 iterations
there will be 2^64 ≈ 1.84467×10^19 agents, and most of them will have
an incompressible 01-history. With the comp indeterminism it is
much more likely that you will get such a really random string
(independently of the fact that you will be unable to prove it is
really random). Those with computable strings will be exceptional, so
that, if those agents work together, they will consider (even with
Bayes) that the simple self-multiplying algorithm is the simplest
and shortest explanation for those appearances of randomness.
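
(A quick counting sketch in Python, added only to spell out the arithmetic;
the bound used is the standard Kolmogorov counting bound, nothing specific to
the duplication protocol:)

iterations = 64
agents = 2 ** iterations
print(agents, "agents, i.e. about %.5e" % agents)   # 18446744073709551616 ~ 1.84467e+19

# Standard counting bound: there are fewer than 2^(n-k+1) programs of length
# <= n-k bits, so at most that many n-bit histories compress by k or more bits.
n, k = 64, 10
fraction = (2 ** (n - k + 1)) / 2 ** n
print("fraction compressible by >= %d bits: at most %.6f" % (k, fraction))  # < 0.002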

Bruno

jue...@idsia.ch

Oct 10, 2001, 8:00:57 AM
to everyth...@eskimo.com

Bruno, there are so many misleading or unclear statements
in your post - I do not even know where to start. I'll insert a few
comments below.

> Subject: Re: Predictions & duplications
> From: Marchal <mar...@ulb.ac.be>

You can be a GP. Write a program that computes all universes
computable in the limit. Add storage in case of storage overflow.
Still, you as a GP have a resource problem. At some point it will
get harder and harder for you to keep computing all the things
you'd like to compute.

The beings evolving in your universes won't be able to deduce much about
the physics of your world in which your computer is built. Just like I
cannot say much about the environment of the particular GP who programmed
us. That's where religion starts, and where algorithmic TOEs stop
making statements.

> >According to the Speed Prior, the harder it is to compute
> >something, the less likely it gets. More precisely, the cumulative prior
> >probability of all data whose computation costs at least O(n) time is
> >at most inversely proportional to n.
>
> Very interesting idea. I would say: the harder it is to compute
> something FROM here and now the less likely it gets from here and now.

Same thing. The Speed Prior leads to conditional probabilities of futures,
given the past.

> And I would
> add that this is only true if your neighborhood is less complex than
> yourself. I really believe your idea is interesting for some low level
> computational physics. But when agent, including yourself, enters in the
> play, its another matter, because it is intrinsically harder to compute
> things then.

I do not understand. But please do _not_ elaborate.

> >To evaluate the plausibility of this, consider your own computer: most
> >processes just take a few microseconds, some take a few seconds, few
> >take hours, very few take days, etc...
>
> ?
> In the work of the GP, that is in UD*, most processes takes billions
> of googolplex kalpa ...

Not if the GP has a resource problem, just like any programmer
of virtual realities has a resource problem.

Again: forget all the esoteric undecidability stuff, which does not
really matter here, and be pragmatic and try to look at things from the
perspective of some observer being computed as part of a virtual reality
that you programmed. The observer may get smart and correctly suspect
you have a resource problem, and can make nontrivial predictions from
there, using the Speed Prior.

> A lot of processes engaged by the UD simply does not stop.
> Some are interesting and perhaps Turing Universal like the process
> raising better and better approximations of the Mandelbrot
> set.

The Speed Prior does not say that any process stops. It just expresses
the assumption: the more something costs, the less likely it gets.

> >We do not know the Great Programmer's computer and algorithm for computing
> >universe histories and other things. Still, we know that he cannot do
> >it more efficiently than the asymptotically fastest algorithm.
>
> Are you really sure about that. If the GP is really *the* GP,
> existing by Church Thesis, I think it follows from Blum Speed-up
> Theorem that it has no asymptotically fastest algorithm.

Blum's theorem is completely irrelevant here. Do not
mix up undecidability results with the pragmatic
issue of predicting the future, given past observations.
I strongly encourage you to read section 6:
http://www.idsia.ch/~juergen/toesv2/node26.html
especially the section
Fast Computation of Finite and Infinite Strings
http://www.idsia.ch/~juergen/toesv2/node27.html
and
FAST: The Most Efficient Way of Computing Everything
http://www.idsia.ch/~juergen/toesv2/node28.html
to understand why there is indeed a fastest algorithm.

> But you are talking of "our little program" generated and executed
> by the UD perhaps.
> If this one has an asymptotically fastest algorithm, are you sure it
> still can emulate a UTM?

Sure. In fact, one of the asymptotically fastest UTM programs that computes
everything is exactly the almost trivial one in the 1997 book chapter:
http://www.idsia.ch/~juergen/everything/node1.html
The first program is run for one instruction every second
step, the second for one instruction every second of
the remaining steps, and so on. You do not believe it is an
asymptotically fastest algorithm? Then read
Fast Computation of Finite and Infinite Strings
http://www.idsia.ch/~juergen/toesv2/node27.html
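
For concreteness, a toy Python sketch of the schedule just described (my own
rendering of it, with arbitrary step counts): program i receives one
instruction every 2^i-th global step, so whatever it computes in T of its own
steps appears within about T*2^i global steps, which is the kind of slowdown
behind the 'at most inversely proportional' bound of the Speed Prior.

from collections import Counter

def program_index(t):
    # At global step t, run one instruction of program (trailing zeros of t) + 1:
    # program 1 gets every second step, program 2 every second remaining step, etc.
    i = 1
    while t % 2 == 0:
        t //= 2
        i += 1
    return i

steps = 1 << 16
share = Counter(program_index(t) for t in range(1, steps + 1))
for i in range(1, 6):
    print("program %d gets %.5f of all steps (expected %.5f)" % (i, share[i] / steps, 2.0 ** -i))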

> I bet the "universe" is universal.
> Also: how could we know and test we are in a quick or slow version of
> that little program. Is that not (absolutely) undecidable.

No, that's exactly where the Speed Prior comes in. According to this prior,
the slow programs computing the same thing contribute less to its
probability mass. It is explained in the chapter on
Speed Prior-Based Inductive Inference
http://www.idsia.ch/~juergen/toesv2/node32.html

> Should that program really be run? If yes, are you not presupposing
> again a physical universe?
>
>
> >The Speed Prior reflects properties of precisely this optimal algorithm.
> >It turns out that the best way of predicting future y, given past x,
> >is to minimize the (computable) Kt-complexity of (x,y). As observation
> >size grows, the predictions converge to the optimal predictions.
>
>
> As observation grows knowledge grows, and we grows making everything
> more complex that before. I guess you mean "optimal predictions" at
> some level. With comp the more knowledge grows, the more ignorance
> grows.

Huh? No, please do _not_ elaborate.

> >To address Bruno Marchal's questions: Many of my duplications in parallel
> >universes with identical past x will experience different futures y.
> >None of my copies knows a priori which one it is. But each does know:
> >Those futures y with high Kt(x,y) will be highly unlikely; those with
> >low Kt(x,y) won't.
>
> It is indeed hard to write a little program which generate a long
> Kolmogorov-Chaitin incompressible 01 string.
> But, as I told you earlier, it is quite easy to write a little
> program which generates them all. (It is the essence of the
> everything idea imo).

As you told me earlier? Huh?
That's one of the points of the 1997 book chapter.

The thing is: when you generate them all, and assume that all are
equally likely, in the sense that all beginnings of all strings are
uniformly distributed, then you cannot explain why the regular universes
keep being regular. The random futures are then just as likely as the
nonrandom ones.

So you NEED something additional to explain the ongoing regularity.
You need something like the Speed Prior, which greatly favors regular
futures over others.

> Take for exemple the iterated duplication experience/experiment.
> It can be automated by a very simple program. After 64 iterations
> there will be 1,84467.10^19 agents, and most of them will have
> an incompressible 01-history. With the comp indeterminism it is
> much more likely you will get such a really random string
> (independently of the fact that you will be unable to prove it is
> really random). Those with computable strings will be exceptional, so
> that, if those agents work together they will consider (even with
> Bayes) that the simple self-multiplying algorithm is the simplest
> and shorter explanation, for those randomeness appearances.

But don't you see? Why does a particular agent, say, yourself, with a
nonrandom past, have a nonrandom future? Why is your computer still there
after one second, although in a truly random world it would immediately
dissolve? Why do pencils keep falling down instead of up, when the
futures where they fall up are just as likely?

To repeat, the concept of the machine that computes all universe
histories is NOT by itself sufficient to explain why our future stays
regular. The additional concept of the Speed Prior explains it though.

Russell Standish

Oct 10, 2001, 7:38:44 PM
to jue...@idsia.ch, everyth...@eskimo.com
jue...@idsia.ch wrote:
>
> So you NEED something additional to explain the ongoing regularity.
> You need something like the Speed Prior, which greatly favors regular
> futures over others.
>

I take issue with this statement. In Occam's Razor I show how any
observer will expect to see regularities even with the uniform prior
(comes about because all observers have resource problems,
incidentally). The speed prior is not necessary for Occam's Razor. It is
obviously consistent with it though.

The interesting thing is of course whether it is possible to
experimentally distinguish between the speed prior and the uniform
prior, and it is not at all clear to me that it is possible to
distinguish between these cases.

Cheers

jue...@idsia.ch

Oct 11, 2001, 3:55:26 AM
to everyth...@eskimo.com

> From R.Sta...@unsw.edu.au :


> jue...@idsia.ch wrote:
> >
> > So you NEED something additional to explain the ongoing regularity.
> > You need something like the Speed Prior, which greatly favors regular
> > futures over others.
>
> I take issue with this statement. In Occam's Razor I show how any
> observer will expect to see regularities even with the uniform prior
> (comes about because all observers have resource problems,
> incidentally). The speed prior is not necessary for Occam's Razor. It is
> obviously consistent with it though.

First of all: there is _no_ uniform prior on infinitely many things.
Try to build a uniform prior on the integers. (Tegmark also wrote that
"... all mathematical structures are a priori given equal statistical
weight," but of course this does not make much sense because there is
_no_ way of assigning equal nonvanishing probability to all - infinitely
many - mathematical structures.)

There is at best a uniform measure on _beginnings_ of strings. Then
strings of equal size have equal measure.

But then regular futures (represented as strings) are just as likely
as irregular ones. Therefore I cannot understand the comment: "(comes
about because all observers have resource problems, incidentally)."

Of course, alternative priors lead to alternative variants of Occam's
razor. That has been known for a long time - formal versions of Occam's
razor go at least back to Solomonoff, 1964. The big question really
is: which prior is plausible? The most general priors we can discuss are
those computable in the limit, in the algorithmic TOE paper. They do not
allow for computable optimal prediction though. But the more restrictive
Speed Prior does, and seems plausible from any programmer's point of view.

> The interesting thing is of course whether it is possible to
> experimentally distinguish between the speed prior and the uniform
> prior, and it is not at all clear to me that it is possible to
> distinguish between these cases.

I suggest to look at experimental data that seems to have Gaussian
randomness in it, such as interference patterns in split experiments.
The Speed Prior suggests the data cannot be really random, but that a
fast pseudorandom generator PRG is responsible, e.g., by dividing some
seed by 7 and taking some of the resulting digits as the new seed, or
whatever. So it's verifiable - we just have to discover the PRG method.
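
Purely as an illustration of how little such a generator needs (the constants,
digit positions and seed handling below are all made up, not a proposal):

def toy_prg(seed=123456789, n_bits=32):
    # Keep a large integer seed, "divide by 7" in fixed point, recycle some of
    # the resulting digits as the next seed, and emit one bit per iteration.
    bits = []
    for _ in range(n_bits):
        digits = str(seed * 10**12 // 7)      # fixed-point seed / 7
        seed = int(digits[-9:]) or 1          # some of the digits become the new seed
        bits.append(int(digits[3]) % 2)       # one emitted pseudo-random bit
    return bits

print(toy_prg())   # deterministic and cheap; whether it looks random enough is the empirical question

The point is only that something this cheap could in principle sit behind
apparently random data; finding the real generator, if there is one, is the
hard part.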

Russell Standish

Oct 11, 2001, 4:34:29 AM
to jue...@idsia.ch, everyth...@eskimo.com
jue...@idsia.ch wrote:
>
>
>
> > From R.Sta...@unsw.edu.au :
> > jue...@idsia.ch wrote:
> > >
> > > So you NEED something additional to explain the ongoing regularity.
> > > You need something like the Speed Prior, which greatly favors regular
> > > futures over others.
> >
> > I take issue with this statement. In Occam's Razor I show how any
> > observer will expect to see regularities even with the uniform prior
> > (comes about because all observers have resource problems,
> > incidentally). The speed prior is not necessary for Occam's Razor. It is
> > obviously consistent with it though.
>
> First of all: there is _no_ uniform prior on infinitely many things.
> Try to build a uniform prior on the integers. (Tegmark also wrote that
> "... all mathematical structures are a priori given equal statistical
> weight," but of course this does not make much sense because there is
> _no_ way of assigning equal nonvanishing probability to all - infinitely
> many - mathematical structures.)

I don't know why you insist on the prior being a PDF. It is not
necessary. With the uniform prior, all finite sets have vanishing
probability. However, all finite descriptions correspond to infinite
sets, and these infinite sets have non-zero probability.


>
> There is at best a uniform measure on _beginnings_ of strings. Then
> strings of equal size have equal measure.
>
> But then regular futures (represented as strings) are just as likely
> as irregular ones. Therefore I cannot understand the comment: "(comes
> about because all observers have resource problems, incidentally)."
>

Since you've obviously barked up the wrong tree here, it's a little
hard to know where to start. Once you understand that each observer
must equivalence an infinite number of descriptions due to the
boundedness of its resources, it becomes fairly obvious that the
smaller, simpler descriptions correspond to larger equivalence classes
(hence higher probability).

> Of course, alternative priors lead to alternative variants of Occam's
> razor. That has been known for a long time - formal versions of Occam's
> razor go at least back to Solomonoff, 1964. The big question really
> is: which prior is plausible? The most general priors we can discuss are
> those computable in the limit, in the algorithmic TOE paper. They do not
> allow for computable optimal prediction though. But the more restrictive
> Speed Prior does, and seems plausible from any programmer's point of view.
>
> > The interesting thing is of course whether it is possible to
> > experimentally distinguish between the speed prior and the uniform
> > prior, and it is not at all clear to me that it is possible to
> > distinguish between these cases.
>
> I suggest to look at experimental data that seems to have Gaussian
> randomness in it, such as interference patterns in split experiments.
> The Speed Prior suggests the data cannot be really random, but that a
> fast pseudorandom generator PRG is responsible, e.g., by dividing some
> seed by 7 and taking some of the resulting digits as the new seed, or
> whatever. So it's verifiable - we just have to discover the PRG method.
>

I can't remember which incompleteness result it is, but it is
impossible to prove the randomness of any sequence. In order to
falsify your theory one would need to prove a sequence to be
random. However, of course if all known sequences are provably
pseudo-random (ie compressible), then this would constitute pretty
good evidence. However, this is a tall order, as there is no algorithm
for generating the compression behind an arbitrary sequence.

Unless someone else has some brilliant ideas, it's all looking a bit grim.

----------------------------------------------------------------------------

jue...@idsia.ch

Oct 11, 2001, 7:39:02 AM
to everyth...@eskimo.com

> > > From R.Sta...@unsw.edu.au :
> > > jue...@idsia.ch wrote:
> > > >
> > > > So you NEED something additional to explain the ongoing regularity.
> > > > You need something like the Speed Prior, which greatly favors regular
> > > > futures over others.
> > >
> > > I take issue with this statement. In Occam's Razor I show how any
> > > observer will expect to see regularities even with the uniform prior
> > > (comes about because all observers have resource problems,
> > > incidentally). The speed prior is not necessary for Occam's Razor. It is
> > > obviously consistent with it though.
> >
> > First of all: there is _no_ uniform prior on infinitely many things.
> > Try to build a uniform prior on the integers. (Tegmark also wrote that
> > "... all mathematical structures are a priori given equal statistical
> > weight," but of course this does not make much sense because there is
> > _no_ way of assigning equal nonvanishing probability to all - infinitely
> > many - mathematical structures.)
>
> I don't know why you insist on the prior being a PDF. It is not
> necessary. With the uniform prior, all finite sets have vanishing
> probability. However, all finite descriptions correspond to infinite
> sets, and these infinite sets have non-zero probability.

Huh? A PDF? You mean a probability density function? On a continuous set?
No! I am talking about probability distributions on describable objects.
On things you can program.

Anyway, you write "...observer will expect to see regularities even with
the uniform prior" but that clearly cannot be true.

> > There is at best a uniform measure on _beginnings_ of strings. Then
> > strings of equal size have equal measure.
> >
> > But then regular futures (represented as strings) are just as likely
> > as irregular ones. Therefore I cannot understand the comment: "(comes
> > about because all observers have resource problems, incidentally)."
>
> Since you've obviously barked up the wrong tree here, it's a little
> hard to know where to start. Once you understand that each observer
> must equivalence an infinite number of descriptions due to the
> boundedness of its resources, it becomes fairly obvious that the
> smaller, simpler descriptions correspond to larger equivalence classes
> (hence higher probability).

Maybe you should write down formally what you mean? Which resource bounds?
On which machine? What exactly do you mean by "simple"? Are you just
referring to the traditional Solomonoff-Levin measure and the associated
old Occam's razor theorems, or do you mean something else?

> > Of course, alternative priors lead to alternative variants of Occam's
> > razor. That has been known for a long time - formal versions of Occam's
> > razor go at least back to Solomonoff, 1964. The big question really
> > is: which prior is plausible? The most general priors we can discuss are
> > those computable in the limit, in the algorithmic TOE paper. They do not
> > allow for computable optimal prediction though. But the more restrictive
> > Speed Prior does, and seems plausible from any programmer's point of view.
> >
> > > The interesting thing is of course whether it is possible to
> > > experimentally distinguish between the speed prior and the uniform
> > > prior, and it is not at all clear to me that it is possible to
> > > distinguish between these cases.
> >
> > I suggest to look at experimental data that seems to have Gaussian
> > randomness in it, such as interference patterns in split experiments.
> > The Speed Prior suggests the data cannot be really random, but that a
> > fast pseudorandom generator PRG is responsible, e.g., by dividing some
> > seed by 7 and taking some of the resulting digits as the new seed, or
> > whatever. So it's verifiable - we just have to discover the PRG method.
> >
>
> I can't remember which incompleteness result it is, but it is
> impossible to prove the randomness of any sequence. In order to
> falsify your theory one would need to prove a sequence to be
> random. However, of course if all known sequences are provably
> pseudo-random (ie compressible), then this would constitute pretty
> good evidence. However, this is a tall order, as there is no algorithm
> for generating the compression behind an arbitrary sequence.

You are talking falsifiability. I am talking verifiability. Sure, you
cannot prove randomness. But that's not the point of any inductive
science. The point is to find regularities if there are any. Occam's
razor encourages us to search for regularity, even when we do not know
whether there is any. Maybe some PhD student tomorrow will discover a
simple PRG of the kind I mentioned, and get famous.

It is important to see that Popper's popular and frequently cited and
overrated concept of falsifiability does not really help much to explain
what inductive science such as physics is all about. E.g., physicists
accept Everett's ideas although most of his postulated parallel universes
will remain inaccessible forever, and therefore are _not_ falsifiable.
Clearly, what's convincing about the multiverse theory is its simplicity,
not its falsifiability, in line with Solomonoff's theory of inductive
inference and Occam's razor, which is not just a wishy-washy philosophical
framework like Popper's.

Similarly, today's string physicists accept theories for their simplicity,
not their falsifiability. Just like nobody is able to test whether
gravity is the same on Sirius, but believing it makes things simpler.

Again: the essential question is: which prior is plausible? Which
represents the correct notion of simplicity? Solomonoff's traditional
prior, which does not care for temporal complexity at all? Even more
general priors computable in the limit, such as those discussed in
the algorithmic TOE paper? Or the Speed Prior, which embodies a more
restricted concept of simplicity that differs from Kolmogorov complexity
because it takes runtime into account, in an optimal fashion?

Juho Pennanen

Oct 11, 2001, 11:34:18 AM
to everyth...@eskimo.com

I tried to understand the problem that doctors Schmidhuber
and Standish are discussing by describing it in the most
concrete terms I could, below. (I admit beforehand I couldn't
follow all the details and do not know all the papers and
theorems referred to, so this could be irrelevant.)

So, say you are going to drop a pencil from your hand and are
trying to predict whether it's going to fall down or up this
time. Using what I understand of the comp TOE, I would take
the set of all programs that at some state implement a
certain conscious state, namely the state in which you
remember starting your experiment of dropping the pencil,
and have already recorded the end result (I abbreviate this
conscious state as CS. To be exact it is a set of states,
but that shouldn't make a difference).

The space of all programs would be the set of all programs
in some language, coded as infinite numerable sequences of
0's and 1's. (I do not know how much the chosen language +
coding has an effect on the whole thing).

Now for your prediction you need to divide the
implementations of CS into two sets: those in which the
pencil fell down and those in which it fell up. Then you
compare the measures of those sets. (You would need to
assume that each program is run just once or something of
the sort. Some programs obviously implement CS several
times when they run. So you would maybe just include those
programs that implement CS infinitely many times, and
weight them with the density of CS occurrences during
their running.)

One way to derive the measure you need is to assume a
measure on the set of all infinite sequences (i.e. on
all programs). For this we have the natural measure,
i.e. the product measure of the uniform measure on
the set containing 0 and 1. And as far as my intuition
goes, this measure would lead to the empirically correct
prediction on the direction of the pencil falling. And
if I understood it right, this is not too far from what
Dr. Standish was claiming? And we wouldn't need any
speed priors.
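
(For what it's worth, the product measure in question is easy to play with.
The following toy Python check, with an arbitrarily chosen pattern, just
confirms that the set of sequences beginning with a fixed k-bit pattern has
measure 2^-k under it:)

import random

random.seed(0)
pattern = [0, 1, 1, 0, 1]                      # an arbitrary 5-bit pattern
trials = 200000
hits = sum(all(random.randint(0, 1) == b for b in pattern) for _ in range(trials))
print(hits / trials, 2.0 ** -len(pattern))     # both close to 0.03125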

But maybe the need for the speed prior would come into play if I
thought more carefully about the detailed assumptions
involved? E.g. that each program would be run just once,
with the same speed, etc? I am not sure.

Juho


/************************************************
Juho Pennanen
Department of Forest Ecology, P.O.Box 24
FIN-00014 University of Helsinki
tel. (09)191 58144 (+358-9-191 58144)
GSM 040 5455 845 (+358-40-5455 845)
http://www.helsinki.fi/people/juho.pennanen
*************************************************/

Russell Standish

Oct 11, 2001, 9:57:25 PM
to jue...@idsia.ch, everyth...@eskimo.com
jue...@idsia.ch wrote:
>
>
>
> Huh? A PDF? You mean a probability density function? On a continuous set?

Probability Distribution Function. And PDF's are defined on all
measurable sets, not just continuous ones.

> No! I am talking about probability distributions on describable objects.
> On things you can program.

Sure...

>
> Anyway, you write "...observer will expect to see regularities even with
> the uniform prior" but that clearly cannot be true.
>

This is where I beg to differ.


> >
> > Since you've obviously barked up the wrong tree here, it's a little
> > hard to know where to start. Once you understand that each observer
> > must equivalence an infinite number of descriptions due to the
> > boundedness of its resources, it becomes fairly obvious that the
> > smaller, simpler descriptions correspond to larger equivalence classes
> > (hence higher probability).
>
> Maybe you should write down formally what you mean? Which resource bounds?
> On which machine? What exactly do you mean by "simple"? Are you just
> referring to the traditional Solomonoff-Levin measure and the associated
> old Occam's razor theorems, or do you mean something else?

It is formally written in "Why Occam's Razor", and the extension to
non-computational observers is a fairly obvious corollary of the
contents of "On Complexity and Emergence".

To summarise the results for this list, the basic idea is that an
observer acts as a filter, or category machine. On being fed a
description, the observer places it into a category, or equivalence
class of descriptions. What we can say is that real observers will
have a finite number of such equivalence classes they can categorise
descriptions into (be they basins of attraction in a neural network,
or output states in the form of books, etc). This is important,
because if we consider the case of UTMs being the classification
device, and outputs from halting programs as being the categories,
then we end up with the S-L distribution over the set of halting
programs. Unfortunately, the set of halting programs has measure zero
within the set of all programs (descriptions), so we end up with the
White Rabbit paradox, namely that purely random descriptions have
probability one.

One solution is to ban the "Wabbit" universes (which you do by means
of the speed prior), and recover the nicely behaved S-L measure,
which, essentially, is your solution (ie S-L convolved with the Speed
Prior).

The other way is to realise that the only example of a conscious
observer we are aware of is actually quite resilient to noise - only a
small number of bits in a random string will be considered
significant. We are really very good at detecting patterns in all
sorts of random noise. Hence every random string will be equivalenced
to some regular description. Then one ends up with an _observer
dependent_ S-L-like probability distribution over the set of
categories made by that observer. The corresponding Occam's Razor
theorem follows in the usual way.
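
A toy Python version of this may help (my own construction, not the formal
treatment in "Why Occam's Razor"): an observer with bounded resources looks
only at a short template at the start of a description and ignores the rest,
so each category collects a fraction 2^-len(template) of all descriptions,
and shorter (simpler) templates automatically carry more measure. The
template strings are arbitrary.

from collections import Counter
from itertools import product

n = 12
templates = ["0", "10", "110", "111000"]       # hypothetical category labels

def classify(description):
    for t in templates:                        # the first matching template wins
        if description.startswith(t):
            return t
    return "other"

counts = Counter(classify("".join(bits)) for bits in product("01", repeat=n))
for t in templates + ["other"]:
    print("%-8s %.4f" % (t, counts[t] / 2.0 ** n))   # 0.5, 0.25, 0.125, 0.0156, 0.1094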

I hope this is clear enough for you to follow...

The problem is that the uniform prior is simpler than the speed
prior. It is very much the null hypothesis, against which we should
test the speed prior hypothesis. That is why falsifiability, and more
importantly verifiability, is important. It is entirely possible that
we exist within some kind of stupendous, but nevertheless resource
limited, virtual reality. However, is it not too much to ask for
possible evidence for this?

My main point is that your objection to the uniform prior (essentially
the so called "White Rabbit" problem) is a non-problem.

Cheers

jue...@idsia.ch

Oct 12, 2001, 4:28:21 AM
to everyth...@eskimo.com

In reply to Russell Standish and Juho Pennanen I'd just like to
emphasize the main point, which is really trivial: by definition, a
uniform measure on the possible futures makes all future beginnings of
a given size equally likely. Then regular futures clearly are not any
more likely than the irregular ones. I have no idea what makes Russell
think this is debatable. In most possible futures your computer will
vanish within the next second. But it does not. This indicates that our
future is _not_ sampled from a uniform prior.
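
The arithmetic behind this is easy to spell out. A small enumeration (with an
arbitrary stand-in for 'regular', namely periodic with period at most 2) shows
both halves of the claim: every individual k-bit future has the same
probability 2^-k, and the futures with that kind of short regularity are a
vanishing fraction of the whole.

from itertools import product

k = 16
def short_period(s, max_period=2):
    return any(all(s[i] == s[i % p] for i in range(len(s))) for p in range(1, max_period + 1))

futures = ["".join(b) for b in product("01", repeat=k)]
regular = sum(short_period(f) for f in futures)
print("probability of any single future:", 2.0 ** -k)     # 1/65536 for each
print("fraction with period <= 2:", regular / 2.0 ** k)    # 4/65536, about 0.00006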

Some seem to think that the weak anthropic principle explains the
regularity. The argument goes like this: "Let there be a uniform measure
on all universe histories, represented as bitstrings. Now take the tiny
subset of histories in which you appear. Although the measure of this
subset is tiny, its conditional measure, given your very existence,
is not: According to the weak anthropic principle, the conditional
probability of finding yourself in a regular universe compatible with
your existence equals 1."

But it is essential to see that the weak anthropic principle does not
have any predictive power at all. It does not tell you anything about
the future. It cannot explain away futures in which you still exist
but irregular things happen. Only a nonuniform prior can explain this.

Which nonuniform prior is plausible? My favorite is the "resource-optimal"
Speed Prior. I am hoping and expecting that someone will confirm it soon
by finding a rather fast pseudorandom generator responsible for apparently
noisy events in our universe. But others may be more conservative and
go for the more dominant enumerable Solomonoff-Levin prior mu^M or maybe
even for the nonenumerable prior mu^E (which dominates mu^M while still
being computable in the limit) or maybe even for the extreme prior mu^G.


Juergen Schmidhuber


> From everything-...@eskimo.com Thu Oct 11 17:36:41 2001
> From: Juho Pennanen <juho.p...@helsinki.fi>

> Resent-Date: Thu, 11 Oct 2001 18:57:25 -0700
> From: Russell Standish <R.Sta...@unsw.edu.au>

h...@finney.org

Oct 12, 2001, 1:34:50 PM
to everyth...@eskimo.com, jue...@idsia.ch
Juergen writes:
> Some seem to think that the weak anthropic principle explains the
> regularity. The argument goes like this: "Let there be a uniform measure
> on all universe histories, represented as bitstrings. Now take the tiny
> subset of histories in which you appear. Although the measure of this
> subset is tiny, its conditional measure, given your very existence,
> is not: According to the weak anthropic principle, the conditional
> probability of finding yourself in a regular universe compatible with
> your existence equals 1."
>
> But it is essential to see that the weak anthropic principle does not
> have any predictive power at all. It does not tell you anything about
> the future. It cannot explain away futures in which you still exist
> but irregular things happen. Only a nonuniform prior can explain this.

Isn't this fixed by saying that the uniform measure is not over all
universe histories, as you have it above, but over all programs that
generate universes? Now we have the advantage that short programs
generate more regular universes than long ones, and the WAP grows teeth.

Hal Finney

Russell Standish

Oct 14, 2001, 4:50:47 AM
to h...@finney.org, everyth...@eskimo.com, jue...@idsia.ch
That is almost the correct solution, Hal. If we ask what an observer
will make of a description chosen at random, then you get
regular universes with probability exponentially related to the
inferred complexity. It is far clearer to see what happens when the
observer is a UTM, forcibly terminating programs after a
certain number of steps (representing the observer's resource bound)
(thus all descriptions are halting programs). Then one obtains a
Solomon-Levy distribution or universal prior. However, this argument
also works when the observer is not a UTM, but simply a classification
device of some kind.

The WAP has nothing to do with this issue, except inasmuch as
universes can only be observed through the eyes of some observer.

Again I reiterate that Juergen's resource-bounded "Great Programmer"
religion need be nothing but a reflection of our conscious selves stamped
upon our observations.

Cheers

----------------------------------------------------------------------------

jamikes

Oct 14, 2001, 9:41:50 AM
to Russell Standish, h...@finney.org, everyth...@eskimo.com
Dear Russell and Hal,
this is a naive question, however it goes into the basics of your written
explanations. How would YOU define "random"? I had this problem for a long
long time and never got a satisfactory solution. In my (non Indo-European)
language, Hungarian, there is no exact word for it, it is used as a term
meaning "according to whim or taste" (tetszöleges) - which leaves open that
I may not LIKE the 'random' in question. If you say: a sequence defying all
rules, then it is not random, it is calculable. You have to consider all
rules and cut them out. Is a series of 11111... random? of course it may be.
Is the pi-decimal random? no, because it is a result of calculation. If one
says: "just pick it" that would follow all circumstantial pressure of the
situation, maybe unconsciously, yet defying the 'random'.

So what is the content you use behind that word?

I would be surprised if both of you (and others, too) would agree <G>.
Till then I wish you luck to use the word - at random.

Best wishes
John Mikes


----- Original Message -----
From: "Russell Standish" <R.Sta...@unsw.EDU.AU>
To: <h...@finney.org>
Cc: <everyth...@eskimo.com>; <jue...@idsia.ch>
Sent: Sunday, October 14, 2001 4:36 AM
Subject: Re: Predictions & duplications

Saibal Mitra

Oct 14, 2001, 12:52:16 PM
to jamikes, Russell Standish, h...@finney.org, everyth...@eskimo.com
John Mikes wrote:
``.... If you say: a sequence defying all rules, then it is not random, it is calculable. You have to consider all rules and cut them out.´´

If you try to do that then you encounter the famous halting problem.

Saibal

Russell Standish

Oct 14, 2001, 7:14:06 PM
to jamikes, Russell Standish, h...@finney.org, everyth...@eskimo.com
"According to whim or taste" implies a conscious entity performing
choices according to a free will. This need not be the case. In my
mind, random means selected without cause (or without
procedure/algorithm).

A lot has been written on randomness, and its problematic nature. I
don't for one minute suggest I have anything new to say on the
matter.

Cheers

----------------------------------------------------------------------------

jue...@idsia.ch

Oct 15, 2001, 5:38:20 AM
to everyth...@eskimo.com

But there is no uniform prior over all programs!
Just like there is no uniform prior over the integers.
To see this, just try to write one down.

BTW, it's not Solomon-Levy but Solomonoff-Levin. And
it has nothing to do with resource bounds!

Juergen Schmidhuber

> From h...@finney.org Fri Oct 12 19:22:02 2001


>
> Juergen writes:
> > Some seem to think that the weak anthropic principle explains the
> > regularity. The argument goes like this: "Let there be a uniform measure
> > on all universe histories, represented as bitstrings. Now take the tiny
> > subset of histories in which you appear. Although the measure of this
> > subset is tiny, its conditional measure, given your very existence,
> > is not: According to the weak anthropic principle, the conditional
> > probability of finding yourself in a regular universe compatible with
> > your existence equals 1."
> >
> > But it is essential to see that the weak anthropic principle does not
> > have any predictive power at all. It does not tell you anything about
> > the future. It cannot explain away futures in which you still exist
> > but irregular things happen. Only a nonuniform prior can explain this.
>
> Isn't this fixed by saying that the uniform measure is not over all
> universe histories, as you have it above, but over all programs that
> generate universes? Now we have the advantage that short programs
> generate more regular universes than long ones, and the WAP grows teeth.

Juho Pennanen

Oct 15, 2001, 7:29:50 AM
to everyth...@eskimo.com
Juergen writes

> But there is no uniform prior over all programs!
> Just like there is no uniform prior over the integers.
> To see this, just try to write one down.

This is of course true (if a uniform measure is a
measure that gives the same, non-zero, probability
to each program. I have no idea what the official
definition is).

OTOH, there's of course the natural product measure over
the set of all infinite strings. In my last post I
tried to show that this is enough. I admit now I was
cutting far too many corners. It is not even obvious
that the set of all programs that produce a specific
conscious experience is measurable. And even if it
were, it now seems obvious to me that random
programs produce random worlds.

So some kind of 'resource limitation' must exist.
One option would be to accept only programs that
are shorter than some maximum length. This corresponds
to only accepting worlds that have complexity lower
than some maximum. But this is one type of
'resource limitation', and in fact a very inelegant
one.

Juho

Marchal

Oct 15, 2001, 10:50:51 AM
to everyth...@eskimo.com
Saibal Mitra wrote:

>John Mikes wrote:
>``.... If you say: a sequence defying all rules, then it is not random, it is calculable. You have to consider all rules and cut them out.´´
>
>If you try to do that then you encounter the famous halting problem.

Exactly. So why not define random as incompressible by any (universal)
machine (Kolmogorov incompressibility)? In a similar sense you can prove that
"most" finite and infinite sequences are incompressible.
Here Church's Thesis gives some "absolute" epistemological meaning to that
notion of randomness.

Cf Li and Vitanyi

Bruno

Marchal

Oct 15, 2001, 11:25:17 AM
to everyth...@eskimo.com
Hal Finney wrote:

>Isn't this fixed by saying that the uniform measure is not over all
>universe histories, as you have it above, but over all programs that
>generate universes? Now we have the advantage that short programs
>generate more regular universes than long ones, and the WAP grows teeth.

I agree with Juergen on this: there is no uniform measure over all
programs (nor on all integers).

But in any case, you know I argue, unlike Juergen, that the measure should
be put on all (relative) consistent computational extensions
appearing in UD*. This indeed gives a sort of Weak Turing-tropic
Principle.

In which relative *speed* is a by-product, not a prior.

Bruno

h...@finney.org

Oct 15, 2001, 2:25:01 PM
to everyth...@eskimo.com, jue...@idsia.ch
Juergen Schmidhuber writes:
> But there is no uniform prior over all programs!
> Just like there is no uniform prior over the integers.
> To see this, just try to write one down.

I think there is. Given a program of length l, the prior probability
is 2^(-l). (That is 2 to the power of negative l.) The length of a
program is defined by interpreting it using self-delimiting rules as
is customary in the AIT analysis of Greg Chaitin.

Hal Finney

jamikes

Oct 15, 2001, 4:10:33 PM
to Russell Standish, everyth...@eskimo.com
"According to whim or taste" implies a conscious entity performing
choices according to a free will. This need not be the case. In my
mind, random means selected without cause (or without
procedure/algorithm)."

Russell picked my example from a language which has no equivalent to the
word "random" . Besides: I deny free will, except for the "Creator Almighty"
when the world was drawn up. <G> Since then all free will is subject to I/O
circumstances, prior art, experience, whatever (un?)conscious consideration
one may include.

"A lot has been written on randomness, and its problematic nature. I
don't for one minute suggest I have anything new to say on the
matter. "

Maybe some choice among the controversial versions?
I find Bruno's choice of incompressibility remarkable. Will think
about it, thanks, Bruno.

JM

h...@finney.org

Oct 15, 2001, 7:23:21 PM
to everyth...@eskimo.com
Saibal wrote:
> This doesn't seem to be very uniform to me. Maybe you mean that with the
> above prior the probability for a bit, drawn randomly from the set of all
> programs, to be 1 is 1/2 ?

I should have used the proper name for this, the universal prior.

However it is derived in the limit (of string length -> infinity)
by giving the same probability to all strings, and then adopting the
convention that we ignore the material past the end of a self-delimiting
program.

As we go to the limit, if we are looking at n-bit strings, then a
self-delimiting program of length l will have 2^(n-l) instances among the
2^n possible strings, hence will have measure 2^(-l), which is the
universal prior. So the universal prior on self-delimiting programs
is the same as the uniform prior on all program strings, as we take the
limit in program string length going to infinity.
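
This counting is easy to verify directly with a small made-up prefix-free set
of 'programs' (the particular strings below are arbitrary):

from itertools import product

programs = ["00", "01", "100", "101", "1100"]   # a made-up prefix-free set
n = 12
strings = ["".join(b) for b in product("01", repeat=n)]

for p in programs:
    count = sum(s.startswith(p) for s in strings)
    assert count == 2 ** (n - len(p))           # 2^(n-l) extensions of an l-bit program
    print("%-5s measure %.4f = 2^-%d" % (p, count / 2.0 ** n, len(p)))

print("total mass:", sum(2.0 ** -len(p) for p in programs))   # 0.8125 <= 1 (Kraft)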

Hal

Saibal Mitra

Oct 15, 2001, 7:30:56 PM
to h...@finney.org, everything
Hal Finney wrote:

This doesn't seem to be very uniform to me. Maybe you mean that with the
above prior the probability for a bit, drawn randomly from the set of all
programs, to be 1 is 1/2?

Saibal



Russell Standish

Oct 15, 2001, 7:36:08 PM
to h...@finney.org, everyth...@eskimo.com, jue...@idsia.ch
Hal - that is not a uniform measure!

----------------------------------------------------------------------------

Marchal

Oct 16, 2001, 11:12:05 AM
to everyth...@eskimo.com
Juergen Schmidhuber wrote:


>Bruno, there are so many misleading or unclear statements
>in your post - I do not even know where to start.


Rhetorical tricks, if not insults, as usual :(
Have you read http://www.escribe.com/science/theory/m3241.html ?

Well, G* tells me to remain mute here, but then I don't care.


>[...]

>You can be a GP. Write a program that computes all universes
>computable in the limit. Add storage in case of storage overflow.
>Still, you as a GP have a resource problem. At some point it will
>get harder and harder for you to keep computing all the things
>you'd like to compute.


Are you using the GP as an analogy or what? What exactly is
your ontology? You have not answered my question: how could the
GP have resource problems? Is not resource a notion internal to
some universe, and are not universes generated by the GP?
You are presupposing physics.


>Just like I cannot say much about the environment of the particular
>GP who programmed us.


Are you really serious? Do you believe we have a GP in a universe?
Where does the universe of our GP come from?
This is tortoises built on tortoises. Even an infinite sequence of
tortoises embedded in a physical universe explains nothing.

I knew you had not yet understood
the comp reversal physico/psycho, but have you not even seen
the reversal physico/computer-science, or what?


>Again: forget all the esoteric undecidability stuff, which does not
>really matter here, and be pragmatic and try to look at things from the
>perspective of some observer being computed as part of a virtual reality
>that you programmed. The observer may get smart and correctly suspect
>you have a resource problem, and can make nontrivial predictions from
>there, using the Speed Prior.


You are still postulating what I am seeking an explanation for.
I am not (necessarily) pragmatic in advance. A concrete virtual reality
on earth inherits the "normality" of the world in which
it is embedded. I claim that comp forces us to justify that
normality. It is not obvious that that is possible.
But the UDA forces us to take all computations into account, and
the undecidability stuff is unavoidable, if only because we can only *bet*
on our consistent extensions (in the translation of the UDA into
arithmetic, consistent extensions have the unprovable type "<>p").

Incompleteness phenomena put fundamental constraints on a TOE.
You could call them logical and arithmetical priors.

But you don't seem to be searching for a TOE, just a recipe for
predicting some granted neighborhood.

The arithmetical UD is just the set of \Sigma_1 sentences.
Resources must be explained from it, not the inverse.

I am not sure a TOE can be pragmatic, just as
quantum mechanics is not useful for cooking.


>No, that's exactly where the Speed Prior comes in. Accoring to this prior,
>the slow programs computing the same thing contribute less to its
>probability mass. It is explained in the chapter on
>Speed Prior-Based Inductive Inference
>http://www.idsia.ch/~juergen/toesv2/node32.html


I don't need to read it because it is in contradiction with the
"invariance lemma", in particular because, from a first
person point of view, the delays (between the many accesses by the UD
of near computational states) do not matter.
(BTW I have read it, and we discussed it before).

You know the GP will generate itself, but will also generate less and
less efficient versions of itself. From the first person point of view
they are indistinguishable. Are you telling me that the Juergen and Bruno
appearing in those very slow GPs are zombies?


>> It is indeed hard to write a little program which generate a long
>> Kolmogorov-Chaitin incompressible 01 string.
>> But, as I told you earlier, it is quite easy to write a little
>> program which generates them all. (It is the essence of the
>> everything idea imo).
>
>As you told me earlier? Huh?
>That's one of the points of the 1997 book chapter.


I was meaning "As I recall you earlier in some recent post"!
(but look at my 1988 Toulouse paper ;-)


>The thing is: when you generate them all, and assume that all are
>equally likely, in the sense that all beginnings of all strings are
>uniformly distributed, then you cannot explain why the regular universes
>keep being regular. The random futures are then just as likely as the
>nonrandom ones.
>

>So you NEED something additional to explain the ongoing regularity.
>You need something like the Speed Prior, which greatly favors regular
>futures over others.


You speak as if I were proposing the iterated duplication as
a universal explanation. It is only an illustration of the comp
indeterminism. With comp it is the whole work of the UD, UD*, which
constitutes the domain of indeterminism. The self-multiplication
experience just shows how a 3-person deterministic process consistent
with comp explains a phenomenology of (in the limit) a sort of absolute
randomness.
The "real" experience is the infinite running of the UD. And that running
is embedded in the arithmetical truth, for example in the form of
\Sigma_1 sentences.
UD* generalises both QM and the "iterated self-duplication" experiences.
The unawareness of delays forces us to take into account an infinite set
of infinite histories (as in the Feynman formulation of QM).

Some internal (modal) interpretations are given by the logic of
what consistent machines can prove. Have you followed my posts with George
Levy on Modal Logic?

Have you followed the recent version of the UDA in 11 steps, which I
proposed to Joel Dobrzelewski? See the link:
http://www.escribe.com/science/theory/m3044.html


>But don't you see? Why does a particular agent, say, yourself, with a
>nonrandom past, have a nonrandom future? Why is your computer still there
>after one second, although in a truly random world it would immediately
>dissolve? Why do pencils keep falling down instead of up, when the
>futures where they fall up are just as likely?


We "practice" iterated duplications all the time only below our
sharable level of substitution. (like in Everett). This is plausible.
UD* structure is highly non trivial especially from the point of view
of consistent machine "surviving" on their
consistent extensions which are *sparsed* among UD*.


>To repeat, the concept of the machine that computes all universe
>histories is NOT by itself sufficient to explain why our future stays
>regular.


But the concept of the machine that executes all programs is NOT
intended to explain why our future stays regular. It is the unavoidable
reality for machines from their provable and consistent points of view.
To add any prior is just a sort of treachery with respect to comp.


> The additional concept of the Speed Prior explains it though.


I would like to believe you, but you don't even address the problems
which interest me, like the link between first- and third-person
discourse, the extraction of the quantum from a measure on the whole
of UD*, etc.

I don't see how you attach mind to a single computation.
You are using a naive theory of mind which is inconsistent with
both comp and QM at once.

I am agnostic about the little program, although the UD*-measure,
*if* computable (which I doubt), would define it.

My main question to you is: how does the Speed Prior filter out
the many possible slow but similar histories generated
by the GP, given that we cannot distinguish them, and we cannot be
aware of which one we belong to here and now?

Do you know the logic of provability and quantum logic? Have you
followed http://www.escribe.com/science/theory/m2855.html and the
posts leading to it? (Some mathematicians prefer to consider the
UDA only as a motivation for that result.)

The most urgent thing, if you want to refute my approach, is to show me
where I am wrong in the UDA.

I repeat that I don't refute your approach. It surely lacks motivation
(especially your second paper), and with comp it is hardly complete.
You dismiss 100% of the mind-body problem.
I am not astonished; it has been a widespread disease since Aristotle.

Another problem with your approach is that it is inconsistent
with quantum mechanics, as you know.
Mine is consistent with QM and even entails some of it, and
(normally) all of it (up to possible geographical peculiarities).


>I do not understand. But please do _not_ elaborate.


How am I supposed to interpret that? Well, I know you believe in
(and are using) a one-one mind-body/computation relation, but this
has been shown inconsistent with comp (Marchal 88, Maudlin 89;
references are in my thesis, URL below).

So you should still explain to us how you attach mind to single
computations, but this will contradict the invariance lemma,
so I'm skeptical. This does not mean the prior approach has no
intrinsic interest. Your criticisms are too poor to help us see
whether our approaches are compatible or not.
In an older post I conjectured that your approach is reducible to mine
in the case of comp with a sort of infinitely low level of
substitution. This could perhaps help single out the relevant
computations. But given your unfair remarks I realise I am not
even sure you read my posts, and that I'm probably wasting my time.


Bruno


http://iridia.ulb.ac.be/~marchal

jue...@idsia.ch

unread,
Oct 16, 2001, 11:39:00 AM10/16/01
to everyth...@eskimo.com

Confusion about what's a measure?
What's a distribution?
Simple but important!
For bitstrings x:

M measure:
M(empty string)=1
M(x) = M(x0)+M(x1) nonnegative for all finite x.

P probability distribution:
Sum_x P(x) = 1; P(x) nonnegative

---

M semimeasure - replace "=" by ">=":
M(x) >= M(x0)+M(x1)

P semidistribution - replace:
Sum_x P(x) <= 1

---

Examples:

1. Distribution: E.g., positive integers n: P(n) = 6/(Pi^2 n^2)
2. Semidistribution: m(x) = probability of guessing a halting program
for x (BTW, Hal, this was first published by Levin in 1974, not by
Chaitin in 1975)
3. Measure: E.g., each x of size n gets weight 2^-n
4. Semimeasure: E.g., mu^M(x) = probability of guessing a halting or
nonhalting monotone TM program whose output starts with x

Check out: Measures and Probability Distributions
Section 4 of "Algorithmic TOEs"
http://www.idsia.ch/~juergen/toesv2/node15.html
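
For concreteness, here is a small Python sketch (a toy illustration of
the definitions above, not something from the ATOE paper) checking the
distribution property for P(n) = 6/(Pi^2 n^2) and the measure property
M(x) = M(x0)+M(x1) for the uniform measure M(x) = 2^-length(x):

  import math
  from itertools import product

  # Probability distribution on positive integers: partial sums -> 1.
  def P(n):
      return 6.0 / (math.pi ** 2 * n ** 2)
  print(sum(P(n) for n in range(1, 100000)))      # close to 1

  # Uniform measure on finite bitstrings: M(x) = 2^-len(x), M('') = 1.
  def M(x):
      return 2.0 ** (-len(x))

  # Measure property: M(x) = M(x0) + M(x1) for every finite x.
  for n in range(5):
      for bits in product('01', repeat=n):
          x = ''.join(bits)
          assert abs(M(x) - (M(x + '0') + M(x + '1'))) < 1e-12

  # But M is not a probability distribution: Sum_x M(x) diverges,
  # since each string length contributes a total weight of 1.
  print(sum(M(''.join(b)) for n in range(10)
            for b in product('01', repeat=n)))    # = 10.0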

Russell Standish

unread,
Oct 16, 2001, 7:08:25 PM10/16/01
to jue...@idsia.ch, everyth...@eskimo.com
jue...@idsia.ch wrote:
>
>
> Confusion about what's a measure?
> What's a distribution?
> Simple but important!
> For bitstrings x:
>
> M measure:
> M(empty string)=1
> M(x) = M(x0)+M(x1) nonnegative for all finite x.

This sounds more like a probability distribution than a measure. In
the set of all descriptions, we only consider infinite length
bitstrings. Finite length bitstrings are not members. However, we can
identify finite length bitstrings with subsets of descriptions. The
empty string corresponds to the full set of all descriptions, so the
first line M(empty string)=1 implies that the measure is normalisable
(ie a probability distribution).

As I said before, non-normalisable measures are considered measures by
the mathematical community - and the uniform measure over the set of
all descriptions is well defined.

>
> P probability distribution:
> Sum_x P(x) = 1; P(x) nonnegative
>
> ---
>
> M semimeasure - replace "=" by ">=":
> M(x) >= M(x0)+M(x1)
>
> P semidistribution - replace:
> Sum_x P(x) <= 1
>
> ---
>
> Examples:
>
> 1. Distribution: E.g., positive integers n: P(n) = 6/(Pi^2 n^2)
> 2. Semidistribution: m(x) = probability of guessing a halting program
> for x (BTW, Hal, this was first published by Levin in 1974, not by
> Chaitin in 1975)
> 3. Measure: E.g., each x of size n gets weight 2^-n
> 4. Semimeasure: E.g., mu^M(x) = probability of guessing a halting or
> nonhalting monotone TM program whose output starts with x
>
> Check out: Measures and Probability Distributions
> Section 4 of "Algorithmic TOEs"
> http://www.idsia.ch/~juergen/toesv2/node15.html
>
>
>
> Juergen Schmidhuber
>
> http://www.idsia.ch/~juergen/
> http://www.idsia.ch/~juergen/everything/html.html
> http://www.idsia.ch/~juergen/toesv2/
>
>

----------------------------------------------------------------------------

jamikes

unread,
Oct 17, 2001, 4:45:05 PM10/17/01
to Marchal, everyth...@eskimo.com
Bruno, I appreciate your choice of incompressibility - as far as
mathematical views are concerned. How about a "random" choice of a color
from a hundred others? Can this be algorithmic and incompressible?
Or a choice "at random" from several available routes for how to defend an
innocent accused in court?
I admire you, physicists, for writing an equation for everything. I cannot
find this applicable with infinite variables and infinite levels of
influences among applicable factors (=natural systems).
I do not find Russell's "choice without a cause" applicable without
interjecting "known" before 'cause'. Which does not mean deterministic
extremism: the choices are close and propagate fractally into quite
diverse routes, so it is beyond designability how a later situation will
look.

John M

Russell Standish

unread,
Oct 17, 2001, 9:34:12 PM10/17/01
to jamikes, Marchal, everyth...@eskimo.com
Yes, in a different context, random could be applied to deterministic
chaos; however, in the context of our discussion we're not talking
about that.

Cheers

----------------------------------------------------------------------------

Marchal

unread,
Oct 18, 2001, 8:32:42 AM10/18/01
to everyth...@eskimo.com, everyth...@eskimo.com
John Mikes wrote:


>Bruno, I appreciate your choice of incompressibility - as far as
>mathematical views are concerned.


And you know that with comp a case is made that there is nothing
outside mathematics (not even outside arithmetic), so that's OK for me.
But you know also that the deterministic self-duplication entails the
possibility that we are confronted with such a form of indeterminism.


>How about a "random" choice of a color
>from a hundred others?


If the colors are clearly distinguishable, you can use an incompressible
string of 0/1 symbols translated into base 100.
It is not practical, because you will not be able to prove it is
random, but the point is that the UD generates all such sequences, so
comp "nature" makes such random "choices" all the "time".


>Can this be algorithmic and incompressible?


Yes. But it cannot be known for sure.


>Or a choice "at random" from several available routes for how to defend an
>innocent accused in court?


I don't see the relationship. To defend an innocent accused in court,
you had better refute the evidence given by those in charge.


>I admire you, physicists, ...


I'm not a physicist. I am, let us say, a hesitator between
biology and chemistry who, after discovering "Godel's result",
realised that an abstract biology exists as a branch of mathematics.
So I decided (in the early 70s) to study mathematics, to show
that it is not the cells that obey the laws of chemistry (as
James Watson said in "Molecular Biology of the Gene") but the
laws of chemistry which obey (in a deeper way, to be sure) the
laws of cells (self-replication).


>...for writing an equation for everything.


>I cannot
>find this applicable with infinite variables and infinite levels of
>influences among applicable factors (=natural systems).


If the equation of everything is supposed to give a complete
explanation, or even just a complete description, I certainly
do not believe in it. Not because of some natural systems (I
don't believe those exist with comp) but because there is no
complete theory of numbers and/or machines.


>I do not find Russell's "choice without a cause" applicable without
>interjecting "known" before 'cause'.


Instead of "choice without a cause" I would say "selection of
possibility" without any mean to influence the selection, like in
the self duplication experiment in the comp frame. Here we can know
even God cannot know the result of the selection.


>Which does not mean deterministic
>extremism: the choices are close and propagate fractally into quite
>diverse routes, so it is beyond designability how a later situation will
>look.


Yes.

Bruno

Alastair Malcolm

unread,
Oct 22, 2001, 10:41:43 AM10/22/01
to everyth...@eskimo.com
Juergen wrote (on 12th Oct):

> . . . In most possible futures your computer will
> vanish within the next second. But it does not. This indicates that our
> future is _not_ sampled from a uniform prior.

I don't wish to comment directly on the computer-vanishing problem as it
applies to Juergen's scheme (my own problem with this laudable scheme is
that it appears to be vulnerable to the same 'turtle-itis' criticism as for
all theistic religions - the (literal or abstract GP) 'turtle' responsible
for the world seems to need another turtle to support it and so on - there
is no full explanation), but I would like to say that certain other proposed
solutions don't suffer from this computer-vanishing problem (also known as
the WR/dragon problem), if one thinks of infinite length bit strings /
formal system descriptions via Limit n -> infinity, where n is the relevant
string/description length (see appendix below). It seems to me that in
thinking in simple infinity terms one can lose essential information (for
example, the integers cannot be determined to be more numerous than the odd
numbers - both are of the lowest order of infinity - which is not a problem
for limit analysis).

Alastair Malcolm

APPENDIX

One might naively think that as there are at least hundreds of possible
states (call them V1, V2... ) where some part of our computer clearly
vanishes (and only one (N), where normality prevails), then even if one
considers bit string or other types of formal description involving other
variations in our universe or indeed other universes, one could still
'divide through' to find that we are most likely to be in a universe where
our computer vanishes in whole or part.

However, we note that in any minimal description of our universe (and the
following argument does not depend on there having to *only* be a minimal
description), deviations from the actual physical laws causing V1, V2...
will involve additional ad-hoc rules/events, so we can say that in
complexity terms (whether as a bit string or formal system description) we
will have V1 = N + VE1, V2 = N + VE2 etc, where VE1, VE2 etc are the extra
segments of description required to cater for the part-vanishings (strictly:
Length(V1) = Length(N) + Length(VE1) etc, but hopefully this is clear).

Moreover, if we are considering all possible descriptions, we also have to
allow for extra descriptions corresponding to entities beyond our
observability. For each V = N + VE we will have many DC = N + DCE, where DC
is a 'don't care' description - it is a universe (or set of universes)
indistinguishable by us from N (our own, non-computer-vanishing one), yet
objectively different.

Now, the key point is that in any objective descriptive framework (whether
by bit string or formal system), one should take the Limit as the
description length increases to infinity, not as our (humanly biassed)
visible universe is progressively and unobservably added to (say by other
universes). As we do this, we are far more likely to be in a DC (= N + DCE)
universe than a V (= N + VE) universe: computers don't normally vanish, in
whole or in part.

More details at:
http://www.physica.freeserve.co.uk/p105.htm
linking to:
http://www.physica.freeserve.co.uk/p111.htm
http://www.physica.freeserve.co.uk/p112.htm

jue...@idsia.ch

unread,
Oct 22, 2001, 11:50:05 AM10/22/01
to everyth...@eskimo.com

> From R.Sta...@unsw.edu.au:


> jue...@idsia.ch wrote:
> > M measure:
> > M(empty string)=1
> > M(x) = M(x0)+M(x1) nonnegative for all finite x.
>
> This sounds more like a probability distribution than a measure. In
> the set of all descriptions, we only consider infinite length
> bitstrings. Finite length bitstrings are not members. However, we can
> identify finite length bitstrings with subsets of descriptions. The
> empty string corresponds to the full set of all descriptions, so the
> first line M(empty string)=1 implies that the measure is normalisable
> (ie a probability distribution).


Please check out definitions of measure and distribution!
Normalisability is not the critical issue.

Clearly: Sum_x M(x) is infinite. So M is not a probability
distribution. M(x) is just measure of all strings starting with x:
M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = ....

Neglecting finite universes means loss of generality though.
Hence measures mu(x) in the ATOE paper do not neglect finite x:

mu(empty string)=1
mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative).

And here P is a probability distribution indeed!
P(x)>0 possible only for x with finite description.

Russell Standish

unread,
Oct 22, 2001, 7:26:05 PM10/22/01
to jue...@idsia.ch, everyth...@eskimo.com
jue...@idsia.ch wrote:
>
>
>
> > From R.Sta...@unsw.edu.au:
> > jue...@idsia.ch wrote:
> > > M measure:
> > > M(empty string)=1
> > > M(x) = M(x0)+M(x1) nonnegative for all finite x.
> >
> > This sounds more like a probability distribution than a measure. In
> > the set of all descriptions, we only consider infinite length
> > bitstrings. Finite length bitstrings are not members. However, we can
> > identify finite length bitstrings with subsets of descriptions. The
> > empty string corresponds to the full set of all descriptions, so the
> > first line M(empty string)=1 implies that the measure is normalisable
> > (ie a probability distribution).
>
>
> Please check out definitions of measure and distribution!
> Normalisability is not the critical issue.
>
> Clearly: Sum_x M(x) is infinite. So M is not a probability
> distribution. M(x) is just measure of all strings starting with x:
> M(x) = M(x0)+M(x1) = M(x00)+M(x01)+M(x10)+M(x11) = ....

For a measure to be normalisable, the sum over a *disjoint* family of
subsets must be finite. If the subsets are not disjoint (i.e. the
intersections are not empty) then the sum may well be infinite.

Bringing this to the case of finite strings: each finite string actually
stands for a subset of the set of infinite strings, namely those sharing
that finite prefix. So the string 101010 actually corresponds to a subset
of 10101, and so on. The Sum_x M(x), where I assume x ranges over all
finite strings, will of course be infinite. However, since these subsets
are not disjoint, this doesn't imply M(x) is unnormalisable.

Now when you realise that every finite string x corresponds to a subset of
the empty string's set, it becomes clear that M(x) is normalised to
precisely 1.

>
> Neglecting finite universes means loss of generality though.
> Hence measures mu(x) in the ATOE paper do not neglect finite x:
>
> mu(empty string)=1
> mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative).
>
> And here P is a probability distribution indeed!
> P(x)>0 possible only for x with finite description.
>

The P(x)>0 case above actually breaks the countably subadditive
property, so mu(x) cannot be called a measure... I'm not entirely sure
what you're getting at here.

Cheers

----------------------------------------------------------------------------

jue...@idsia.ch

unread,
Oct 23, 2001, 11:52:37 AM10/23/01
to everyth...@eskimo.com
> From: R.Sta...@unsw.edu.au:

The point is: prob dists and measures are different things. There is a
good reason for giving them different names. Prob dists assign numbers
to individual objects, not to sets. Traditional definitions, as well as
those for semimeasures, are in http://www.idsia.ch/~juergen/toesv2/

> > Neglecting finite universes means loss of generality though.
> > Hence measures mu(x) in the ATOE paper do not neglect finite x:
> >
> > mu(empty string)=1
> > mu(x) = P(x)+mu(x0)+mu(x1) (all nonnegative).
> >
> > And here P is a probability distribution indeed!
> > P(x)>0 possible only for x with finite description.
> >
>
> The P(x)>0 case above actually breaks the countably subadditive
> property, so mu(x) cannot be called a measure... I'm not entirely sure
> what you're getting at here.

The set of computable universes includes the finite ones which we may
not ignore. How do they contribute to the measure of all universes?
For convenience introduce a third symbol $ besides 0 and 1. Now what is
the measure of the set of all universes starting with 00110? It's the sum of
the measure of the set of universes starting with 001101 and
the measure of the set of universes starting with 001100 and
the measure of the single universe 00110$ that stops right after 00110.
Once there is a $ symbol there won't be another 0 or 1.
So mu(x) = P(x)+mu(x0)+mu(x1).

Our favorite example is the universal Turing machine with random inputs.
For finite x: P(x) is the probability that the TM output is x and nothing
but x. mu(x) is something different, namely, the so-called universal measure
of all strings _starting_ with x. Again mu(x) = P(x)+mu(x0)+mu(x1).

Juergen Schmidhuber http://www.idsia.ch/~juergen/
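
A toy Python sketch of this bookkeeping (my own illustration, with a
made-up distribution P; in the TM picture P(x) would be the probability
that the machine outputs exactly x and then halts):

  from itertools import product

  P = {'': 0.1, '0': 0.2, '1': 0.1, '00': 0.25, '011': 0.15, '110': 0.2}
  assert abs(sum(P.values()) - 1.0) < 1e-12      # P is a distribution

  def mu(x):
      # Measure of the set of universes starting with x: total P-weight
      # of all finite strings having x as a prefix.
      return sum(p for y, p in P.items() if y.startswith(x))

  assert abs(mu('') - 1.0) < 1e-12               # mu(empty string) = 1
  for n in range(4):                             # mu(x) = P(x)+mu(x0)+mu(x1)
      for bits in product('01', repeat=n):
          x = ''.join(bits)
          assert abs(mu(x) - (P.get(x, 0.0) + mu(x+'0') + mu(x+'1'))) < 1e-12
  print(mu(''), mu('0'), mu('01'))               # approx. 1.0, 0.6, 0.15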

jue...@idsia.ch

unread,
Oct 25, 2001, 3:51:48 AM10/25/01
to everyth...@eskimo.com
> From: Russell Standish <R.Sta...@unsw.edu.au>
> To: jue...@idsia.ch
>
> I think we got into this mess debating whether an infinite set could
> support a uniform measure. I believe I have demonstrated this.
> I've yet to see anything that disabuses me of the notion that a
> probability distribution is simply a measure that has been normalised
> to 1. Not all measures are even normalisable.

Russell, at the risk of beating a dead horse: a uniform measure is _not_ a
uniform probability distribution. Why were measures invented in the first
place? To deal with infinite sets. You cannot have a uniform probability
distribution on infinitely many things. That's why measures isolate just
finitely many things, say, every bitstring of size n, and for each x of
size n look at the infinite set of strings starting with x. A uniform
measure assigns equal probability to each such set. Of course, then you
have a uniform probability distribution on those finitely many things
which are sets. But that's not a uniform probability distribution on
infinitely many things, e.g., on the bitstrings themselves! The measure
above is _not_ a probability distribution; it is an infinite _set_ of
_finite_ probability distributions, one for string size 0, one for string
size 1, one for string size 2,...
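
In code, the point is just this (a trivial sketch of my own): for each
fixed size n, the 2^n cylinder sets {strings starting with x, len(x)=n}
get weight 2^-n each and form a finite probability distribution, but no
single such distribution covers the infinitely many strings at once.

  from itertools import product

  for n in range(1, 8):
      # One finite probability distribution per string size n.
      weights = {''.join(b): 2.0 ** -n for b in product('01', repeat=n)}
      print(n, len(weights), sum(weights.values()))   # n, 2^n, 1.0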

> I realise that the Halting theorem gives problems for believers of
> computationalism.

It does not. Why should it?

> I never subscribed to computationalism at any time,
> but at this stage do not reject it. I could conceive of us living in
> a stupendous virtual reality system, which is in effect what your GP
> religion Mark II is. However, as pointed out by others, it does suffer
> from "turtle-itis", and should not be considered the null
> hypothesis. It requires evidence for belief.

By turtle-itis you mean: in which universe do the GP and his computer
reside? Or the higher-level GP2 which programmed GP? And so on? But
we cannot get rid of this type of circularity - computability and
mathematical logic are simply taken as given things, without any
further explanation, like a religion. The computable multiverse, or
the set of logically possible or mathematically possible or computable
universes, represents the simplest explanation we can write down formally.
But what exactly does it mean to accept something as a formal statement?
What does it mean to identify the messy retina patterns caused by this
text with abstract mathematical symbols such as x and y? All formal
explanations in our formal papers assume that we agree on how to read
them. But reading and understanding papers is a complex physical and
cognitive process. So all our formal explanations are relative to this
given process which we usually do not even question. Essentially, the
GP program is the simplest thing we can write down, relative to the
unspoken assumption that it is clear what it means to write something
down, and how to process it. It's the simplest thing, given this use
of mathematical language we have agreed upon. But here the power of the
formal approach ends - unspeakable things remain unspoken.

Juho Pennanen

unread,
Oct 25, 2001, 8:35:00 AM10/25/01
to everyth...@eskimo.com

juergen wrote:

> Russell, at the risk of beating a dead horse: a uniform measure is _not_ a
> uniform probability distribution. Why were measures invented in the first
> place? To deal with infinite sets. You cannot have a uniform probability
> distribution on infinitely many things.


The last sentence is trivially true when 'probability distributions' are
defined as probability measures on discrete sets. But someone could also
think this is just irrelevant word-play on definitions.

There are uniform probability (density) functions on infinite sets, e.g.
the uniform distribution on the interval [0,1], which gives the measure
(probability) t for each interval [x,x+t]. Obviously, this distribution
gives measure 0 for any singleton containing only one real number. Still
it's uniform, though not a 'probability distribution' if one wishes to
use the most restricted definition.

Similarly, there is the natural probability measure on the set of all
infinite strings of 0's and 1's. It is 'uniform' in the sense that no
string or place in a string is in a privileged position. To define it,
set n_0 as the set of all strings that have a 0 at the n'th place and
n_1 as the set of all strings that have a 1 at the n'th place, for any
n. Then set m(n_0)=m(n_1)=1/2, for all n. Using the standard definition
of measure that has been cited on this list (Kolmogorov axioms), m has a
unique extension which is a measure on the set of all strings.

So there may be no 'uniform probability distribution' on the set of all
strings, but there is the natural probability measure, that is in many
cases exactly as useful.

Juho
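
A quick Monte Carlo sketch in Python (my own illustration of the measure
Juho defines): choosing each place independently with m(n_0)=m(n_1)=1/2,
the set of strings starting with a given prefix x gets measure 2^-len(x).

  import random

  def estimate_cylinder_measure(prefix, trials=200000, seed=1):
      rng = random.Random(seed)
      hits = 0
      for _ in range(trials):
          # Only the first len(prefix) places matter for membership.
          sample = ''.join(rng.choice('01') for _ in range(len(prefix)))
          hits += (sample == prefix)
      return hits / trials

  for x in ['0', '01', '0110']:
      print(x, estimate_cylinder_measure(x), 2.0 ** -len(x))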

Russell Standish

unread,
Oct 25, 2001, 7:12:02 PM10/25/01
to jue...@idsia.ch, everyth...@eskimo.com
jue...@idsia.ch wrote:
>
> > From: Russell Standish <R.Sta...@unsw.edu.au>
> > To: jue...@idsia.ch
> >
> > I think we got into this mess debating whether an infinite set could
> > support a uniform measure. I believe I have demonstrated this.
> > I've yet to see anything that disabuses me of the notion that a
> > probability distribution is simply a measure that has been normalised
> > to 1. Not all measures are even normalisable.
>
> Russell, at the risk of beating a dead horse: a uniform measure is _not_ a
> uniform probability distribution. Why were measures invented in the first
> place? To deal with infinite sets. You cannot have a uniform probability
> distribution on infinitely many things. That's why measures isolate just
> finitely many things, say, every bitstring of size n, and for each x of
> size n look at the infinite set of strings starting with x. A uniform
> measure assigns equal probability to each such set. Of course, then you
> have a uniform probability distribution on those finitely many things
> which are sets. But that's not a uniform probability distribution on
> infinitely many things, e.g., on the bitstrings themselves! The measure
> above is _not_ a probability distribution; it is an infinite _set_ of
> _finite_ probability distributions, one for string size 0, one for string
> size 1, one for string size 2,...
>

Good grief, you've missed the point entirely! Measures do measure
infinite sets - consider the set [0,1] - it is an infinite set, and
the uniform probability distribution and uniform measure are the same
thing. We can also consider the set [0,\infty) which has a uniform
measure on it (the same constant density as before,
but extended to the full set). Clearly it cannot have a uniform
probability distribution, as the set has infinite measure. You seem
fixated on the peculiar definition of measure on strings that you gave
before, rather than the more general notion.


> > I realise that the Halting theorem gives problems for believers of
> > computationalism.
>
> It does not. Why should it?

Restating what I said before simply:

1) Halting theorem implies the set of halting programs is of measure
zero within the set of all programs

2) Computationalism is the assumption that the universe's observer is
a universal Turing machine.

3) Uniform measure over the Plenitude of descriptions implies that
observer will never see simple strings.

The GP program (aka Universal Dovetailer) is not the simplest
thing. The simplest describable thing is the set of all descriptions,
that simply exist without being generated. That has precisely zero
information or complexity, whereas the UD has some complexity of the
order of a few 100 bits. The simplest prior is the uniform one, ie
there is no bias whatsoever in the selection.

The only reason for not accepting the simplest thing is if it can be
shown to be logically inconsistent. This far, you have shown no such
thing, but rather demonstrated an enormous confusion between measure
and probability distribution.

The set of all descriptions and the uniform measure do not suffer from
turtle-itis. However, they are susceptible to criticism of the
Church-Turing thesis - namely why should our idea of information in
terms of strings of symbols from a discrete alphabet have universal
applicability. Yet, I don't see anyone here seriously arguing against the
CT thesis, so there I will let it lie.

----------------------------------------------------------------------------

Marchal

unread,
Oct 26, 2001, 9:40:45 AM10/26/01
to everyth...@eskimo.com, Fabric-o...@yahoogroups.metalab.unc.edu
Juergen Schmidhuber wrote


> Russell Standish wrote:
>> I never subscribed to computationalism at any time,
>> but at this stage do not reject it. I could conceive of us living in
>> a stupendous virtual reality system, which is in effect what your GP
>> religion Mark II is. However, as pointed out by others, it does suffer
>> from "turtle-itis", and should not be considered the null
>> hypothesis. It requires evidence for belief.
>
>By turtle-itis you mean: in which universe do the GP and his computer
>reside? Or the higher-level GP2 which programmed GP? And so on? But
>we cannot get rid of this type of circularity - computability and
>mathematical logic are simply taken as given things, without any
>further explanation, like a religion.


I disagree. Comp could not have any pretension with respect to the search
of a TOE without Church's thesis. And we can argue for Church's thesis.
(The two main and very different types of argument are:
a) the empirical fact that all definitions of computable functions
lead provably to the same class of functions;
b) the closure of that set under the "most transcendent" operation in
mathematics: diagonalisation.)

It is really the closure of the set of computable functions under
diagonalisation which saves comp from any form of *turtle-itis*.
That closure provides the fixed points for all computable and
self-referential transformations (from which, in particular, G and G*
and G* minus G can arise).


>The computable multiverse, or
>the set of logically possible or mathematically possible or computable
>universes, represents the simplest explanation we can write down formally.


Except that I still don't know what you mean by "universe".


>But what exactly does it mean to accept something as a formal statement?
>What does it mean to identify the messy retina patterns caused by this
>text with abstract mathematical symbols such as x and y? All formal
>explanations in our formal papers assume that we agree on how to read
>them.


I disagree. Axiomatics has been invented and developed so that we can
see the validity of our reasoning independently of our
interpretations of it. Of course we bet on the ability to distinguish
things, 0/1 in particular, but that's another level.


>But reading and understanding papers is a complex physical and
>cognitive process. So all our formal explanations are relative to this
>given process which we usually do not even question.


You are mixing levels of description. It is not because I take for
granted that you can read English that I will take for granted that
you have a brain made of waves/particles or admitting *any* physical
description.

This confirms that your *use* of the GP suffers from turtle-itis. I should
have asked you what you mean by "universe" much earlier. My feeling is you
are presupposing some conception of the universe. The UDA gives at least
a road making the reasoning immune to such preconceptions
(with the exception of the preconception of natural numbers).


>Essentially, the
>GP program is the simplest thing we can write down, relative to the
>unspoken assumption that it is clear what it means to write something
>down, and how to process it.


The miracle of Church's Thesis is that the "how to process it" is
defined in the arithmetical relations themselves. That is what makes
it possible to study the possible discourses of the machines themselves,
about their most probable relative computational states and stories.
I guess we agree on the importance of intuition about the finitary,
the finite things.


>It's the simplest thing, given this use
>of mathematical language we have agreed upon. But here the power of the
>formal approach ends - unspeakable things remain unspoken.


I disagree. I would even say that it is here that the serious formal
approach begins. Take "unprovable" for "unspeakable": we can
meta-reason (informally or formally) and study the logical structure
of the set of sentences unprovable by sound universal machines.
Although unprovable by the UTM, those sentences are anticipable.
Note that "there is a universe around me" is probably such a
(psycho)logically anticipable sentence, not a provable one.

Bruno

Juergen Schmidhuber

unread,
Oct 26, 2001, 9:45:52 AM10/26/01
to everyth...@eskimo.com
> From: Juho Pennanen <juho.p...@helsinki.fi>

> So there may be no 'uniform probability distribution' on the set of all
> strings, but there is the natural probability measure, that is in many
> cases exactly as useful.

Sure, I agree, measures are useful; I'm using them all the time. But in
general they are _not_ the same thing as probability distributions.

> From: Russell Standish <R.Sta...@unsw.edu.au>
> The only reason for not accepting the simplest thing is if it can be
> shown to be logically inconsistent. This far, you have shown no such
> thing, but rather demonstrated an enormous confusion between measure
> and probability distribution.

Russell, this is really too much; please read any intro to measure
theory, e.g., the one by Halmos. Measures in general are _not_
probability distributions. They are more than that; you can view them
as _sets_ of probability distributions on sets that are related to each
other in a specific way. Probability density functions define measures
and therefore in general aren't probability distributions either. The
uniform PDF on [0..1] yields the perfect trivial example; that's exactly
why I used it before. In the computability context, compare LI/Vitanyi
1997, chapters 4.3 (discrete sample space with (semi)distribution m(x))
vs chapter 4.5 (continuous sample space with (semi)measure M(x)).

> 1) Halting theorem implies the set of halting programs is of measure
> zero within the set of all programs

You mean, given a uniform measure? But why should you need the halting
theorem for this trivial statement? In fact, what exactly do you mean
by halting theorem? (Schoenfield's book provides a good intro to
mathematical logic and computability theory.)

> The GP program (aka Universal Dovetailer) is not the simplest
> thing. The simplest describable thing is the set of all descriptions,
> that simply exist without being generated. That has precisely zero
> information or complexity, whereas the UD has some complexity of the
> order of a few 100 bits. The simplest prior is the uniform one, ie
> there is no bias whatsoever in the selection.

Exist without being generated? Precisely zero information or
complexity? But with respect to what? Any traditional definition of
information/simplicity requires the specification of a formal
description method, such as a Turing machine. You don't need one? Then
how do you define information or complexity? How exactly do you define
"simple"?

Juergen Schmidhuber

unread,
Oct 26, 2001, 10:42:40 AM10/26/01
to everyth...@eskimo.com, everyth...@eskimo.com
Schmidhuber:

>>It's the simplest thing, given this use of mathematical
>>language we have agreed upon. But here the power of the
>>formal approach ends - unspeakable things remain unspoken.

Marchal:

>I disagree. I would even say that it is here that the serious formal
>approach begins. Take "unprovable" for "unspeakable", we can
>meta-reason (informally or formally) and study the logical structure
>of the set of unprovable sentences by sound universal machines.

Schmidhuber:

Unprovable does not mean unspeakable!
You can spell out many unprovable things.

Why care for the subset of provable sentences?
Aren't we interested in the full set of all describable sentences?
We can generate it, without caring for proofs at all.

"Unspeakable" means "beyond formal definition." In particular, the
way we humans communicate and agree on a common formal language is not
formally defined. You write "x" and I say, ok, it's an "x", but that's
all based on some sort of informal agreement that involves complex
pattern recognition and learning processes. Usually we do not question
the validity of outcomes of such cognitive processes, and just go ahead
with formal derivations. But there is this major informal step *before*
formal reasoning can start, and good textbooks on logic acknowledge
this.

Russell Standish

unread,
Oct 28, 2001, 6:29:50 PM10/28/01
to Juergen Schmidhuber, everyth...@eskimo.com
Juergen Schmidhuber wrote:
> > From: Russell Standish <R.Sta...@unsw.edu.au>
> > The only reason for not accepting the simplest thing is if it can be
> > shown to be logically inconsistent. This far, you have shown no such
> > thing, but rather demonstrated an enormous confusion between measure
> > and probability distribution.
>
> Russell, this is really too much; please read any intro to measure
> theory,
> e.g., the one by Halmos. Measures in general are _not_ probability
> distributions. They are more than that; you can view them as _sets_ of
> probability distributions on sets that are related to each other in a
> specific way. Probability density functions define measures and
> therefore
> in general aren't probability distributions either. The uniform PDF on
> [0..1] yields the perfect trivial example; that's exactly why I used it
> before. In the computability context, compare LI/Vitanyi 1997, chapters
> 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5
> (continuous sample space with (semi)measure M(x)).

This is an excellent reference, thank you. Example 4.2.1 gives the
construction of the uniform measure over the set of strings, precisely
what you claimed didn't exist when we started this discussion.


>
> > 1) Halting theorem implies the set of halting programs is of measure
> > zero within the set of all programs
>
> You mean, given a uniform measure? But why should you need the halting
> theorem for this trivial statement? In fact, what exactly do you mean
> by
> halting theorem? (Schoenfield's book provides a good intro to
> mathematical
> logic and computability theory.)
>

The halting theorem is that there is no algorithm for predicting
whether any program will halt. Of course, the result I was referring
to is that the set of compressible strings is of measure zero. The set
of halting programs actually has finite measure, since \Omega >
0. However, there is still finite probability of an input string
being nonhalting, and indeed the probability seems to be greater than
0.7 (ie \Omega <0.217643 - see pg 218 Li and Vitanyi), so this still
presents a problem for computationalism.

> > The GP program (aka Universal Dovetailer) is not the simplest
> > thing. The simplest describable thing is the set of all descriptions,
> > that simply exist without being generated. That has precisely zero
> > information or complexity, whereas the UD has some complexity of the
> > order of a few 100 bits. The simplest prior is the uniform one, ie
> > there is no bias whatsoever in the selection.
>
> Exist without being generated? Precisely zero information or
> complexity? But with respect to what?

With respect to the observer. With respect to anything, in
fact. The set of all possible descriptions has precisely zero
information, by definition, regardless of what observer or context you
employ.

> Any traditional definition of information/simplicity requires the
> specification of a formal description method, such as a Turing machine.
> You don't need one? Then how do you define information or complexity?
> How exactly do you define "simple"?
>

Actually, all it needs is an equivalencing procedure. See my "On
Complexity and Emergence" paper. A Turing machine will do the job for
you, but so will a neural network or an "expert system" following
heuristic rules.

Perhaps the simplest formal approach is to say that the equivalencing
procedure is a function from the set of descriptions onto the integers
(labelling the equivalence classes). If that function is partial
recursive, then the equivalencing mechanism can be replaced by the
appropriate Turing machine. Computationalism is the creed that this
function must be partial recursive, but it is not necessary for the
definition of information or complexity.

I can see some subtle problems occurring if this equivalencing
procedure is stochastic in nature; however, by summing over the
appropriate distributions, one should still end up with meaningful
results, provided the variance is not too large.


Cheers

Marchal

unread,
Oct 29, 2001, 5:15:50 AM10/29/01
to everyth...@eskimo.com
Schmidhuber wrote:

>Why care for the subset of provable sentences?

>Aren't we interested in the full set of all describable sentences?


We are interested in the true sentences. The provable one and the
unprovable one.


>We can generate it, without caring for proofs at all.


If you mean generating all describable *sentences*, this is trivial,
but then you generate only a language.

If you mean generating all computations, it is equivalent to generating
the proofs of all \Sigma_1 sentences (= the "arithmetical" UD).
The provability predicate is just a universal \Sigma_1 sentence.

(I recall that a \Sigma_1 sentence is a sentence equivalent to (Ex)P(x)
with P algorithmically decidable.)


>"Unspeakable" means "beyond formal definition."


Beyond Turing nameability?
Beyond formal definition by whom? You know the sound universal
machine has no means to define its own truth predicate (Tarski). Is it
in that sense? The knowledge of p by the machine can be defined (in a first
approximation) by "p & Bew('p)", in which case, although thoroughly
clear at the metalevel, the knowledge *by* the machine is beyond
any formal definition *for* the machine.

Bew = Godel's provability predicate, 'p = the Godel number of "p".
Bew(x) = (Ey)B(y,x), where B(y,x) means y is the Godel number of a proof
of the sentence with Godel number x.


>In particular, the
>way we humans communicate and agree on a common formal language is not
>formally defined.


OK. Most importantly, the very meaning of "finite", or of number, cannot
be formally defined. But I show that, insofar as we are sound universal
machines, our own knowledgeability "predicate" cannot be formally defined
by us.


>You write "x" and I say, ok, it's an "x", but that's
>all based on some sort of informal agreement that involves complex
>pattern recognition and learning processes. Usually we do not question
>the validity of outcomes of such cognitive processes, and just go ahead
>with formal derivations.


Yes, but the situation is worse for the very *meaning* of our
utterances.


>But there is this major informal step *before*
>formal reasoning can start, and good textbooks on logic acknowledge
>this.


There is *this* one, but there is a deeper one with the *meaning*
of our formal presentations. We really bet that we share a common intuition
about the notion of finitary procedure.
Note that intuitionists and classicists diverge beyond the natural numbers.

We are not on the same wavelength, Juergen. You believe there
is a computable universe. I am agnostic about that. But if I survive
(or just experience nothing) through a digital functional substitution
made at some level L, then my experience is defined by the set of all
consistent extensions defined by all consistent histories definable
below L. With comp, realities emerge from the way consistent machines'
memories organise themselves (through the many, many histories going
through their states).

Your Speed Prior would be useful if it showed how it enhances
the weight of normal stories in the limiting first-person experiences
(due to unawareness of the reconstitution delays).
Note that the quantum quantization of classical theories provides such
an explanation. I show that comp needs to extract that quantization from
a measure on the consistent extensions. I show that "probability one"
obeys the main axioms of quantum logic.

You often ask me why I give so much importance to the notion
of provability. The reason is that in some sense I just interview
scientific machines, as they can appear in the GP's work, about what
they are able to prove about their own consistent extensions, and how
and why they can arrive at probability considerations. And "provability"
is non-trivial enough to provide clues about the minimal necessary
logical structure on that set of consistent extensions.


Bruno

Juergen Schmidhuber

unread,
Oct 29, 2001, 7:09:36 AM10/29/01
to everyth...@eskimo.com, everyth...@eskimo.com
> > > From: Russell Standish <R.Sta...@unsw.edu.au>
> > > The only reason for not accepting the simplest thing is if it can be
> > > shown to be logically inconsistent. This far, you have shown no such
> > > thing, but rather demonstrated an enormous confusion between measure
> > > and probability distribution.
> >
> > Russell, this is really too much; please read any intro to measure
> > theory,
> > e.g., the one by Halmos. Measures in general are _not_ probability
> > distributions. They are more than that; you can view them as _sets_ of
> > probability distributions on sets that are related to each other in a
> > specific way. Probability density functions define measures and
> > therefore
> > in general aren't probability distributions either. The uniform PDF on
> > [0..1] yields the perfect trivial example; that's exactly why I used it
> > before. In the computability context, compare LI/Vitanyi 1997, chapters
> > 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5
> > (continuous sample space with (semi)measure M(x)).
>
> This is an excellent reference, thank you. Example 4.2.1 gives the
> construction of the uniform measure over the set of strings, precisely
> what you claim didn't exist when we started this discussion.

You just want to annoy me, don't you? For the last time: There is a
uniform *measure* on the strings, but no uniform probability
distribution.

> > > 1) Halting theorem implies the set of halting programs is of measure
> > > zero within the set of all programs
> >
> > You mean, given a uniform measure? But why should you need the halting
> > theorem for this trivial statement? In fact, what exactly do you mean
> > by
> > halting theorem? (Schoenfield's book provides a good intro to
> > mathematical
> > logic and computability theory.)
> >
>
> The halting theorem is that there is no algorithm for predicting
> whether any program will halt. Of course, the result I was referring
> to is that the set of compressible strings is of measure zero. The set
> of halting programs actually has finite measure, since \Omega >
> 0. However, there is still finite probability of an input string
> being nonhalting, and indeed the probability seems to be greater than
> 0.7 (ie \Omega <0.217643 - see pg 218 Li and Vitanyi), so this still
> presents a problem for computationalism.

This is mixing up everything, in particular, different measures.
1) Infinite strings with finite program have *nonzero* measure, given
the *universal* measure (Omega is a number derived from the universal
measure). 2) They have measure zero, given the *uniform* measure (Omega
has nothing to do with it). 3) To see 2) we do not need the halting
theorem - we just compare the finite programs (countably many) to all
strings (uncountably many) and we are done. 4) The halting theorem is
no problem whatsoever when it comes to computable universes. Why care
whether some computable universe stops or not? If it does, it does,
otherwise it does not. In both cases we can compute it in the limit.

> > > The GP program (aka Universal Dovetailer) is not the simplest
> > > thing. The simplest describable thing is the set of all descriptions,
> > > that simply exist without being generated. That has precisely zero
> > > information or complexity, whereas the UD has some complexity of the
> > > order of a few 100 bits. The simplest prior is the uniform one, ie
> > > there is no bias whatsoever in the selection.
> >
> > Exist without being generated? Precisely zero information or
> > complexity? But with respect to what?
>
> With respect to the observer. With respect to anything, in
> fact. The set of all possible descriptions has precisely zero
> information, by definition, regardless of what observer or context you
> employ.

This is a vague informal statement, not a definition. Which are the
possible descriptions? What's the formal difference between a description
and a non-description? What is your "anything" if there is no device
for describing it formally? If "anything" does not convey any bit of
info then what exactly is it that conveys one bit? Or two?

> > Any traditional definition of information/simplicity requires the
> > specification of a formal description method, such as a Turing machine.
> > You don't need one? Then how do you define information or complexity?
> > How exactly do you define "simple"?
>
> Actually, all it needs is an equivalencing procedure. See my "On
> Complexity and Emergence" paper. A Turing machine will do the job for
> you, but so will a neural network or an "expert system" following
> heuristic rules.

So you have to write down formally this equivalencing procedure, right?
Why should this be different in principle from writing down the formal
rules for a Turing machine? Why is a simplicity measure that depends on
your equivalencing procedure superior to a simplicity measure depending
on a Turing machine? (The TM procedure includes yours - because it
includes all procedures - but not the other way round.)

Russell Standish

unread,
Oct 29, 2001, 6:23:38 PM10/29/01
to Juergen Schmidhuber, everyth...@eskimo.com
Juergen Schmidhuber wrote:
>
> > > > From: Russell Standish <R.Sta...@unsw.edu.au>
> > > > The only reason for not accepting the simplest thing is if it can be
> > > > shown to be logically inconsistent. This far, you have shown no such
> > > > thing, but rather demonstrated an enormous confusion between measure
> > > > and probability distribution.
> > >
> > > Russell, this is really too much; please read any intro to measure
> > > theory,
> > > e.g., the one by Halmos. Measures in general are _not_ probability
> > > distributions. They are more than that; you can view them as _sets_ of
> > > probability distributions on sets that are related to each other in a
> > > specific way. Probability density functions define measures and
> > > therefore
> > > in general aren't probability distributions either. The uniform PDF on
> > > [0..1] yields the perfect trivial example; that's exactly why I used it
> > > before. In the computability context, compare LI/Vitanyi 1997, chapters
> > > 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5
> > > (continuous sample space with (semi)measure M(x)).
> >
> > This is an excellent reference, thank you. Example 4.2.1 gives the
> > construction of the uniform measure over the set of strings, precisely
> > what you claim didn't exist when we started this discussion.
>
> You just want to annoy me, don't you? For the last time: There is a
> uniform *measure* on the strings, but no uniform probability
> distribution.

Who is annoying whom? I never claimed a uniform probability density (I
actually looked up the difference between probability density and
probability distribution yesterday in Li and Vitanyi) - only a uniform
measure. In fact, my first comment to you was "Why do you insist on it
being a PDF?". Remember?

The simplest prior is the uniform measure on infinite strings.

>
> > > > 1) Halting theorem implies the set of halting programs is of measure
> > > > zero within the set of all programs
> > >
> > > You mean, given a uniform measure? But why should you need the halting
> > > theorem for this trivial statement? In fact, what exactly do you mean
> > > by
> > > halting theorem? (Schoenfield's book provides a good intro to
> > > mathematical
> > > logic and computability theory.)
> > >
> >
> > The halting theorem is that there is no algorithm for predicting
> > whether any program will halt. Of course, the result I was referring
> > to is that the set of compressible strings is of measure zero. The set
> > of halting programs actually has finite measure, since \Omega >
> > 0. However, there is still finite probability of an input string
> > being nonhalting, and indeed the probability seems to be greater than
> > 0.7 (ie \Omega <0.217643 - see pg 218 Li and Vitanyi), so this still
> > presents a problem for computationalism.
>
> This is mixing up everything, in particular, different measures.
> 1) Infinite strings with finite program have *nonzero* measure, given
> the *universal* measure (Omega is a number derived from the universal
> measure). 2) They have measure zero, given the *uniform* measure (Omega
> has nothing to do with it).

The uniform measure and the universal measure are related. The uniform
prior is on the set of input strings to a UTM, the universal measure
is the induced measure on the output strings.

No. 1) here is the relevant statement, not 2).

> 3) To see 2) we do not need the halting
> theorem - we just compare the finite programs (countably many) to all
> strings (uncountably many) and we are done. 4) The halting theorem is
> no problem whatsoever when it comes to computable universes. Why care
> whether some computable universe stops or not? If it does, it does,
> otherwise it does not. In both cases we can compute it in the limit.
>
> > > > The GP program (aka Universal Dovetailer) is not the simplest
> > > > thing. The simplest describable thing is the set of all descriptions,
> > > > that simply exist without being generated. That has precisely zero
> > > > information or complexity, whereas the UD has some complexity of the
> > > > order of a few 100 bits. The simplest prior is the uniform one, ie
> > > > there is no bias whatsoever in the selection.
> > >
> > > Exist without being generated? Precisely zero information or
> > > complexity? But with respect to what?
> >
> > With respect to the observer. With respect to anything, in
> > fact. The set of all possible descriptions has precisely zero
> > information, by definition, regardless of what observer or context you
> > employ.
>
> This is a vague informal statement, not a definition. Which are the
> possible descriptions? What's the formal difference between a
> description
> and a non-description? What is your "anything" if there is no device
> for describing it formally? If "anything" does not convey any bit of
> info then what exactly is it that conveys one bit? Or two?
>

Choose an alphabet (a finite set of symbols). The set of all
descriptions is the set of all infinite length strings constructed
from that alphabet. I would suggest the notation A*, where A is the
alphabet, but I think this usually refers to the set of finite length
strings.

Now to measure the information content of any particular string, you
need to define an interpreter (or observer) which is simply a map o(x)
from the set of all descriptions to a set of equivalence classes
(which can be identified as a subset of the whole numbers N).

If that map is partially recursive, then the interpreter can be
replaced by a Turing machine. But the map need not be partially
recursive, and I believe it is possible to do this with stochastic
maps (ie a map with an additional input connected to a random oracle).

The information of a string s is simply given by:

I(s) = -log [ m(o^{-1}(o(s))) / m(Omega) ]

where Omega is the set of all descriptions, m the uniform measure, and
o^{-1}(o(s)) is the equivalence class containing s. The ratio above
exists, in appropriate limits, even if the value m(Omega) is infinite.

Nevertheless, regardless of what map is chosen, the set of all
descriptions has zero information content, as log 1 = 0.
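
A toy Python sketch of this definition (my own illustration, with logs
taken base 2 so the answer comes out in bits, and with two made-up
observers that only look at the first k symbols of a description):

  import math

  def information(s, observer, k):
      # I(s) = -log2( m(o^-1(o(s))) / m(Omega) ).  For an observer that
      # only reads the first k symbols, the equivalence class of s is a
      # union of cylinder sets whose relative uniform measure is
      # (number of k-prefixes mapped like s) / 2^k.
      cls = observer(s[:k])
      same = sum(1 for n in range(2 ** k)
                 if observer(format(n, '0{}b'.format(k))) == cls)
      return -math.log2(same / 2 ** k)

  o_read = lambda prefix: int(prefix, 2)   # reads the prefix as an integer
  o_blind = lambda prefix: 0               # lumps every description together

  s = '01101001'          # stands in for an infinite description
  print(information(s, o_read, 5))    # 5.0 bits
  print(information(s, o_blind, 5))   # 0.0 bits, since log 1 = 0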

> > Any traditional definition of
> > > information/simplicity requires the specification of a formal
> > > description
> > > method, such as a Turing machine. You don't need one? Then how do you
> > > define information or complexity? How exactly do you define "simple" ?
> >
> > Actually, all it needs is an equivalencing procedure. See my "On
> > Complexity and Emergence" paper. A Turing machine will do the job for
> > you, but so will a neural network or an "expert system" following
> > heuristic rules.
>
> So you have to write down formally this equivalencing procedure, right?
> Why should this be different in principle from writing down the formal
> rules for a Turing machine? Why is a simplicity measure that depends on
> your equivalencing procedure superior to a simplicity measure depending
> on a Turing machine? (The TM procedure includes yours - because it
> includes all procedures - but not the other way round.)
>

I think I've answered this already.

Marchal

unread,
Oct 30, 2001, 12:10:11 PM10/30/01
to everyth...@eskimo.com
I comment again on a relatively old post by Juergen Schmidhuber.


>Juergen: The thing is: when you generate them all, and assume that all are
>equally likely, in the sense that all beginnings of all strings are
>uniformly distributed, then you cannot explain why the regular universes
>keep being regular. The random futures are then just as likely as the
>nonrandom ones.
>So you NEED something additional to explain the ongoing regularity.
>You need something like the Speed Prior, which greatly favors regular
>futures over others.
>

>> Bruno: Take for example the iterated duplication experience/experiment.
>> It can be automated by a very simple program. After 64 iterations
>> there will be 1,84467.10^19 agents, and most of them will have
>> an incompressible 01-history. With the comp indeterminism it is
>> much more likely you will get such a really random string
>> (independently of the fact that you will be unable to prove it is
>> really random). Those with computable strings will be exceptional, so
>> that, if those agents work together, they will consider (even with
>> Bayes) that the simple self-multiplying algorithm is the simplest
>> and shortest explanation for those appearances of randomness.
>
>Juergen: But don't you see? Why does a particular agent, say,
>yourself, with a nonrandom past, have a nonrandom future?


But we do have a random future. Just send a sequence of particles in the
state 1/sqrt(2)( up + down ) through an "analyser". Write 0 or 1 according
to whether each particular particle goes through. Both QM+comp (Everett)
and comp alone (I will not elaborate!) explain this first person (plural)
point of view randomization by a relative
self-multiplication/division/differentiation.
As I said, the simple self-multiplying algorithm is the simplest and
shortest explanation for those appearances of randomness.
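
A quick counting-argument illustration in Python of why the agents with
compressible histories are exceptional (this is just the standard
Kolmogorov-complexity counting bound, added here for illustration, not
anything specific to either poster's formalism):

# There are 2**k binary histories of length k, but fewer than 2**(k-c)
# programs shorter than k - c bits, so at most that many k-bit histories
# can be compressed by c or more bits.
k = 64
for c in (1, 10, 20):
    fraction = 2 ** (k - c) / 2 ** k    # upper bound on the compressible fraction
    print(f"histories compressible by >= {c} bits: fewer than {fraction:.2e} of all")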


>Why is your computer still there
>after one second, although in a truly random world it would immediately
>dissolve?


No one says reality is *only* a truly random
realm, especially when third person reality is given by UD*, the trace
of the Universal Dovetailer. Arithmetically it is just the set of true
\Sigma_1 sentences. That's our (third person) atomic truth. Of course
those truths are "UD" provable.


>Why do pencils keep falling down instead of up, when the
>futures where they fall up are just as likely?


Because those sets of machine-accessible arithmetical truths, as viewed
by the machines themselves (this introduces the modalities), are
highly structured.

Indeed the UD has this nasty habit of dovetailing on the reals, so that
some randomisation is at work. But some equilibrium between randomization
and local lawfulness has to be reached in the limit (where first person
points of view are eventually defined in Plato's Heaven).
Our own stability can only rely on the randomization of the details of
our stories, a randomisation which we should necessarily observe if we
look at ourselves *below* our (apparently common) substitution level.
Although this gives a sort of necessary prior (the need for a
quantisation of the "classical stories", transforming H into exp(-iH)),
I prefer, following my naive idea, to interview the sound Universal
Machine directly.

I define "probability of p = 1" in (Peano) Arithmetic by Bew('p) & Con('p).
In this way p is true in all consistent extensions (the Bew('p) part) and
there is (the bet part) at least one consistent extension (the Con('p) part).
I restrict the arithmetical interpretation of p to the arithmetical \Sigma_1
sentences (the arithmetical UD). This gives an arithmetical
quantization of p by []<>p (with []p = Bew('p) & Con('p)).
It indeed obeys a sort of quantum logic.
As always, Con('p) (that is, <>p) = -Bew('-p) (that is, -[]-p).
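
For readability, here is the same translation restated in LaTeX (the
layout and symbol choices are mine; the definitions are exactly those of
the paragraph above):

\[
\begin{aligned}
\Diamond p &:\equiv \mathrm{Con}(\ulcorner p \urcorner) = \neg\,\mathrm{Bew}(\ulcorner \neg p \urcorner),\\
\Box p &:\equiv \mathrm{Bew}(\ulcorner p \urcorner) \wedge \mathrm{Con}(\ulcorner p \urcorner)
  \qquad \text{(``probability of $p$ = 1'')},\\
p &\;\longmapsto\; \Box\Diamond p
  \qquad \text{for $\Sigma_1$ sentences $p$ (the arithmetical quantization).}
\end{aligned}
\]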


But I fear you don't believe in any form of "strict" indeterminism,
neither comp nor QM, do you?


Bruno

Juergen Schmidhuber

unread,
Oct 31, 2001, 4:49:52 AM10/31/01
to Marchal, everyth...@eskimo.com
Cannibalizing previous thread "Provable vs Computable:"

Which are the logically possible universes? Tegmark mentioned a
somewhat
vaguely defined set of ``self-consistent mathematical structures,''
implying provability of some sort. The postings of Marchal also focus
on what's provable and what's not.

Is provability really relevant? Philosophers and physicists find it
sexy for its Goedelian limits. But what does this have to do with the
set of possible universes?

The provability discussion seems to distract from the real issue. If we
limit ourselves to universes corresponding to traditionally provable
theorems then we will miss out on many formally and constructively
describable universes that are computable in the limit yet in a certain
sense soaked with unprovability.

Example: a never ending universe history h is computed by a finite
nonhalting program p. To simulate randomness and noise etc, p invokes
a short pseudorandom generator subroutine q which also never halts. The
n-th pseudorandom event of history h is based on q's n-th output bit
q(n)
which is initialized by 0 and set to 1 as soon as the n-th statement in
an ordered list of all possible statements of a formal axiomatic system
is proven by a theorem prover that systematically computes all provable
theorems. Whenever q modifies some q(n) that was already used in the
previous computation of h, p appropriately recomputes h since the n-th
pseudorandom event.
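
A purely illustrative Python sketch of this kind of construction. Here
the formal system and theorem prover are replaced by a toy stand-in:
statement n is "toy program n halts", and the dovetailer "proves" it by
actually running program n. The stand-in is my own simplification; only
the mechanism of flipping q(n) and recomputing h is what matters.

def toy_halts_after(n):
    # Toy stand-in for "program n": halts after 2**n steps if n is even,
    # never halts if n is odd. Arbitrary, just so some q(n) flip and some don't.
    return 2 ** n if n % 2 == 0 else None

def history(q):
    # The n-th event of h depends on the pseudorandom bits q(0..n); here h is
    # simply the running parity, recomputed in full whenever some q(n) flips.
    h, out = 0, []
    for bit in q:
        h = (h + bit) % 2
        out.append(h)
    return out

def run_universe(num_events=8, budget=200):
    q = [0] * num_events             # q(n) = 0 until statement n is "proven"
    steps = [0] * num_events         # steps granted so far to each toy program
    h = history(q)
    for _ in range(budget):
        for n in range(num_events):  # dovetail: one more step for every program
            if q[n] == 0:
                steps[n] += 1
                limit = toy_halts_after(n)
                if limit is not None and steps[n] >= limit:
                    q[n] = 1         # statement n proven: flip the n-th bit ...
                    h = history(q)   # ... and recompute h from that event onward
    return q, h

print(run_universe())   # e.g. q = [1, 0, 1, 0, 1, 0, 1, 0] once stabilized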

Such a virtual reality or universe is perfectly well-defined. We can
program it. At some point each history prefix will remain stable
forever.
Even if we know p and q, however, in general we will never know for sure
whether some q(n) that is still zero won't flip to 1 at some point,
because of Goedel etc. So this universe features lots of unprovable
aspects.

But why should this lack of provability matter? Ignoring this universe
just implies loss of generality. Provability is not the issue.

Marchal

unread,
Nov 1, 2001, 12:59:52 PM11/1/01
to Juergen Schmidhuber, everyth...@eskimo.com, Fabric-o...@yahoogroups.com
Juergen Schmidhuber wrote:


>Which are the logically possible universes? Tegmark mentioned a
>somewhat
>vaguely defined set of ``self-consistent mathematical structures,''
>implying provability of some sort. The postings of Marchal also focus
>on what's provable and what's not.
>
>Is provability really relevant? Philosophers and physicists find it
>sexy for its Goedelian limits. But what does this have to do with the
>set of possible universes?


Because I interpret "possible universe" or "possible model of p" by "p
belongs to a model" or "p belongs to a consistent extension", which I
translate in arithmetic by "p is consistent", that is, by "-p is not
provable" (-[]-p). (By Godel's *completeness* theorem this is extensionally
equivalent.)


>The provability discussion seems to distract from the real issue. If we
>limit ourselves to universes corresponding to traditionally provable
>theorems then we will miss out on many formally and constructively
>describable universes that are computable in the limit yet in a certain
>sense soaked with unprovability.


But *all* our consistent extensions are soaked with unprovability.
And the UDA shows that *all* of them must be taken into account.
'Probability one of p' holds when both 1) and 2) do:

1) []p, provable('p'): p is true in all "possible universes", if any.
2) <>p, consistent('p'): there is a possible universe with p.

(I use Godel's *completeness* theorem for first order logic here, but
a more sophisticated presentation can be made (G and G*
work on second order arithmetic as well, but I don't want to be too
technical)).

So I translate "probability one of p" by "[]p & <>p". The
incompleteness theorems make that move non-trivial, just like
the translation of "knowing p" by "provable('p') & p", which gives
an arithmetical interpretation of intuitionistic logic.

The comp restriction to UD*, the trace of the UD, is translated
in arithmetic by restricting p to \Sigma_1 sentences (this
includes universal \Sigma_1 sentences). Universality (with
comp) is \Sigma_1 completeness (+ oracle(s), etc.).

This gives the quasi-Brouwersche modal logic Z1*. (I also get
interesting quantum-smelling logics with the translation of
"feeling really p" by "[]p & <>p & p", the X1* logic.)
I conjecture that Z1* gives an arithmetical quantum logic.
(X1*, Z0*, X0* are interesting variants; the X* minus X
logics are expected to play the role of qualia logics, for the
true, measurable but unprovable atomic propositions.)


>Example: a never ending universe history h is computed by a finite
>nonhalting program p. To simulate randomness and noise etc, p invokes
>a short pseudorandom generator subroutine q which also never halts. The
>n-th pseudorandom event of history h is based on q's n-th output bit
>q(n)
>which is initialized by 0 and set to 1 as soon as the n-th statement in
>an ordered list of all possible statements of a formal axiomatic system
>is proven by a theorem prover that systematically computes all provable
>theorems. Whenever q modifies some q(n) that was already used in the
>previous computation of h, p appropriately recomputes h since the n-th
>pseudorandom event.
>
>Such a virtual reality or universe is perfectly well-defined. We can
>program it. At some point each history prefix will remain stable
>forever.
>Even if we know p and q, however, in general we will never know for sure
>whether some q(n) that is still zero won't flip to 1 at some point,
>because of Goedel etc. So this universe features lots of unprovable
>aspects.


Sure.


>But why should this lack of provability matter?...


Because the UDA shows that our future is a sort
of mean over the unprovable things (our consistent extensions,
unprovable by incompleteness).

The miracle is the sharing and coherence among those
extensions. I don't pretend to have explained that miracle.
But the UDA says where you must look for the explanation, and
the translation of the UDA into arithmetic gives an encouraging step
toward quantum logic, and even qualia logics.
(Linking J.S. Bell and J.L. Bell, by the way; ref in my thesis.)


>...Ignoring this universe


>just implies loss of generality. Provability is not the issue.


You are postulating a computable universe. And you believe, like
some physicists, that an explanation of that universe is an explanation
of everything (discarding completely the "mind-body" problem).

I am NOT postulating a computable universe. I am not even
postulating a universe at all.
I postulate only *physicists*, in a very large sense
of the word. I postulate that they are (locally) computable.


I postulate only arithmetical truth and its many consistent
internal interpretations.

Provability is not really the issue, indeed. It is the border between
provability and unprovability which is the issue.
G* \ G gives insightful information about that, including the
'probability on the uncomputable' issue.

Although I give theorem provers for the Z logic, questions like
the status of the Arithmetical Bell Inequality:

[]<>A & []<>B -> []<>(([]<>A & []<>C) v ([]<>B & []-[]<>C))

remain open. (Here []p = provable('p') & consistent('p'), and A, B, C
are \Sigma_1 sentences.)

Perhaps my discovery (the proof of the arithmetical B) is just a
mathematical mirage (although what a coincidence, with respect
to the intuitive UDA!). Perhaps comp, "my" (psycho)logical
comp, is wrong. But don't tell me I'm vague, 'cause I'm utterly
precise.

I remind you that the crux of the crux (the proof of the modal
symmetry axiom B, "p->[]<>p") is given (in English) at
http://www.escribe.com/science/theory/m2855.html


Hoping that helps.


Bruno

PS I have presented the UDA in 11 questions to Joel Dobrzelewski.
Have you followed that thread? At which step would you say "no"?

Universal Dovetailer Argument (step by step):
UDA step 1 http://www.escribe.com/science/theory/m2971.html
UDA step 2-6 http://www.escribe.com/science/theory/m2978.html
UDA step 7 8 http://www.escribe.com/science/theory/m2992.html
UDA step 9 10 http://www.escribe.com/science/theory/m2998.html
UDA last question http://www.escribe.com/science/theory/m3005.html
Joel 1-2-3 http://www.escribe.com/science/theory/m3013.html
Re: UDA... http://www.escribe.com/science/theory/m3019.html
George'sigh http://www.escribe.com/science/theory/m3026.html
Re:UDA... http://www.escribe.com/science/theory/m3035.html
Joel's nagging question http://www.escribe.com/science/theory/m3038.html
Re:UDA... http://www.escribe.com/science/theory/m3042.html


http://iridia.ulb.ac.be/~marchal

Wei Dai

unread,
Nov 2, 2001, 7:19:32 PM11/2/01
to Juergen Schmidhuber, everyth...@eskimo.com
On Wed, Oct 31, 2001 at 10:49:41AM +0100, Juergen Schmidhuber wrote:
> Which are the logically possible universes? Tegmark mentioned a
> somewhat
> vaguely defined set of ``self-consistent mathematical structures,''
> implying provability of some sort. The postings of Marchal also focus
> on what's provable and what's not.

Do you understand Marchal's postings? I really have trouble making sense
of them. If you do, would you mind rephrasing his main ideas for me?

> Is provability really relevant? Philosophers and physicists find it
> sexy for its Goedelian limits. But what does this have to do with the
> set of possible universes?

What does it mean for a universe to be "provable"? One interpretation is
that there exists an algorithm to compute any finite time-space region of
the universe that always halts. Is that what you mean?

> The provability discussion seems to distract from the real issue. If we
> limit ourselves to universes corresponding to traditionally provable
> theorems then we will miss out on many formally and constructively
> describable universes that are computable in the limit yet in a certain
> sense soaked with unprovability.
>
> Example: a never ending universe history h is computed by a finite
> nonhalting program p. To simulate randomness and noise etc, p invokes
> a short pseudorandom generator subroutine q which also never halts. The
> n-th pseudorandom event of history h is based on q's n-th output bit
> q(n)
> which is initialized by 0 and set to 1 as soon as the n-th statement in
> an ordered list of all possible statements of a formal axiomatic system
> is proven by a theorem prover that systematically computes all provable
> theorems. Whenever q modifies some q(n) that was already used in the
> previous computation of h, p appropriately recomputes h since the n-th
> pseudorandom event.

Living in this universe is like living in a universe with a
halting-problem oracle, right? If you want to know whether program x
halts, you just measure the n-th pseudorandom event, where n is such that
the n-th theorem is "x halts." Yes?

The problem with universes with halting-problem oracles (which I'll just
call oracle universes) is that either the oracle is not exploitable
(for example if no one in the universe figures out that the pseudorandom
events actually correspond to theorems) in which case the universe just
looks like an ordinary "provable" universe, or the oracle is
exploitable, in which case the outcome becomes completely unimaginable for
us. (Or is that not true? Can we imagine what would happen if an
intelligent species came across an exploitable halting-problem oracle?)

> But why should this lack of provability matter? Ignoring this universe
> just implies loss of generality. Provability is not the issue.

If oracle universes could exist, how significant are they? What's
the probability that we live in such a universe, and how could we try to
determine and/or exploit this fact?

h...@finney.org

unread,
Nov 2, 2001, 7:32:04 PM11/2/01
to jue...@idsia.ch, wei...@eskimo.com, everyth...@eskimo.com
Wei Dai writes:
> Living in this universe is like living in a universe with a
> halting-problem oracle, right? If you want to know whether program x
> halts, you just measure the n-th pseudorandom event, where n is such that
> the n-th theorem is "x halts." Yes?

What about the instantiations of people who measure the event the other
way before the universe gets restarted and re-run from that point? They
would see the wrong answer to the halting question, but they are just
as conscious as the people who later see the right answer.

Hal

Wei Dai

unread,
Nov 2, 2001, 7:58:33 PM11/2/01
to h...@finney.org, jue...@idsia.ch, everyth...@eskimo.com
On Fri, Nov 02, 2001 at 04:29:26PM -0800, h...@finney.org wrote:
> What about the instantiations of people who measure the event the other
> way before the universe gets restarted and re-run from that point? They
> would see the wrong answer to the halting question, but they are just
> as conscious as the people who later see the right answer.

The people who see the wrong answer have zero measure because they exist
for only a finite amount of time on the output tape whereas the people who
see the right answer exist for an infinite amount of time on the output
tape.

Juergen Schmidhuber

unread,
Nov 13, 2001, 6:05:31 AM11/13/01
to everyth...@eskimo.com, everyth...@eskimo.com

Wei Dai wrote:
>
> On Wed, Oct 31, 2001 at 10:49:41AM +0100, Juergen Schmidhuber wrote:
> > Which are the logically possible universes? Tegmark mentioned a
> > somewhat
> > vaguely defined set of ``self-consistent mathematical structures,''
> > implying provability of some sort. The postings of Marchal also focus
> > on what's provable and what's not.
>
> Do you understand Marchal's postings? I really have trouble making sense
> of them. If you do, would you mind rephrasing his main ideas for me?

No, unfortunately I do not understand him. But provability does seem
important to him. Maybe someone else could post a brief and concise
summary?

> > Is provability really relevant? Philosophers and physicists find it
> > sexy for its Goedelian limits. But what does this have to do with the
> > set of possible universes?
>
> What does it mean for a universe to be "provable"? One interpretation is
> that there exists an algorithm to compute any finite time-space region of
> the universe that always halts. Is that what you mean?

Well, any finite object x has a halting algorithm that computes it.
So which are the unprovable things? They are formal statements such as
"algorithm x will eventually compute object y." If it does then you can
prove it, otherwise in general you can never be sure. But why identify
universes with true or false statements? When I'm asking "Is provability
really relevant?" I am not really referring to "provable universes" -
I don't even know what that is supposed to be. I just mean: why should
it be important to show that a particular algorithm computes a
particular
universe? Or that a particular universe has a particular property (there
are many generally unprovable properties)? Let's compute them all -
why worry about provability?

> > The provability discussion seems to distract from the real issue. If we
> > limit ourselves to universes corresponding to traditionally provable
> > theorems then we will miss out on many formally and constructively
> > describable universes that are computable in the limit yet in a certain
> > sense soaked with unprovability.
> >
> > Example: a never ending universe history h is computed by a finite
> > nonhalting program p. To simulate randomness and noise etc, p invokes
> > a short pseudorandom generator subroutine q which also never halts. The
> > n-th pseudorandom event of history h is based on q's n-th output bit
> > q(n)
> > which is initialized by 0 and set to 1 as soon as the n-th statement in
> > an ordered list of all possible statements of a formal axiomatic system
> > is proven by a theorem prover that systematically computes all provable
> > theorems. Whenever q modifies some q(n) that was already used in the
> > previous computation of h, p appropriately recomputes h since the n-th
> > pseudorandom event.
>
> Living in this universe is like living in a universe with a
> halting-problem oracle, right? If you want to know whether program x
> halts, you just measure the n-th pseudorandom event, where n is such that
> the n-th theorem is "x halts." Yes?

Yes. Once the history has stabilized this will give you the correct
answer.

> The problem with universes with halting-problem oracles (which I'll just
> call oracle universes) is that either the oracle is not exploitable
> (for example if no one in the universe figures out that the pseudorandom
> events actually correspond to theorems) in which case the universe just
> looks like an ordinary "provable" universe, or the oracle is
> exploitable, in which case the outcome becomes completely unimaginable for
> us. (Or is that not true? Can we imagine what would happen if an
> intelligent species came across an exploitable halting-problem oracle?)
>
> > But why should this lack of provability matter? Ignoring this universe
> > just implies loss of generality. Provability is not the issue.
>
> If oracle universes could exist, how significant are they? What's
> the probability that we live in such a universe, and how could we try to
> determine and/or exploit this fact?

If the computer on which the history of our universe is calculated is a
general Turing machine (as opposed to a monotone machine which cannot
edit
previous outputs), and if we make no resource assumptions such as those
embodied by the Speed Prior, then the oracle universes indeed represent
a large fraction of the probability mass of all universes. Why? Because
there are many oracle universes with very short algorithms.

How to determine whether we are in an oracle universe? In principle it
is possible, at least to the extent inductive inference is possible at
all! Just search through all oracle pseudo-random generators (PRGs) and
try to match their outputs with the apparently noisy observations. If
the observations are not really random, eventually you'll stumble across
the right PRG. It's just that you cannot prove that it is the right one,
or that its outputs have already stabilized.
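
A toy sketch of that search in Python. The candidate generators below
are an arbitrary made-up family standing in for an enumeration of all
(oracle) PRGs, which of course cannot be listed this simply; the point
is only the matching loop.

def candidate_prgs():
    # Toy enumeration of candidate pseudo-random generators: (name, n -> n-th bit).
    # A real search would dovetail over all programs, including oracle-based ones.
    yield "all zeros", lambda n: 0
    yield "alternating", lambda n: n % 2
    yield "parity of popcount(n)", lambda n: bin(n).count("1") % 2
    yield "complemented popcount parity", lambda n: 1 - bin(n).count("1") % 2

def find_matching_prg(observations):
    # Return the first candidate whose output matches every observed bit so far.
    for name, f in candidate_prgs():
        if all(f(n) == bit for n, bit in enumerate(observations)):
            return name
    return None   # nothing fits yet: keep observing, keep enumerating

# Apparently "noisy" observations that are in fact the Thue-Morse sequence:
obs = [bin(n).count("1") % 2 for n in range(32)]
print(find_matching_prg(obs))   # -> "parity of popcount(n)"

As the post notes, a match can never be proven to be the right
generator; the loop only ever fails to refute it so far.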

What about exploitation? Once you suspect you found the PRG you can use
it
to predict the future. Unfortunately the prediction will take enormous
time to stabilize, and you never can be sure it's finished. So it's not
very practical. I prefer the additional resource assumptions reflected
by the Speed Prior. They make the oracle universes very unlikely, and
yield computable predictions.

Wei Dai

unread,
Nov 14, 2001, 8:47:40 PM11/14/01
to Juergen Schmidhuber, everyth...@eskimo.com
Thanks for clarifying the provability issue. I think I understand and
agree with you.

On Tue, Nov 13, 2001 at 12:05:22PM +0100, Juergen Schmidhuber wrote:
> What about exploitation? Once you suspect you found the PRG you can use
> it
> to predict the future. Unfortunately the prediction will take enormous
> time to stabilize, and you never can be sure it's finished.
> So it's not very practical.

By exploiting the fact that we're in an oracle universe I didn't mean
using TMs to predict the oracle outputs. That is certainly impractical.

There are a couple of things you could do though. One is to use some
oracle outputs to predict other oracle outputs when the relationship
between them is computable. The other, much more important, is to quickly
solve arbitrarily hard computational problem using the oracles.

> I prefer the additional resource assumptions reflected
> by the Speed Prior. They make the oracle universes very unlikely, and
> yield computable predictions.

Why do you prefer the Speed Prior? Under the Speed Prior, oracle universes
are not just very unlikely, they have probability 0, right? Suppose one
day we actually find an oracle for the halting problem, or even just find
out that there is more computing power in our universe than is needed to
explain our intelligence. Would you then (1) give up the Speed Prior and
adopt a more dominant prior, or (2) would you say that you've encountered
an extremely unlikely event (i.e. more likely you're hallucinating)?

If you answer (1) then why not adopt the more dominant prior now?

Juergen Schmidhuber

unread,
Nov 15, 2001, 4:36:01 AM11/15/01
to everyth...@eskimo.com

You are right in the sense that under the Speed Prior any complete
_infinite_ oracle universe has probability 0. On the other hand,
any _finite_ beginning of an oracle universe has nonvanishing
probability. Why? The fastest algorithm for computing all universes
computes _all_ finite beginnings of all universes. Now oracle universes
occasionally require effectively random bits. History beginnings that
require n such bits come out very slowly: the computation of the n-th
oracle bit requires more than O(2^n) steps, even by the fastest
algorithm.
Therefore, under the Speed Prior the probabilities of oracle-based
prefixes quickly converge towards zero with growing prefix size. But in
the finite case there always will remain some tiny epsilon.
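
To make the decay quantitative: using the standard property of the Speed
Prior that the total probability of data costing at least t steps falls
off like 1/t, together with the fact just stated that the n-th oracle
bit costs more than 2^n steps, one gets (my derivation, for some
constant c)

\[
P_{\mathrm{Speed}}\bigl(\text{prefix requiring} \ge n \text{ oracle bits}\bigr)
  \;\le\; c\,2^{-n} \;\longrightarrow\; 0 \quad (n \to \infty),
\]

which vanishes in the limit yet stays strictly positive for every finite
n -- the tiny epsilon just mentioned.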

Why not adopt a more dominant prior now? I just go for the simplest
explanation consistent with available data, where my simplicity measure
reflects what computer scientists find simple: things that are easy
to compute.

Wei Dai

unread,
Nov 15, 2001, 11:15:43 AM11/15/01
to Juergen Schmidhuber, everyth...@eskimo.com
On Thu, Nov 15, 2001 at 10:35:58AM +0100, Juergen Schmidhuber wrote:
> > Why do you prefer the Speed Prior? Under the Speed Prior, oracle universes
> > are not just very unlikely, they have probability 0, right? Suppose one
> > day we actually find an oracle for the halting problem, or even just find
> > out that there is more computing power in our universe than is needed to
> > explain our intelligence. Would you then (1) give up the Speed Prior and
> > adopt a more dominant prior, or (2) would you say that you've encountered
> > an extremely unlikely event (i.e. more likely you're hallucinating)?
> >
> > If you answer (1) then why not adopt the more dominant prior now?
>
> Why not adopt a more dominant prior now? I just go for the simplest
> explanation consistent with available data, where my simplicity measure
> reflects what computer scientists find simple: things that are easy
> to compute.

You didn't explicitly answer my question about what you would do if you
saw something that is very unlikely under the Speed Prior but likely under
more dominant priors, but it seems that your implicit answer is (1). In
that case you must have some kind of prior (in the Bayesian sense) for the
Speed Prior vs. more dominant priors. So where does that prior come from?

Or let me put it this way. What do you think is the probability that the
Great Programmer has no computational resource constraints? If you don't
say the probability is zero, then what you should adopt is a weighted
average of the Speed Prior and a more dominant prior, but that average
itself is a more dominant prior. Do you agree?
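
A one-line check of that claim (standard dominance reasoning, spelled
out here for concreteness): write the weighted average as
M(x) = a S(x) + (1-a) D(x), with S the Speed Prior, D a more dominant
prior, and 0 < a < 1. Then

\[
M(x) \;\ge\; (1-a)\,D(x) \quad\text{and}\quad M(x) \;\ge\; a\,S(x) \qquad\text{for all } x,
\]

so M multiplicatively dominates both ingredients, and in particular is
itself at least as dominant as D.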

Juergen Schmidhuber

unread,
Nov 28, 2001, 11:27:56 AM11/28/01
to everyth...@eskimo.com

If someone can use large scale quantum computing to solve a problem
whose
solution requires much more computation time than necessary to compute
the history of our own particular universe, then one could take this
as evidence against the Speed Prior. You are right: to quantify this
evidence in a Bayesian framework we need a prior on priors.

Which one? Hm. Let me extend your question and ask: what's the
probability
that the Great Programmer is more than a mere programmer in the sense
that he is not bound by the limits of computability? For instance,
if someone were able to show that our universe somehow makes use of an
entire continuum of real numbers we'd be forced to accept some even more
dominant prior that is not even computable in the limit. We could not
even formally specify it.

So what's my prior on all priors? Since the attempt to answer such a
question might lead outside what's formally describable, I'll remain
silent for now.

Wei Dai

unread,
Nov 29, 2001, 2:47:50 PM11/29/01
to Juergen Schmidhuber, everyth...@eskimo.com
On Wed, Nov 28, 2001 at 05:27:43PM +0100, Juergen Schmidhuber wrote:
> Which one? Hm. Let me extend your question and ask: what's the
> probability
> that the Great Programmer is more than a mere programmer in the sense
> that he is not bound by the limits of computability? For instance,
> if someone were able to show that our universe somehow makes use of an
> entire continuum of real numbers we'd be forced to accept some even more
> dominant prior that is not even computable in the limit. We could not
> even formally specify it.

I'm not sure I understand this. Can you give an example of how our
universe might make use of an entire continuum of real numbers? How might
someone show this if it were true?

> So what's my prior on all priors? Since the attempt to answer such a
> question might lead outside what's formally describable, I'll remain
> silent for now.

But if there is a formally describable prior that dominates the speed
prior, and you agree that the more dominant prior doesn't have a prior
probability of zero, then isn't the speed prior redundant? Wouldn't you
get equal posterior probabilities (up to a constant multiple) by
dropping the speed prior from your prior on priors, no matter what it
assigns to priors that are not formally describable?

Juergen Schmidhuber

unread,
Dec 11, 2001, 10:25:01 AM12/11/01
to everyth...@eskimo.com, everyth...@eskimo.com
Wei Dai wrote:
> I'm not sure I understand this. Can you give an example of how our
> universe might make use of an entire continuum of real numbers? How might
> someone show this if it were true?

I have no idea. In fact, I guess it is impossible.

> But if there is a formally describable prior that dominates the speed
> prior, and you agree that the more dominant prior doesn't have a prior
> probability of zero, then isn't the speed prior redundant? Wouldn't you
> get equal posterior probabilities (up to a constant multiple) by
> dropping the speed prior from your prior on priors, no matter what it
> assigns to priors that are not formally describable?

In the Bayesian framework we derive consequences of assumptions
represented as priors. The stronger the assumptions, the more specific
the
predictions. The Speed Prior assumption is stronger than the assumption
of a formally describable prior. It is not redundant because it yields
stronger
predictions such as: The computer computing our universe won't compute
much more of it; large scale quantum computation won't work; etc.

In fact, I do believe the Speed Prior dominates the true prior from
which our universe is sampled (which is all I need to make good
computable predictions), and that the probability of even more
dominant priors is zero indeed. But as a Bayesian I sometimes ignore
my beliefs and also derive consequences of more dominant priors.
I do find them quite interesting, and others who do not share my
belief in the Speed Prior might do so too.

Wei Dai

unread,
Dec 11, 2001, 12:53:20 PM12/11/01
to Juergen Schmidhuber, everyth...@eskimo.com
I don't understand how you can believe that the probability of more
dominant priors is zero. That implies if I offered you a bet of $1
versus your entire net worth that large scale quantum computation will
in fact work, you'd take that bet. Would you really?

Juergen Schmidhuber

unread,
Dec 19, 2001, 3:57:39 AM12/19/01
to everyth...@eskimo.com, everyth...@eskimo.com

Wei Dai wrote:
> I don't understand how you can believe that the probability of more
> dominant priors is zero. That implies if I offered you a bet of $1
> versus your entire net worth that large scale quantum computation will
> in fact work, you'd take that bet. Would you really?

Your dollar against my 2 cents worth? I don't want to rip you off!

Ok, let me adopt a more careful Bayesian's fallback position: I just
assume certain things, then derive consequences, given the assumptions,
without precisely quantifying the plausibility of the assumptions
themselves, letting the reader decide whether they are plausible or not.

Juergen
