prior over assertions


Abram Demski

Sep 21, 2009, 7:03:57 PM
to one-...@googlegroups.com
Hi all,

I'm thinking about what a simplicity-prior over logically stated
beliefs might look like. These are my own musings, so I'm not taking
time to explain all the background... feel free to ask questions.

Suppose you generally accept the Solomonoff distribution as the
correct prior, but want to express beliefs in some superset of
first-order logic, rather than in Turing machines (the usual way of
stating it).

My first thought was that the Solomonoff prior could transfer over
easily, since first-order logic can be treated as just another
computationally complete formalism. However, there are several issues.
I've mentioned on a previous occasion the issue of trying to get the
prior to conform to the Kolmogorov axioms... and I'm still not sure
how to resolve that. However, I've thought of a different issue.

The halting problem makes the Solomonoff prior difficult to deal with,
but it leaves it computably approximable (in the sense that we can
make a computer program that will give us a probability arbitrarily
close to the Solomonoff probability given enough time). However, in
that case we only have to deal with the halting problem. With logical
statements we have that problem too, and another: inconsistency.

Halting problem: A set of axioms may never yield a result one way or
the other for a given statement, and there is no guaranteed
algorithmic way of deciding when this is the case.
Consistency problem: A set of axioms may yield a contradiction, and
there is no guaranteed algorithmic way of deciding that this *isn't*
the case.

Assuming that we take inconsistent theories to be untrue (which
follows from the Kolmogorov axioms, but also from more basic
considerations!) we need to throw them out. With the Solomonoff
distribution, we can approach the correct measure from below by adding
probability mass to an outcome whenever we find a Turing machine that
produces that outcome. With a first-order measure, it seems we are
forced to sometimes take away probability mass as well, when a
first-order theory that produced results ends up also producing an
inconsistency.
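
Concretely, the anytime approximation I have in mind looks roughly
like the sketch below. The theories are modeled abstractly: the
fields just record when a fair proof search would first turn up a
prediction or a contradiction, so the numbers are purely
illustrative, not a real prover.

def approx_prior(theories, steps):
    # approximate mass on a fixed outcome after `steps` steps of
    # fair proof search per theory
    mass = 0.0
    for t in theories:
        weight = 2.0 ** -t["length"]  # simplicity weight, as with Solomonoff
        predicted = t["predicts_at"] is not None and t["predicts_at"] <= steps
        refuted = t["contradiction_at"] is not None and t["contradiction_at"] <= steps
        if predicted and not refuted:
            # mass appears once a prediction is found, and disappears
            # again if a contradiction shows up at some later step
            mass += weight
    return mass

# illustrative theories: the second predicts the outcome at step 5 but
# is found inconsistent at step 40, so its mass is added, then removed
theories = [
    {"length": 3, "predicts_at": 10, "contradiction_at": None},
    {"length": 4, "predicts_at": 5, "contradiction_at": 40},
    {"length": 6, "predicts_at": None, "contradiction_at": None},
]
for s in (1, 10, 50):
    print(s, approx_prior(theories, s))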

Fortunately, the prior still converges when we do this! I was
initially afraid that it wouldn't; but, since mass can only be added
and then removed once, not shuffled back and forth arbitrarily many
times, the system will eventually be in the correct state with
respect to any given theory (ie, either the theory produces no
predictions and the system has in fact found none, or the theory
produces predictions without leading to inconsistency and the system
has found them, or the theory produces a contradiction and the system
has found it). Taking the N shortest theories, there will be some
time at which they are all in their proper state. The error created
by longer theories still being in an incorrect state (ie, in fact
producing predictions we haven't found yet, or in fact producing
contradictions we haven't found yet) is bounded, and gets smaller as
we let N get larger.
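
To put a rough number on that last claim (my own gloss, assuming the
usual 2^-length weighting and a prefix-free encoding of theories, as
in the Solomonoff setup): once each of the N shortest theories has
reached its correct state, the remaining error in the approximation
is at most

  sum over theories T outside the N shortest of 2^(-length(T))

which is the tail of a convergent series, and so shrinks to 0 as N
grows.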

So, it's computably approximable (if not efficiently!), like the
Solomonoff distribution. Question: is it equivalent to the Solomonoff
distribution, or not?

--Abram

YKY (Yan King Yin, 甄景贤)

Sep 23, 2009, 4:48:20 AM
to one-...@googlegroups.com
On Tue, Sep 22, 2009 at 7:03 AM, Abram Demski <abram...@gmail.com> wrote:

> Suppose you generally accept the Solomonoff distribution as the
> correct prior, but want to express beliefs in some superset of
> first-order logic, rather than in Turing machines (the usual way of
> stating it).

Sounds like a good idea... =)

But, do you need to first define a logic programming language, like Prolog?

My impression is that a logic, like FOL, is not a computing machine
per se. What does it mean to say that a bunch of logical
statements computes a function?

> So, it's computably approximable (if not efficiently!), like the
> Solomonoff distribution. Question: is it equivalent to the Solomonoff
> distribution, or not?

I guess it is. You can use a constant Turing machine to translate
Prolog programs into Turing machines.

YKY

Abram Demski

Sep 23, 2009, 8:27:36 AM
to one-...@googlegroups.com
YKY,

The hidden assumption I'm using that turns first-order logic into a
sort of programming language is that some "fair" proof technique is
being used; ie, every proof-path is eventually followed, because no
bias is strong enough to entirely rule any out. Any procedural
technique that meets that requirement will eventually prove any
provable statement.
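
For concreteness, here is a toy sketch of what I mean by "fair" (the
rule format and names are made up for the example): breadth-first
forward chaining over ground Horn rules, where every applicable rule
fires on every round, so no derivable statement is postponed forever.

def fair_closure(facts, rules, max_rounds):
    # rules: list of (premises, conclusion), with premises a tuple of statements
    known = set(facts)
    for _ in range(max_rounds):
        new = {concl for prems, concl in rules
               if all(p in known for p in prems) and concl not in known}
        if not new:
            break
        # every applicable rule fires on every round, so no proof path is starved
        known |= new
    return known

# toy run: derives "q" from "p", then "out(1,0)" from "p" and "q"
print(fair_closure({"p"}, [(("p",), "q"), (("p", "q"), "out(1,0)")], 10))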

One can also distinguish a special class of "output" statements:

out(1,0)
out(2,1)
out(3,1)
out(4,0)
out(5,1)
out(6,0)

In the example shown, we're describing a string of 1s and 0s (as is
typical in abstract treatments of Solomonoff induction). But, anything
could be described.
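
Reading off the described string is then just a matter of collecting
whatever out(n,b) statements have been proved so far. A throwaway
sketch (the representation is just for illustration):

def described_string(out_facts):
    # out_facts: set of (position, bit) pairs proved so far, e.g. {(1,0), (2,1), ...}
    bits = dict(out_facts)
    prefix, n = [], 1
    while n in bits:  # take the longest fully-determined prefix
        prefix.append(str(bits[n]))
        n += 1
    return "".join(prefix)

print(described_string({(1,0), (2,1), (3,1), (4,0), (5,1), (6,0)}))  # -> "011010"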

>> So, it's computably approximable (if not efficiently!), like the
>> Solomonoff distribution. Question: is it equivalent to the Solomonoff
>> distribution, or not?
>
> I guess it is. You can use a constant Turing machine to translate
> Prolog programs into Turing machines.

But how do you translate contradictions?? FOL is like a Turing machine
that has no "halt" command, but does have a "self-destruct" command.
The self-destruct command withdraws all output presented. So, unlike a
Turing machine, we never know if an output is "final"; it could
self-destruct at any point. Fortunately, self-destruct is a final act;
we can't go back and forth arbitrarily between outputting and
not-outputting. That's what allows the prior to still be convergent,
and it might allow for an equivalence with Solomonoff induction... but
I don't know yet.

--Abram

2009/9/23 YKY (Yan King Yin, 甄景贤) <generic.in...@gmail.com>:
--
Abram Demski
http://dragonlogic-ai.blogspot.com/
http://groups.google.com/group/one-logic