> Secondly, it seems to me that the complexity bias only occurs at deme
> level?
Rene, I don't know the internals of PLOP (I'm currently working with
MOSES, the previous C++ implementation, which obviously has a lot in
common with PLOP), but I suppose it is only at the deme level (like
MOSES before I started hacking it).
My conclusion is that this is fine when the data are not noisy;
experimenting with MOSES seems to corroborate that.
When the data are noisy, however, I believe one must introduce a
complexity bias at the scoring level. Here are two threads I posted
that discuss this:
http://groups.google.com/group/opencog/browse_thread/thread/b7704419e082c6f1?hl=en
http://groups.google.com/group/opencog-developers/browse_thread/thread/a4771ecf63d38df?hl=en
Note that the complexity penalty at the deme level can be interpreted
the same way. That is, when the probability (or variance) of having
noisy data tends to zero, any tiny difference in the score (supposedly
log P(M|D)) is influenced only by log P(D|M), the non-Occam's-razor
part of the score, so it is OK to ignore the complexity bias. However,
when candidates have the same P(D|M) there may still be some epsilon
coming from P(M), and one can pick the next candidate for deme
optimization according to P(M).
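To make that explicit, by Bayes' rule

  log P(M|D) = log P(D|M) + log P(M) - log P(D)

and log P(D) is the same for every candidate, so with (nearly)
noiseless data the likelihood term log P(D|M) dominates and the prior
log P(M) only matters as a tie-breaker.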
Moshe, I've been thinking about calculating the "right" complexity
function (the number of bits to encode the model optimally), and it
seems to me there is an opportunity for some transfer learning. There
are several possibilities, but one way could be to define the encoding
based on the frequency of successful programs (i.e. environments)
encountered so far (or probably of sub-programs, or of symbols, since
one may not have enough programs) using Shannon's coding theory. One
would then use that coding to calculate the "size" of a program, and
this size would be used both for selecting the next deme and, in case
the data are noisy, as an additional feature of the fitness function.
I do mean across problems, of course. I don't know how much it would
help in practice; that obviously depends on the class of problems and
the information accumulated.
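To make the idea concrete, here is a minimal sketch (not plop code;
the function names are made up): estimate symbol frequencies from
previously successful programs and take a program's "size" to be its
Shannon code length in bits.

(defun symbol-frequencies (programs)
  "Relative frequency of each leaf (symbol or constant) across PROGRAMS."
  (let ((counts (make-hash-table :test 'equal))
        (total 0))
    (labels ((walk (expr)
               (if (consp expr)
                   (mapc #'walk expr)
                   (progn (incf (gethash expr counts 0)) (incf total)))))
      (mapc #'walk programs))
    (let ((freqs (make-hash-table :test 'equal)))
      (maphash (lambda (sym n) (setf (gethash sym freqs) (/ n total))) counts)
      freqs)))

(defun code-length (program freqs &optional (unseen 1/1000))
  "Shannon code length of PROGRAM in bits, i.e. the sum of -log2 p(symbol);
symbols never seen before get the small default probability UNSEEN."
  (if (consp program)
      (reduce #'+ (mapcar (lambda (e) (code-length e freqs unseen)) program))
      (- (log (or (gethash program freqs) unseen) 2))))

Something like (code-length candidate (symbol-frequencies past-solutions))
would then stand in for the size used when selecting the next deme and,
with noisy data, in the score.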
Nil
> Here the deme is describing a 20-dimensional space, which has 20 1-
> dimensional subspaces, 190 2-dimensional subspaces, etc.
>
> I would expect that plop should apply so many resources to the 1-
> dimensional subspaces, so many to the 2-dimensional subspaces, etc.
>
> Does PLOP do this?
>
> It does not seem to be biased enough toward the formulas with low
> parameter dimensionality and seems to heavily overfit.
>
> Also to me it would make sense in each of the above subspaces to use
> something like levmar to find the parameter values that provide the
> best fit. Do you do something like this, or something else?
>
> I find it often hard to see exactly what methods you are using. Your
> thesis gives a good descriptive overview, but I don't see anywhere a
> summary of what methods/formulas you are using. The only way seems to
> be to slowly work through the code.
>
> Rene.
>
I believe you tried some transfer learning across populations using
univariate, but it did not work too well (I don't have much intuition
as to why). Here, across many problems, one would obviously expect a
very light bias, but it would hopefully provide a more accurate
complexity function. That is, one would start with some theoretical
approximation (as you indicated to me in a previous email) and improve
it over time.
Nil
> In particular I have been looking at various runs of:
>
> (defbenchmark easy-num
>   :cost 500 :type '(func (num) num)
>   :target (mapcar (lambda (x) (list (+ 0.1 (* 2 x)) x)) '(-1 .3 .5)))
>
> I note that a canonicalized formula is first generated with 20 knobs,
> which is twice as many knobs as I would expect, as the canonicalized
> formula seems to have 10 terms...
>
> Where do the 20 knobs come from? Can you explain that a little?
>
Sure. The canonical form, (p2sexpr (qcanonize %(lambda (x) 0))), is:
(LAMBDA (X)
(+ 0 (* 0 (EXP (+ 0))) (* 0 (LOG (+ 1)))
(* 0 (+ 0.0 (EXP (+ 0))) (+ 1.0 (LOG (+ 1))))))
which contains 10 numerical constants. So, 10 knobs there to adjust
their values. But, there are also 10 corresponding knobs to insert
subexpressions (* a x) or (+ a x) where a is a numerical constant; if
you look at knobs.lisp these are contin-inserter knobs. So for example
(lambda (x y) 0) where x and y are numerical variables has the same
canonical form inside the lambda-expression, but 30 knobs (10 for
constants, 10 for x, and 10 for y).
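As a quick sanity check on the count (a throwaway sketch, assuming
plop is loaded so p2sexpr, qcanonize and the % reader macro are
available; count-constants itself is made up):

(defun count-constants (expr)
  "Count the numerical leaves of a sexpr."
  (cond ((numberp expr) 1)
        ((consp expr) (reduce #'+ (mapcar #'count-constants expr)))
        (t 0)))

;; (count-constants (p2sexpr (qcanonize %(lambda (x) 0)))) gives 10 on
;; the canonical form above, matching the 10 value knobs.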
> Secondly, it seems to me that the complexity bias only occurs at deme
> level?
>
> Here the deme is describing a 20-dimensional space, which has 20 1-
> dimensional subspaces, 190 2-dimensional subspaces, etc.
>
> I would expect that plop should apply so many resources to the 1-
> dimensional subspaces, so many to the 2-dimensional subspaces, etc.
>
> Does PLOP do this?
>
Implicitly, yes.
> It does not seem to be biased enough toward the formulas with low
> parameter dimensionality and seems to heavily overfit.
>
Probably. I've been working on some methods here (and so has Nil,
independently)...
> Also to me it would make sense in each of the above subspaces to use
> something like levmar to find the parameter values that provide the
> best fit. Do you do something like this, or something else?
>
I've done some work applying simple linear regression here, but
nothing fancy like levmar. The parameters are evolved via methods from
evolution strategies. It would be cool to try, though...
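For reference, ordinary least squares already recovers the target line
on the easy-num sample points; a minimal sketch (not plop code, the
name is made up):

(defun fit-line (points)
  "POINTS is a list of (x . y) pairs; returns (values slope intercept)
for the least-squares line y = slope*x + intercept."
  (let* ((n   (length points))
         (sx  (reduce #'+ points :key #'car))
         (sy  (reduce #'+ points :key #'cdr))
         (sxx (reduce #'+ points :key (lambda (p) (* (car p) (car p)))))
         (sxy (reduce #'+ points :key (lambda (p) (* (car p) (cdr p)))))
         (slope (/ (- (* n sxy) (* sx sy)) (- (* n sxx) (* sx sx))))
         (intercept (/ (- sy (* slope sx)) n)))
    (values slope intercept)))

;; (fit-line '((-1 . -1.9) (0.3 . 0.7) (0.5 . 1.1)))  => ~2.0, ~0.1

Something like levmar would generalize this to nonlinear parameters.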
> I find it often hard to see exactly what methods you are using. Your
> thesis gives a good descriptive overview, but I don't see anywhere a
> summary of what methods/formulas you are using. The only way seems to
> be to slowly work through the code.
>
Sorry about this! As it says on the main Google Code page for the
project: "(note that it will probably do rather badly at the moment as
the system is still very immature and some important subcomponents are
just stubs)".
- Moshe
> Moshe, I've been thinking about calculating the "right" complexity
> function (the number of bits to encode the model optimally), and it
> seems to me there is an opportunity for some transfer learning. There
> are several possibilities, but one way could be to define the encoding
> based on the frequency of successful programs (i.e. environments)
> encountered so far (or probably of sub-programs, or of symbols, since
> one may not have enough programs) using Shannon's coding theory. One
> would then use that coding to calculate the "size" of a program, and
> this size would be used both for selecting the next deme and, in case
> the data are noisy, as an additional feature of the fitness function.
> I do mean across problems, of course. I don't know how much it would
> help in practice; that obviously depends on the class of problems and
> the information accumulated.
>
Interesting. One thing that I'd like to point out is that the "right"
complexity function for computing P(H|D) is not necessarily the right
one to use to guide search; it might be that the search algo is too
weak to escape the "bowl" of simple solutions that a strong complexity
penalty creates. This is definitely the case for GP, where
practitioners use multiobjective optimization for this reason, and
probably true to some extent for MOSES as well...
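To illustrate the multiobjective alternative (a generic sketch, not
GP, MOSES or plop code): keep error and complexity as separate
objectives and discard only dominated candidates, rather than folding
complexity into a single penalized score.

(defun dominates-p (a b)
  "A and B are (error size) pairs; true if A is no worse on both
objectives and strictly better on at least one."
  (and (<= (first a) (first b))
       (<= (second a) (second b))
       (or (< (first a) (first b))
           (< (second a) (second b)))))

(defun pareto-front (candidates)
  "Remove every (error size) candidate dominated by another one."
  (remove-if (lambda (c)
               (some (lambda (other) (dominates-p other c)) candidates))
             candidates))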
> I believe you tried some transfer learning across populations using
> univariate, but it did not work too well (I don't have much intuition
> as to why). Here, across many problems, one would obviously expect a
> very light bias, but it would hopefully provide a more accurate
> complexity function. That is, one would start with some theoretical
> approximation (as you indicated to me in a previous email) and improve
> it over time.
>
IIRC it was while solving a single problem, across demes. The reason
that it didn't work well was that it led to premature convergence of
the new demes. Certainly this could be overcome, but I was in a hurry
at the time and haven't gotten back to it yet...
- Moshe