EvaluationInfo FitnessHistory training on sliding windows


Keith Nelson

Oct 3, 2012, 11:37:57 AM
to shar...@googlegroups.com
Hello Colin,
    This is my first attempt to train on a sliding window of evaluation data using the EvaluationInfo class with FitnessHistoryLength set to 10, and I am getting some undesired behaviour. What appears to be happening is that each generation, newly created offspring are awarded a fitness "average" over a single score from the latest sliding window, and this is higher than the average of older genomes with evaluation counts greater than one, whose scores span several sliding windows. As a result the older genomes get dropped quickly and never reach 10 evaluations.
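
A toy illustration with made-up numbers (not from my actual run):

    double[] oldGenomeHistory = { 0.9, 0.5, 0.4 }; // scored over three window shifts; mean = 0.6
    double[] newGenomeHistory = { 0.7 };           // scored once, on the latest window; mean = 0.7

    // The single-sample newcomer outranks the three-sample veteran, so the
    // veteran is dropped before its history buffer ever reaches 10 entries.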

I feel I am missing something obvious about the correct use of EvaluationInfo/FitnessHistoryLength. Any tips or suggestions would be much appreciated.

Keith.

Colin Green

Oct 4, 2012, 6:12:58 PM
to shar...@googlegroups.com
Hi Keith,

On 3 October 2012 16:37, Keith Nelson <krne...@gmail.com> wrote:
> Hello Colin,
>     This is my first attempt to train on a sliding window of evaluation
> data using the EvaluationInfo class with FitnessHistoryLength set to 10,
> and I am getting some undesired behaviour. What appears to be happening is
> that each generation, newly created offspring are awarded a fitness
> "average" over a single score from the latest sliding window, and this is
> higher than the average of older genomes with evaluation counts greater
> than one, whose scores span several sliding windows. As a result the older
> genomes get dropped quickly and never reach 10 evaluations.

I've had a look through the code and tried running the prey capture
domain with fitness history length = 10. I didn't spot any problems;
the only thing I noticed was that genomes seem not to live through
very many generations. I think this is because I bias selection
towards newer genomes (but I don't recall 100% if that is the case).

Also, the actual fitness buffer calculation of the mean fitness over
all evaluations looks OK.

Could it be a side effect of your fitness scoring? E.g. if it is
giving different scores for the same genome (non-deterministic
scoring), then you might get the occasional good score and lots of bad
scores, so older genomes will tend to have a mean that is fairly low,
while a few new genomes will be seen with a higher score.

I don't want to go into detail, but I've been suspecting for a while
now that the standard/canonical evolutionary algorithm approach is far
too efficient at throwing away low scoring genomes. As an initial
suggestion and stopgap measure you could try setting the elitism and
selection proportions to 80%-90% (instead of the default 20%), and see
if that helps. Otherwise I probably need some more info to help you
out any further. Let me know how you get on.
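
If I remember the parameter names correctly (worth double-checking against your copy of SharpNEAT), that would be something like:

    var eaParams = new NeatEvolutionAlgorithmParameters();
    eaParams.ElitismProportion = 0.8;   // default is 0.2
    eaParams.SelectionProportion = 0.8; // default is 0.2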

Regards,

Colin.

Keith Nelson

Oct 5, 2012, 5:18:53 AM
to shar...@googlegroups.com

Hi Colin,
    Thank you for the suggestion, much appreciated. Increasing the elitism did protect the older genomes and allowed them to be averaged over more sliding windows than was otherwise occurring. However, this does appear to restrict exploration of solutions and adversely affect the final results, so I tried some other methods, detailed below (with varying success).


> Could it be a side effect of your fitness scoring? E.g. if it is
> giving different scores for the same genome (non-deterministic
> scoring), then you might get the occasional good score and lots of bad
> scores, so older genomes will tend to have a mean that is fairly low,
> while a few new genomes will be seen with a higher score.

I am training for a small number of generations on a fixed window of data points before sliding the window along (leaving some percentage of overlap between windows). What I am observing is that after a few generations on the same data window, the offspring all tend to be fitter than their ancestors on that particular window. Since they only have one fitness score in the history buffer, younger genomes almost always outscore older genomes that have survived one or more window shifts and must average their scores over them. It is true that any particular window may provide good or bad scores overall, but even on a bad window of data the offspring can push their single-score fitness above that of a genome that scored well on one good window and just OK on one bad window (for example). Wouldn't this situation be the norm for any system training over sliding windows of data - i.e. not a side effect of the scoring method?

I have tried the following two code changes:

1) Introduce a bias towards older genomes in the GenomeFitnessComparer:

public class GenomeFitnessComparer...
{
    public int Compare(TGenome x, TGenome y)
    {
        ... // primary sort on fitness, unchanged

        // Fitnesses are equal.
        // Secondary sort - highest number of evaluations first.
        if (x.EvaluationInfo.EvaluationCount > y.EvaluationInfo.EvaluationCount)
            return -1;
        if (x.EvaluationInfo.EvaluationCount < y.EvaluationInfo.EvaluationCount)
            return 1;

        // Evaluation counts are equal.
        // Tertiary sort - youngest first. Younger genomes have a *higher* BirthGeneration.
        ...
    }
}


This did not appear to change much, however; perhaps it is too subtle - I suspect fitness scores are rarely exactly equal, so the secondary sort rarely comes into play.

2) Modify the DoubleCircularBufferWithStats Mean property to divide by the total buffer capacity instead of the current number of values in the buffer, the idea being that a genome with several evaluations in the buffer will outscore any genome with a single evaluation:

public class DoubleCircularBufferWithStats
{
    ...
    /// <summary>
    /// Gets the arithmetic mean of all values in the buffer.
    /// </summary>
    public double Mean
    {
        get
        {
            if(-1 == _headIdx) {
                return 0.0;
            }
            return _total / _capacity; // Replaced the current value count (Length) with the buffer capacity.
        }
    }
    ...
}


The second modification, to the DoubleCircularBufferWithStats Mean property, does appear to make a significant difference, allowing older genomes to survive and compete without elitism protection. It feels like a cheap hack, though, and I am still not sure what side effects it may cause; in effect it scales each genome's mean by evaluationCount/capacity, so every genome's fitness is understated until its buffer fills.

Next idea: to compete fairly, genomes would only compete against other genomes with the same evaluation count, up to the size of the history buffer (all genomes with evaluation counts >= the history buffer size compete with each other) - but this would require quite a few more changes to the code to try out.
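
A rough, untested sketch of the comparison I have in mind (MeanFitness and _historyLength are placeholder names, not necessarily the real SharpNEAT members):

    public int Compare(TGenome x, TGenome y)
    {
        int xCount = Math.Min((int)x.EvaluationInfo.EvaluationCount, _historyLength);
        int yCount = Math.Min((int)y.EvaluationInfo.EvaluationCount, _historyLength);

        // Primary sort - genomes compete on fitness only within the same
        // evaluation-count bucket, so a single-evaluation newcomer cannot
        // displace a veteran directly.
        if (xCount != yCount)
            return yCount.CompareTo(xCount);

        // Within a bucket, highest mean fitness first.
        return y.EvaluationInfo.MeanFitness.CompareTo(x.EvaluationInfo.MeanFitness);
    }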

Regards,
Keith