sliders

Jonas

Dec 8, 2011, 9:29:44 PM
to opennero
If I do the following:
- train a team with Approach Flag = 100
- save the team
- close opennero
- re-start opennero
- load the team

Now Approach Flag = 0 (the default). However, my team was trained with
Approach Flag = 100. Does this mean my team is now being re-trained
with Approach Flag = 0? Or is the team retaining the value of 100? If
I set Approach Flag = 100, will the team get a combined value of 200?

Igor Karpov

Dec 8, 2011, 9:43:47 PM
to open...@googlegroups.com
The sliders are really just relative weights that determine how
"important" improvements in different areas are to the agent's overall
fitness.

The fitness is calculated as follows. We first find the average of
each raw score for all the agents in the population. Then we find the
standard deviation of each of these - i.e. how much variance there is
in all the different flag-seeking performances currently on the field.
Then we find what is known as a Z-score - the number of standard
deviations above or below the population average for each agent in
each of the categories. So far we haven't used the slider values; this
step just puts all the differently scaled raw scores on "equal
footing" statistically. Now we apply the user-selected slider
weights. For each individual, we take the Z-scores and multiply them
by the weights (ranging from -1 to +1, i.e. slider values from -100 to
+100). If a slider is 0, that Z-score doesn't count toward fitness at
all. If it is anything else, it contributes more or less, and
positively or negatively, to the overall fitness. Adding up these
weighted Z-scores gives the final fitness.
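
(For illustration only - this is not OpenNERO's actual source. A
minimal Python sketch of the calculation above; the function name, the
data layout, and the assumption that the weighted Z-scores are simply
summed are mine.)

import statistics

def slider_fitness(raw_scores, slider_weights):
    # raw_scores: dict mapping category -> list of raw scores, one per agent
    # slider_weights: dict mapping category -> slider value in [-100, 100]
    n_agents = len(next(iter(raw_scores.values())))
    totals = [0.0] * n_agents
    for category, scores in raw_scores.items():
        mean = statistics.mean(scores)
        stdev = statistics.pstdev(scores)
        weight = slider_weights.get(category, 0) / 100.0  # rescale to [-1, 1]
        for i, score in enumerate(scores):
            # Z-score: standard deviations above/below the population average.
            z = 0.0 if stdev == 0 else (score - mean) / stdev
            totals[i] += weight * z  # a slider of 0 drops the category entirely
    return totals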

The gist of this is that the current slider settings are the only
thing that determines the relative worth of an agent's performance in
each of the different categories, relative to the population average.
The team's previous training doesn't change how it is scored now.
However, scoring it according to a different set of sliders doesn't
necessarily "kill" the previous behavior: if there is no direct
conflict, the networks will still "remember" how to do what you taught
them before. But if you now train them with the flag slider at -100,
they will eventually evolve to do the opposite.
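
(Continuing the hypothetical sketch above with made-up raw scores, to
show how flipping the slider flips the fitness contribution.)

raw = {"approach_flag": [10.0, 20.0, 30.0]}
print(slider_fitness(raw, {"approach_flag": 100}))   # best flag-seeker scores highest
print(slider_fitness(raw, {"approach_flag": -100}))  # same Z-scores, sign flipped:
                                                     # evolution now favors avoiding the flag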

HTH,

--Igor.
