Using a genetic algorithm for Lc0


Yasin ÖZDEMİR

unread,
Aug 15, 2022, 12:03:16 PM8/15/22
to LCZero
I want to create the weight file that Lc0 uses with a genetic algorithm, but I don't know how to do that. Can anyone help me?

brian.p.r...@gmail.com

unread,
Aug 17, 2022, 6:26:42 AM8/17/22
to LCZero
Genetic-algorithm tuning would be far too slow for Lc0-type nets.
I have used genetic tuning for tunes of a few hundred parameters with a little success, but it is random, so it takes a long time.
Much better for a few hundred weights is "Texel"-type tuning, which was used very successfully in many engines back in the "hand-crafted evaluation" days.
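
For reference, "Texel"-type tuning is essentially the loop below. This is only a rough sketch with a toy linear eval and a list of (features, result) positions, nothing Lc0-specific: each weight is hill-climbed to minimize the squared error between a sigmoid of the eval and the actual game results.

K = 1.2  # scaling constant, normally fitted once to the data set

def expected_result(score):
    # map an eval score (centipawn-like) to an expected game result in [0, 1]
    return 1.0 / (1.0 + 10.0 ** (-K * score / 400.0))

def total_error(params, positions):
    # positions: list of (features, result), result in {0.0, 0.5, 1.0}
    err = 0.0
    for features, result in positions:
        score = sum(w * f for w, f in zip(params, features))  # toy linear eval
        err += (result - expected_result(score)) ** 2
    return err / len(positions)

def texel_tune(params, positions, step=1):
    # classic local search: nudge each weight by +/- step while the error keeps dropping
    best = total_error(params, positions)
    improved = True
    while improved:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                params[i] += delta
                err = total_error(params, positions)
                if err < best:
                    best, improved = err, True
                    break
                params[i] -= delta  # revert if the nudge did not help
    return params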

There are tens of millions of weights in Lc0 nets, so genetic tuning would not be helpful there.
The ML algorithms used for Lc0 nets home in on a good value (not necessarily the absolute best) very quickly relative to genetic methods.
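
To make that concrete, here is a toy contrast (purely illustrative, not anything from Lc0) between a GA-style mutation step and a gradient step on the same simple loss: the mutation only pays off when a random perturbation happens to land downhill, whereas backpropagation moves every weight in a direction known to reduce the loss on every single step, which is why it homes in so much faster once you have millions of weights.

import random

def loss(w):
    # toy objective: distance of the weights from zero
    return sum(x * x for x in w)

def gradient(w):
    return [2.0 * x for x in w]

def mutation_step(w, sigma=0.1):
    # GA-style: random perturbation, kept only if it happens to improve the loss
    candidate = [x + random.gauss(0.0, sigma) for x in w]
    return candidate if loss(candidate) < loss(w) else w

def gradient_step(w, lr=0.1):
    # SGD-style: every weight moves in a direction that reduces the loss
    return [x - lr * g for x, g in zip(w, gradient(w))]

# quick comparison (same starting point, same number of steps):
w = [random.uniform(-1.0, 1.0) for _ in range(1000)]
w_ga, w_sgd = list(w), list(w)
for _ in range(100):
    w_ga = mutation_step(w_ga)
    w_sgd = gradient_step(w_sgd)
# loss(w_sgd) is typically far lower than loss(w_ga) after the same number of steps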

DBg

unread,
Aug 21, 2022, 8:10:48 PM8/21/22
to LCZero
I would take a closer look at the training process during reinforcement learning. As far as I understand, the "self-play" batch process is actually a sequence of competitions between the current self and a mutated self, where the mutated self has undergone some selection elsewhere and comes back to compete with the old self.

There might be a way to say that RL is already doing this. Parameter tuning is not the same as training from the environment. I am not sure I know enough about global-optimisation implementation technologies and their names, but the backpropagation of errors is effectively performing mutations on the NN as feedback from the environment.

The RL may be picky about what undergoes selection and which environment (pairing?) provides the selection feedback, not allowing a wild population of mutants to simply select from...

From that wider population-of-NNs point of view, it is perhaps making a narrow, hopping trajectory. I think we ought to look at what is already there, and measure things.
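
Roughly, as I understand it, the loop being described is the AlphaZero-style recipe: generate games by self-play with the current network, then update the network by gradient descent on the recorded (position, search policy, outcome) data, and repeat. A toy-level sketch of that structure (every function here is a stand-in, not Lc0 code):

import random

def self_play_games(net, n_games):
    # stand-in: the real pipeline plays full games with MCTS guided by `net`
    # and records (position, search policy, outcome) for every move
    return [random.choice([-1.0, 0.0, 1.0]) for _ in range(n_games)]

def sgd_update(net, outcomes, lr=0.1):
    # stand-in: the real pipeline backpropagates policy and value losses;
    # here the "net" is a single value head nudged toward the observed outcomes
    target = sum(outcomes) / len(outcomes)
    return net - lr * 2.0 * (net - target)

def training_loop(net=0.0, iterations=50, games_per_iteration=32):
    for _ in range(iterations):
        outcomes = self_play_games(net, games_per_iteration)  # environment feedback
        net = sgd_update(net, outcomes)  # directed update via backprop, not a random mutation
    return net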

DBg

unread,
Aug 21, 2022, 8:50:44 PM8/21/22
to LCZero
Erratum: I may have collapsed or mixed up the step scales of training: the weight-update scale and the self-play trajectory schedule. I now remember that at some point the mutated network stops playing against its previous frozen self... and gets to play against itself again, and one of the two is again allowed to mutate and be micro-selected/mutated (the weight-update step scale).

Sorry for that, it has been a while... I also have a backlog of understanding work to do on the policy story. But whether the training trajectory is collapsed or not, this is still something that can be viewed as an excerpt from a more complex parameter-space exploration.

BTW: old-world parameter exploration, all the "testing" in hand-crafted engines, and training and testing in ML are all about an algorithm family spanned by parameters, viewed as variables taking values in a multidimensional space to explore. The crucial difference is how formalized the generalization aspect of the optimization is. In supervised ML (which I understand pretty well at the mathematical level), the separation between training and testing is clear. But I just realized that it is not as clear in RL, and it might be tangled up in the scheduling of self-play somehow.

I really think a fresh look at training trajectories (not the games, maybe the bundle of games as the environment, but the parameter trajectories) might be a good thing to do. These are related sub-questions. If anybody would like to help me understand, here or in some messaging, I would appreciate it; I learn a lot faster with discussions than with lectures aimed at an assumed audience comfort zone.

Back to the topic: it is possible the OP had a more "organic" question than past architecture-parameter exploration. If LC0 were to explore sub-nets, it could consider adding that dimension of parameter exploration for the coarse connectivity. E.g. if the attention mechanism were not just a replacement but an "organ" network to be "docked" onto, or to interplay with, the typically less informed past architecture, then connectivity at that level could be explored with a population, with mutations at a bigger scale than what each sub-net's environment-feedback schedule is doing.

I am not up to date yet on the new attention story of LC0; it might show in the above... but who is stingy about paying attention to hypothesis generation? I have that policy-story backlog to work through before that one (even if it is possible to present them independently, it has been too long for me not to understand how policy actually works in past versions of LC0, play vs. training).