New Neural Network From Scratch


Sumus Guyus

unread,
Jun 7, 2020, 1:36:15 PM6/7/20
to LCZero
Hello,

I wish to generate a completely new network. I went to https://github.com/LeelaChessZero/lczero-training but it requires data preparation. Is there a way to generate a new network without training data?

Brian Richardson

unread,
Jun 7, 2020, 3:05:29 PM6/7/20
to LCZero
You have to have training data.

PGN game files can be converted to training data (a very exacting and somewhat complex process), or you can train nets using the existing training data files from the various Leela runs; two of the more recent are here:



Sumus Guyus

unread,
Jun 11, 2020, 7:40:52 PM6/11/20
to LCZero
I think I understand now. The games are required to build the neural
network. The client then uses this to generate games. Do we then take
these new games and generate another neural network? Wouldn't this new
network be just as weak as the original network?

Dave Whipp

unread,
Jun 11, 2020, 7:48:44 PM6/11/20
to Sumus Guyus, LCZero
If game play were just a function of the net, that would be true. But games are played with search, which acts as an amplifier of the net's current strength. No matter how strong the current net is, if you use it as an evaluation function to look a few moves ahead, the quality of play is better than the net by itself. So the RL process continually improves the net.
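
To make that concrete, here is a rough, runnable sketch of the idea in Python (using python-chess; the ToyNet material counter and the commented-out train() step are hypothetical stand-ins for a real network and lc0's actual MCTS):

```python
import chess  # python-chess, used only for board handling and move generation


class ToyNet:
    """Hypothetical stand-in for a real evaluation net: counts material only."""
    VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
              chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

    def evaluate(self, board):
        # Score from the side to move's point of view, squashed into [-1, 1].
        score = sum((1 if p.color == board.turn else -1) * self.VALUES[p.piece_type]
                    for p in board.piece_map().values())
        return max(-1.0, min(1.0, score / 10.0))


def search_eval(net, board, depth=1):
    """Look a few plies ahead with the net as the leaf evaluator.
    Even this shallow search plays better than the raw net: that gap is the
    amplification that the RL process trains back into the net."""
    if depth == 0 or board.is_game_over():
        return net.evaluate(board)
    best = -1.0
    for move in board.legal_moves:
        board.push(move)
        best = max(best, -search_eval(net, board, depth - 1))
        board.pop()
    return best


def self_play_game(net):
    """Play one game where every move is chosen by net + search; the
    (position, move, value) records are the training data for the next net."""
    board, records = chess.Board(), []
    while not board.is_game_over():
        scored = []
        for move in board.legal_moves:
            board.push(move)
            scored.append((-search_eval(net, board), move))
            board.pop()
        value, move = max(scored, key=lambda t: t[0])
        records.append((board.fen(), move.uci(), value))
        board.push(move)
    return records, board.result()


# The RL cycle (the training step itself is hypothetical and not shown):
# for _ in range(iterations):
#     data = [self_play_game(net) for _ in range(games)]
#     net = train(net, data)  # training on search-improved play pulls the net upward
```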


Brian Richardson

unread,
Jun 11, 2020, 8:35:30 PM6/11/20
to LCZero
If you would like to try creating your own net from scratch, there is a relatively new collection of code (.bat, .py, and .exe files) at the link below. Start with loop.bat and follow the code. Going up to the parent directory will show you the programs directory (with a couple of .exe files that are needed) and other things.


I am currently training a 10x128 net with this, running on a 2080 Ti GPU.
It is almost to step 10,000 after about 3 days and the net is around 2,900 Elo.
This was starting from a net that played random moves.

Before you begin, understand the difference between SL and RL (the above is for RL). The main project does RL, and the code above does RL as well, but without the central server; everything runs on your local PC. It uses lc0.exe (actually a slightly different version, lc0-s.exe, that makes more use of Syzygy tablebases), but it does not run with client.exe (which runs lc0.exe and sends the nets up to the central project server). Lc0.exe is run in "selfplay" mode, which plays against itself with a given net. The output of these self-play games includes all of the data needed for full net training. SL is when you train from games already played, and most of the time those files (typically PGN format) do NOT have all of the data needed to produce stronger nets. Without that "policy" data, SL-trained nets are about 200 Elo weaker.
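
To illustrate the policy point with a toy example (this is not lc0's actual chunk format, and the visit counts are made up), the difference between an SL target from a plain PGN and an RL target from self-play looks like this:

```python
import chess

board = chess.Board()
legal = [m.uci() for m in board.legal_moves]

# SL from a plain PGN: the only policy signal is the single move actually played,
# so the target is a one-hot distribution over the legal moves.
played = "e2e4"
sl_policy = {m: (1.0 if m == played else 0.0) for m in legal}

# RL self-play data: the search's visit counts give a probability for EVERY legal
# move, which is the extra information worth roughly 200 Elo mentioned above.
visits = {"e2e4": 420, "d2d4": 390, "g1f3": 120, "c2c4": 70}  # made-up numbers
total = sum(visits.values())
rl_policy = {m: visits.get(m, 0) / total for m in legal}
```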

My PC has played about 150,000 games in three days and trained many nets with each one stronger than the last.  The nets improve rapidly at first and then the improvements become smaller and smaller.  The main project has trained nets with hundreds of millions of games which have been contributed by the community to produce the most powerful nets.

My post earlier referred to these existing games, which you could also use to train your own net.  In this case, the Leela games do include all of the policy information needed.  However, it will still be difficult to train a net much stronger than about 3,000 Elo.  The main project Leela is at about 3,600, I think.  Again, many millions more games are needed, and many more nets, which take longer and longer to improve.  Of course, one should not expect to fully duplicate the results of hundreds of contributor PCs playing and training for many months (the T60 run has been going for almost a year).  Nonetheless, the principles are the same, and it does work, albeit on a smaller scale.

Sumus Guyus

unread,
Jun 11, 2020, 9:23:14 PM6/11/20
to LCZero
Thank you Dave and Brian for helping. I do not wish to compete against the community but rather just to understand the process.

Brian Richardson

unread,
Jun 11, 2020, 9:43:38 PM6/11/20
to LCZero
I am not competing either.  I do have some understanding of the process, and the "magic" is still amazing to me after having worked on a chess program for 40 years.

Brian Richardson

unread,
Jun 12, 2020, 6:43:34 AM6/12/20
to LCZero


On Thursday, June 11, 2020 at 8:35:30 PM UTC-4, Brian Richardson wrote:
...

It is almost to step 10,000 after about 3 days and the net is around 2,900 Elo.
...

After 10,000 steps with batch size of 1,024 sampled from 150,000 games, it is not at 2,900 Elo, more like 2,500.
The 2,900 was inflated self-play Elo, which is not the same as playing other opponents, but still with more time it would continue to slowly improve.
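
For reference, the Elo numbers above come from match scores; the usual conversion (the 2,400-rated opponent in the example is made up) is:

```python
import math

def elo_difference(score):
    """Convert a match score fraction (wins + draws/2 over games, 0 < score < 1)
    into an approximate Elo difference against the opponent."""
    return 400.0 * math.log10(score / (1.0 - score))

# Hypothetical example: scoring 65% in a match against a 2,400-rated opponent
# suggests roughly 2400 + elo_difference(0.65) ~= 2400 + 108 = ~2,508 Elo.
```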


Niko Grupen

unread,
Feb 24, 2021, 6:03:59 PM2/24/21
to LCZero
Hi Brian,

I am curious if you are still working on your own custom training loop. I tried following the link you posted a while back (https://www.chadhosting.xyz/scripts/), but it led me to a dead link. I'm also interested in writing my own RL loop to do self-play training, so that I can examine how the features in different layers of the network evolve throughout training.

From what I have gathered so far, everything related to RL training with lc0 is boxed up in lc0.exe. I noticed that you referenced a custom version of it (lc0-s.exe). Does this mean that you implemented your own self-play RL loop? If so, would you be willing to share it?

I am ideally looking for a clean training loop like what python-chess offers -- see this link: https://python-chess.readthedocs.io/en/latest/engine.html
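
Something like the following untested sketch, using the python-chess engine API with lc0 run as a UCI engine (the weights path is just a placeholder), is the style of loop I have in mind:

```python
import chess
import chess.engine

# "weights.pb.gz" is a placeholder for a file from the lc0 best-networks page.
engine = chess.engine.SimpleEngine.popen_uci(["lc0", "--weights=weights.pb.gz"])

board = chess.Board()
records = []
while not board.is_game_over():
    result = engine.play(board, chess.engine.Limit(nodes=800))
    records.append((board.fen(), result.move.uci()))
    board.push(result.move)

print(board.result(), len(records), "moves recorded")
engine.quit()
```

(Though, as Brian noted earlier, a loop like this only records the move played, not the full search policy that real lc0 training data contains.)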

The closest thing I have found is a0lite -- link here: https://github.com/dkappe/a0lite/blob/master/engine.py -- but I am not sure if this will support loading in weights from the lc0 best networks page, or if training a network in this sort of loop would just take way too long to converge.

Any thoughts would be greatly appreciated. Thank you!

Niko

brian.p.r...@gmail.com

unread,
Feb 25, 2021, 10:01:26 AM2/25/21
to LCZero
The chadhosting scripts link is still live for me (just checked).
Far more than lc0.exe is needed for RL training.
Simply generating self-play games unfortunately was termed "training" early on in the project.
It is not actual net training, but just creating the chunk files.
Also needed is the lczero-training repo, which is where the nets are actually trained.
Then, matches are played to see if the new nets are any improvement.

Training is quite exacting.
I don't recall whether you have already done so, but I suggest first getting familiar with SL training on about 100K games from the project's self-play games before trying the full RL cycle.

Niko Grupen

unread,
Mar 2, 2021, 3:33:46 PM3/2/21
to LCZero
Thank you for the insight, Brian! I can confirm that the chadhosting scripts are available on my end too.

I have been looking at SL training, which will be useful for pre-training a network, but I'll still eventually need an RL loop because my goal is to study how the learned features of the network change over time with more experience playing against different opponents. I'm also working on scenarios in which a high-Elo network and a lower-Elo network work together to play against an opponent at some intermediate level (either a third network or Stockfish with a difficulty level set). Both require active learning, as opposed to static learning from a dataset.

I don't necessarily need the networks to be superhuman (> 3000 Elo), but I do need competitive human-level performance (~2000-2200 Elo). I guess I am just wondering if I should:

1) Set up all of lc0 and lczero-training and modify as needed to run these experiments
2) Write my own RL loop using python-chess or a faster equivalent, plus a wrapper to transform board positions to be compatible with pretrained lc0 networks at certain Elos, etc.
3) Look for an existing lighter-weight alternative with pretrained networks (or one compatible with lc0 networks).

brian.p.r...@gmail.com

unread,
Mar 2, 2021, 6:59:22 PM3/2/21
to LCZero

I have been looking at SL training, which will be useful for pre-training a network, but I'll still eventually need an RL loop because my goal is to study how the learned features of the network change over time with more experience playing against different opponents.


If you mean training nets from games grouped into tiers of similar strength, this has been done and the results are inconclusive.
I tried training nets using groups of games in tiers starting at about 1,600 Elo, then in groups of about 2,000, and so on IIRC (I lost a lot of my old notes I'm afraid).
That was during my early days of net training, so my methodology was likely flawed.

Others have trained nets targeting human-level strength tiers with considerable success.
Suggest asking on the Leela Chess Zero Discord server.
Strong human play (up to about 2,900 Elo) can be done with SL with as few as 100,000 games, and some have said playing styles are evident.

If you mean actually looking into the net to see things/features change, that is not possible I think with Leela-sized nets as they are too large.
A small Leela 10b net has about 3.5M parameters, 20b is 25M, and a large 30b net is 80+M, I think (could be off somewhat).
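As a rough back-of-the-envelope check of those numbers, counting only the two 3x3 convolutions in each residual block and ignoring the heads and SE layers:

```python
def tower_params(blocks, filters):
    """Approximate residual-tower parameter count: two 3x3 convs per block."""
    return blocks * 2 * (3 * 3 * filters * filters)

for blocks, filters in [(10, 128), (20, 256), (30, 384)]:
    print(f"{blocks}x{filters}: ~{tower_params(blocks, filters) / 1e6:.1f}M")
# 10x128: ~2.9M, 20x256: ~23.6M, 30x384: ~79.6M -- in line with the figures above
```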
The NNUE nets StockFish uses now are vastly smaller and there are human-readable visualization tools to see changes within those nets.

Of course, there are no explicit internal net "features" in the way there are hand-crafted input features in traditional engines, for example "in check", attack maps and such.
Leela uses no features other than the individual piece type bitmaps and position status (castling, en passant, etc).
The nets are just weights with no discernible groups AFAIK, but maybe you can break some new ground here.

There are some lighter weight implementations such as:
https://github.com/dkappe/a0lite

For training, there is an out-of-date PyTorch version here:
https://github.com/Cyanogenoid/lczero-training

python-chess will be far too slow to generate self-play games for RL, I think, unless you can spin up many CPUs in the cloud.



Dariouch Babaï

unread,
Mar 2, 2021, 11:53:58 PM3/2/21
to brian.p.r...@gmail.com, LCZero
Analyzing Representations inside Convolutional Neural Networks
https://api.semanticscholar.org/CorpusID:229363763

Dariouch Babaï

unread,
Mar 3, 2021, 12:06:47 AM3/3/21
to brian.p.r...@gmail.com, LCZero
Just don't try to look at the multidimensional representations directly, you might pull your eyes out of their sockets. Instead, look at the embedded Euclidean space and the metric in the "latent" space (I don't like that term, but Maia seems to; I prefer to think of it as the space embedded in the stem or tower below the heads, that set of repeated layer motifs transforming the "basic input vector", which is itself already hard to inspect, yes). In the article above, I think vision is used as a proof of concept: the analysis is done without looking at the non-2D space, with the 2D images there only to establish that the groupings correspond to what we would see anyway. The idea is to use the same grouping method even when the data is not 2D, as a source of useful grouping candidates.

Hope this helps. I support the OP's efforts and those who are helping in this thread; I have been waiting a long time for such an initiative (I am too lazy, or not able, to go from math to code, from concept to implementation, but I can recognize good work).



jmba...@gmail.com

unread,
Mar 3, 2021, 4:34:38 PM3/3/21
to LCZero
Good evening. How do you activate your .bat files? I downloaded them, but then I'm lost. Thank you if you have some time. Jean-Marc

arturo....@gmail.com

unread,
Mar 4, 2021, 7:36:17 AM3/4/21
to LCZero
Thanks for your help, Brian.
I am trying to reproduce the whole pipeline you described, and I successfully ran the 'generate.bat' command, yielding a bunch of training data. I understand that these training data are games, but they are not in PGN format. Is there any way I could look at these games? I mean, is there any utility to convert them to a human-readable format?
Thanks again for your help and patience.

brian.p.r...@gmail.com

unread,
Mar 4, 2021, 8:36:46 AM3/4/21
to LCZero
I suggest spending a couple of minutes searching for how .bat files work.
If not knowing about .bat files is the starting point, it will be a very long way to the goal of net training.

brian.p.r...@gmail.com

unread,
Mar 4, 2021, 8:52:31 AM3/4/21
to LCZero
I have never looked at going from chunk files to PGN format games.
Self-play games, especially the early ones, are just awful and nearly random chess.

There were one or two tools to convert chunk files to a human-readable format.
However, the chunk file format has been constantly changing and there are now six versions, so I suspect the tools are quite out of date.
Here is one utility that displays the chunk bitmaps from FENs to give you some idea of what things look like:
https://github.com/dkappe/badgyal/blob/master/badgyal/board2planes.py
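The core of what that script does is roughly this (a simplified sketch with python-chess and numpy, showing only the twelve piece planes for the current position; the real lc0 input also stacks history, castling, side-to-move, and rule-50 planes):

```python
import chess
import numpy as np

def piece_planes(board):
    """Return a 12x8x8 array with one 8x8 plane per (color, piece type)."""
    planes = np.zeros((12, 8, 8), dtype=np.float32)
    for square, piece in board.piece_map().items():
        channel = (0 if piece.color == chess.WHITE else 6) + piece.piece_type - 1
        planes[channel, chess.square_rank(square), chess.square_file(square)] = 1.0
    return planes

print(piece_planes(chess.Board()).sum())  # 32.0 -- one bit per man on the board
```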

You could read the code in the client to see what happens to the games in PGN format.
The main project saves them, but I don't recall if that is something that the client does or lc0.

Dariouch Babaï

unread,
Mar 4, 2021, 5:57:49 PM3/4/21
to LCZero
Me again. I think those self-play games are important to look at, but not individually. What I suggested earlier, clustering populations of games after transforming them through some subset of the tower layers (or just the last ones prior to the heads), is what is interesting to look at: the evolution of those clusters from batch to batch, and some sense of the diameter of the whole set of games. Sure, look at a few individually too; I would like to as well. There may already be such individual PGNs.

Sad to hear that nobody has seen the use of those self-play games for understanding why we still can't use supervised learning on human databases (even the best-rated or curated ones), while a self-play database grown from uniform random initial conditions can do it. Is RL a magic leap over SL? Or is there something to be learned about the structure of the game population as "viewed" (more practically: mapped and measured) through the network's layers of convolution motifs? I sure think that somebody ought to try. Has anyone yet?

I remember that there were scripts to go both ways, right? Do you know how often the format is going to change, and why the need for transcoding has disappeared?

Could some source code give the information needed to adapt those scripts, or better yet, is there updated format documentation sufficient for adapting them?
If it is source code only, is it dispersed across more than one file? I would guess that storage size was a factor in the binary formats used.
Thanks. I am asking those questions for myself, but also for the OP.

To the OP: if at some point you are tempted to give up over an obstacle like this one, let me know; I might muster the energy to look at the scripts and new formats.
If you don't find resources, I could retrace what I searched for last year before giving up myself. I did not dare to do what you are doing, or do not have the skills or energy to finish it.

I will read this thread more carefully, though, to understand what kind of hardware and time is needed to create such self-play databases; maybe I misunderstood.
Also, if you are hesitating about the layer-wise statistical work suggested above, let me know. That was my objective when trying to get a sense of where the generated data had been kept. I was also interested in doing the same work for the various versions of lc0, and for the various sub-optimal hyperparameter tests in any engine's history, to find traces of those performance changes, and the corresponding parameters and NN designs, in non-parametric statistics of the type above.

Sorry if what I am talking about seems irrelevant; I don't think it is. Whenever you can produce databases of such games, I would be happy to explain and even do some complementary work. I did not catch whether you wanted to look at individual games only; by "self-play loop" I assumed you were ready to work with databases and at least reproduce some of the training. I wonder how those batch files can replace the big infrastructure of the distributed setup. Is it a question of patience? Sorry, I am quite an ignoramus about computing at such scales.

If anything, keep looking at the random games. I think they explain why legal chess is a better universe for posing the chess engine problem than perfect-chess types of assumptions. If perfect chess, or the best games from self-play, could generate the same total number of games, why not use those to train a new NN with supervised learning on those best games, with the same labels, just without the trajectory through the rating plane? Why does the initial training need that random play, while perfect games in the same amount can't give the same performance? I think understanding that can give structural criteria for sampling human databases so they can be used in a well-posed SL problem with an appropriate database structure; it is not only a question of database size, quality of tournament transcriptions, or whether the database is full of bad games. I think I already made a similar point. Hopefully I make sense. Good luck.

brian.p.r...@gmail.com

unread,
Mar 4, 2021, 9:07:26 PM3/4/21
to LCZero
On Thursday, March 4, 2021 at 5:57:49 PM UTC-5 DBg wrote:
Sad to hear that nobody has seen the use of those self-play games for understanding why we still can't use supervised learning on human databases (even the best-rated or curated ones), while a self-play database grown from uniform random initial conditions can do it. Is RL a magic leap over SL? Or is there something to be learned about the structure of the game population as "viewed" (more practically: mapped and measured) through the network's layers of convolution motifs? I sure think that somebody ought to try. Has anyone yet?

I remember that there were scripts to go both ways, right? Do you know how often the format is going to change?

Could some source code give the information needed to adapt those scripts, or better yet, is there updated format documentation sufficient for adapting them?
If it is source code only, is it dispersed across more than one file? I would guess that storage size was a factor in the binary formats used.
Thanks. I am asking those questions for myself, but also for the OP.

I will read this thread more carefully, though, to understand what kind of hardware and time is needed to create such self-play databases.


Nearly ALL of the data:
https://storage.lczero.org/files/
Note that the PGN files do not match one-to-one with the chunk files.
Nearly ALL of the source code:
https://github.com/LeelaChessZero
(missing the trainingdata-tool; search GitHub for various versions).

SL is just "fine" for PGN collections of human games and training nets, but only up to about 3,000 Elo.
The limited strength of the nets is due to having policy information for only the one actual move made.
With the self-play games, all of the legal moves have a policy probability.
This is worth about 200 Elo alone.
With additional code/tools people have augmented the PGN games with multi-PV searches to add policy info for some additional 6-10 moves.
This helps, but is still not as good as the self-play games.
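
In rough terms, that multi-PV augmentation looks something like this (a sketch using python-chess with any UCI engine; the softmax-over-centipawns mapping is just one plausible choice, not necessarily what those tools actually use):

```python
import math

import chess
import chess.engine

def multipv_policy(engine, board, pv_count=8, nodes=10000):
    """Turn a multi-PV search into a soft policy target over the top moves."""
    infos = engine.analyse(board, chess.engine.Limit(nodes=nodes), multipv=pv_count)
    scores = {}
    for info in infos:
        move = info["pv"][0]
        # Centipawn score from the side to move's view; mates get a large value.
        scores[move.uci()] = info["score"].relative.score(mate_score=10000)
    temperature = 100.0  # softer or sharper targets depending on this choice
    weights = {m: math.exp(s / temperature) for m, s in scores.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

# engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # any UCI engine works
# print(multipv_policy(engine, chess.Board()))
```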

If you want to know more about the project, joining the Leela Chess Zero Discord server is key.
The information there is pretty much the same as in the GitHub wiki for lc0.

Hardware needed for self-play and RL: Colab Pro works fine for that, but not for the very strongest nets, which use hundreds of millions of games and thousands of nets.


Tony Mars Rover

unread,
Mar 4, 2021, 9:25:16 PM3/4/21
to LCZero, brian.p.r...@gmail.com
Sorry, this is not Brian's e-mail! This email belongs to Tony Mars!

Regards,

Tony




Dariouch Babaï

unread,
Mar 5, 2021, 3:46:22 AM3/5/21
to LCZero
Well, things can happen in a year.  Thanks for the dense summary.  It contains lots of news for me....

https://storage.lczero.org/files/

Glad to hear the training data is available, in correspondence with the net architectures that produced it (right?). I could look at those filenames and there would be some key somewhere for me to find the neural nets that were used to produce them.

https://lczero.org/dev/wiki/project-history/

I have not been following the evolution of the wiki documentation for a while. Glad to hear about a history section; I will have a look. Thanks.

The PGN would be mostly for curiosity; I would not need the bijection with the chunk files for all of them, although I would be curious about what kind of mismatch it is (systemic? loss in storage? partial coverage from a change of format?). The PGN would be for "manual inspection": if I can find some subset with a correspondence to chunks, and to nets, then I would be encouraged to wake up again. As long as there are transitive ways to go from one place to another (establish a relation here, enabling the same kind of relation there), I could do something. My most probable reading of "not one-to-one" is that there are chunks without PGN, but all PGNs have their chunks.


SL is just "fine" for PGN collections of human games and training nets, but only up to about 3,000 Elo.


However, I need some help to fully digest what this paragraph implies. Where can I find documentation about this "fine" SL with PGNs of human games? The training setup (i.e. targets, database sampling "mesh", size, source, loss function given the targets) and the performance results? Is there some dynamic range left in the available data, or are only the best performers kept on record? I will think about this paragraph and come back with what I don't understand after some time.

I have not been watching the lc0 project since before last summer. I used to look at the wiki, the training-flavored offshoots on GitHub, and the experiments, and used Discord only for DMs; the chats there are just chat, unless things have calmed down, but Discord was meant for massively multiplayer paces of life and quick exchanges.

I think I took my leave soon after the hybrids started emerging, so I missed the SL thing completely; I am very curious about it. Maia did not seem to be oriented toward better or best-chess approximation, and I consider that they did very good work and showed the quality of the Lichess database in terms of statistical "depth". I did not see much display of how variable certain distributions may have been, but the few main-moment (average) type graphs I have looked at so far showed promise for future careful sampling and sub-sampling to accommodate many training questions, possibly even best chess.

Are you saying that SL-trained big networks are competing as engines? Human games, not distillation of networks or other "oracle" supervision? Where did the data come from to give the 3,000 Elo performance? I am also very curious about the sampling criteria. This is news; until now, human databases and neural networks could not do the job. What changed?


The limited strength of the nets is due to having policy information for only the one actual move made.

3,000 Elo, really, from human games? Yes, I understand the evaluation function has errors compared to the case of a big enough neural net, good enough global optimization, and good enough coverage of the legal space, such that the true outcome probabilities could be obtained from a sufficiently refined mesh and 100% performance reached on test sets (I would understand that better than a 3,000 Elo figure, but maybe give some comparison with RL at the same net size).

Are the human-game labels the game outcomes? Or is some other engine acting as an oracle on the human positions? If the answers are already readable somewhere, not embedded exclusively in source code and not dependent on me asking the same questions of another person, please let me know and I can go read.
If I need to interview someone on Discord, I would like to know who, because going fishing there and rambling again will crash the chat servers, or I will drift like a bottle in the sea; I can foresee that with angst.


With the self-play games, all of the legal moves have a policy probability.
This is worth about 200 Elo alone.
With additional code/tools people have augmented the PGN games with multi-PV searches to add policy info for some additional 6-10 moves.

My other difficulty is about policy. My understanding is that there exists a true mathematical evaluation function that could be approximated by a sufficiently big and expressive neural network, and that such a network would not require any policy, or only the trivial one. A corollary is that the current evaluation NN in the RL setup still requires policy to weed out the remaining approximation error, because either the NN is not complex enough, or the training method is unable to find the nooks and crannies in its parameters that would generate functions complex enough to approximate the true one. So policy is there to weed out the error in the position evaluation. I then understand, roughly, that probabilities over the tree search get combined, and each newly evaluated node reduces the error in the outcome probability estimate along the path being considered (from the root position to some "leaf" node); there is still some fog here, and I am also working on my understanding of general tree search in classical SF, which is not very natural for me, all that recursion. But assuming roughly uniform error in the evaluation function over the legal position space, considering more nodes means that, if we have converged to a good enough approximation, the total error in the probability estimate for the better path keeps going down. This is how, not precisely yet, I understand that, given some confidence interval, there will exist a deep enough path that is both the most advantageous and has a small enough combined error within the CI to justify stopping the search. Is this a correct "averaging" view of your understanding? Or can anybody who understands what I am struggling with here tell me?

I still have trouble understanding the past documentation, and possibly the current one, in some key places. Some of it is the kind of math: my training in applied math, dynamical systems, functional analysis, physics, and all sorts of non-finite mathematics did not build a strong intuition for discrete structures such as trees and the various traversals that can happen on them. But I am quite familiar with the functional-analysis side of the theory behind lc0, I mean optimization and probability theory, and SL with neural nets; RL has been the tough part for me. I thought I understood the self-play training part, but the policy-related documentation, and anything about probability sampling on node trees, does not clearly separate self-play, the various ways of training other neural networks from those games, and the policy behavior during play. It might be me not reading between the lines correctly, or it might be that when people are deep in a project it is hard to verbalize everything that has become second nature, so these various context differences are assumed and not spelled out. Do I make sense? Am I wrong? Should I go read some particular updates since last year? Also, is the difference between MC tree search and PUCT crucial, or does my basic understanding of probability sampling methods like MC also apply to PUCT, with some speed optimizations? An equivalent question, maybe: are the Go diagrams still referred to in the lc0 wiki still valid for explaining the probabilistic tree search there and in chess?

Thanks for the Colab Pro reminder; I did not know whether that was still a path for one person. I would not need the very best nets at first. More important for me is a range of net sizes and a range of performance: the more spread, the more information I expect to be able to detect, using continuity assumptions for the integrated quantities that most statistics end up relying on one way or another across that dynamic range (which does not mean linear, though). Also, this thread was more about creating one's own net, which is still interesting for direct access to the learning trajectories (or will you also greet me with more good news, that learning-trajectory data is also available for understanding the optimization process in parallel with the training games being generated and the RL batches?).

I am sorry to dump all of that on you. Please help me find the right places to solve my puzzle, if you see some possibilities. And thanks to the OP; possibly you have some insight into my questions. I think the work of figuring out what happened from the current documents, the source code history, and quests on Discord to piece together disparate information might require the same energy that you are putting here into creating things from scratch. At least I can tell you this: keep a good file nomenclature and don't throw away data; let the tournaments play the gladiator game, but keep the bad data too, because progress can be made from seeing what does what under the hood.

Thanks for humoring me.  I might go back to my hibernation now... ;)

brian.p.r...@gmail.com

unread,
Mar 5, 2021, 8:18:21 AM3/5/21
to LCZero
As I mentioned, no, there is no mapping from PGN games to the corresponding chunk file for the main project games.
Using date/times one can get close, which enables selecting games say near the start, middle or end of the runs.

SL nets around 3,000 Elo have been trained by me and several others for a couple of years.
Policy is very important, along with several other major factors:
More samples (positions, hence games).
More training steps.
A good learning rate schedule.
A good net size.
Appropriate training hyper-parameters.
My best "small" net (10x128) is about 3,500 Elo using the self-play games (ie with full policy).
The very best nets are currently about 3,700 Elo and much larger (30x384).
IIRC net speed slows down with the square of the filters, so self-play, training, and match testing all take tremendously longer for the larger nets.

Broader picture (outdated, but still a very nice presentation)

The training process is very exacting with many steps.
For example
RL requires vastly more resources than SL.
Testing the trained nets is also fairly exacting.

DBg

unread,
Mar 9, 2021, 11:10:44 PM3/9/21
to LCZero
Thank you very much for this global factual overview. I understand the importance of policy, I think, at some mathematical level: it is required as long as the position-evaluation NN has some error to be cancelled out, which makes it a requirement for any self-play training.

Ideally the position-evaluation NN would be expressive enough to sufficiently approximate the true evaluation function (or to represent the true outcome probabilities under best chess from a given position, which one could posit to exist, for example if the entire legal position space were known, for perfect chess or for other pairs of playing strengths and their outcome distributions). The fact that policy during play, and its relation to something in the self-play database that is not yet represented in the evaluation NN, is still needed for engine-competition measures of performance might just indicate that work remains to be done there. I know this is not necessarily feasible, and it is possibly a quirky point of view on how lc0 has been performing all along. So I would think both the policy-improvement angle and the NN architecture/training set-up are plausible avenues for improvement. I still need to understand what the policy NN captures about the self-play games, and whether that covers all batches of self-play, from the uniform policy to the "best chess" of the last batch considered. I am working on that; it might take some time.

In the meantime, I thought of things I should tell the OP while the iron is still hot in the development. Please do not hard-wire the initial standard position into your executable; make it as modular as RL versus SL and the other features of lc0 that are currently designed to be modular, for example the loss function being delegated to a configuration file, allowing more experiments to be made without an overhaul of the source code.

Why this advice? Because there are experiments I wanted to do, as a non-dev but an interested "scientist" who understands machine learning concepts and related mathematical notions. However, upon some discussions, it was made clear to me what using endgame positions (e.g. TB classes) as roots for self-play training experiments would involve, both to reduce hardware requirements and to have a common laboratory to test all sorts of ideas against a common performance reference system.

In TB land, one has the tablebases as solved ground truth from any position: not only those that best chess would explore from non-TB roots in upstream games, but positions that could arise from any level of play "before" falling into TB classes. This gives all the questions and hypotheses a less costly toy laboratory, while also allowing various performance measures to be defined and applied under many types of chess play, from all walks of chess life; even engines built on different mathematical frameworks, like SF, could be part of the experiments. A common, non-tournament-based experimental set-up, with mathematically precise parameters and full control of database sampling (we can sample the whole set of positions in the limit, to measure the statements I may have hinted at above).

But I was made to understand that I would have to go through many lines of code to reproduce the whole lc0 self-play training process with the only change being the initial position, where the zero-knowledge prior of uniform move probability given any position would be applied, as per the self-play training method.

Also, even without the above, it might be instructive to test all the parts of your development with that reduced-complexity set-up. It would still exercise all the other determinants of the game of chess, only without having to wait for large-scale hardware to spit out results, and it would allow you some insight; I assume your intent in going from scratch is to learn from it. Human chess, too, often benefits from the clearer mechanics that fewer men on the board allow, with each move potentially making larger changes in the outcome odds (not all moves, but the best ones are likely to, compared to the more subtle deltas among opening-root candidates).

Another bonus: some relationship could be established between precise mathematical measures of performance in TB land and pairwise engine Elo types of performance measures.

Has any of the above been approached already? If it was rejected, where could I read about it and perhaps be convinced of where this is a dead-end proposition?

I think, anyway, that the hybrid approaches need some characterizing tools from the clearer frameworks that SF11 and lc0 provide, which means both engines' mathematical foundations need to be considered and perhaps made comparable... again, TB land.

DBg

unread,
Mar 9, 2021, 11:14:29 PM3/9/21
to LCZero
Is there no way to edit? Many statements I made need closing... too many commas or parentheses in the train of thought expressed... if anything needs clarifying, please let me know.

Dariouch Babaï

unread,
Mar 10, 2021, 12:08:55 AM3/10/21
to LCZero
Fixing

On 09/03/2021 at 23:10, DBg wrote:
However, upon some discussions, it was made clear to me what using endgame positions (e.g. TB classes) as roots for self-play training experiments* would involve,

I would have had to review some 1,000 lines of code, or changes of that order. 1,000 is not the point; the point is that understanding only the math and the training is not enough to be able to experiment with the code via scripting.


* in order to do 2 things:

Dariouch Babaï

unread,
Mar 10, 2021, 12:12:42 AM3/10/21
to lcz...@googlegroups.com
I meant the Elo rating measure stemming from many games against other rated engines (a pairing-based measure).



brian.p.r...@gmail.com

unread,
Mar 10, 2021, 6:59:39 AM3/10/21
to LCZero
Not sure if this is relevant, but I have trained nets on simple endgame positions to see how well smaller nets do with them.
KRvK and KQvK were OK, but no net could properly do KBNvK, at least within my patience to bother with it, since having a net learn it is not needed if using tablebases anyway.

Separately and curiously, nets trained with far less than 32 pieces (16 or 18 I forget exactly how many) somehow are able to play pretty decent chess from the opening position anyway, despite never having seen any input samples with all 32 men.
More ML magic.

Dariouch Babaï

unread,
Mar 10, 2021, 10:34:52 AM3/10/21
to lcz...@googlegroups.com
You do have my attention. Also, I am glad to hear about experiments with smaller nets, "to see how well" they do on problems of known combinatorial complexity.

Your finding a boundary there is the beginning of insight. "Not needed if using TB anyway" is, I think, what may have been a distraction from getting insight into how the hyperparameters interact (I include architecture, and maybe the initial position; just now I am thinking of including it). Patience and the tournament-first objective are tangled together.

I have been impatient myself with how the focus of lc0 has been on engine tournaments more than on looking back at its own data. But hey, maybe things have stabilized by now and a new pace of progress, with more internal measures in the "pipeline", can be tried.


Separately and curiously, nets trained with far less than 32 pieces (16 or 18 I forget exactly how many) somehow are able to play pretty decent chess from the opening position anyway, despite never having seen any input samples with all 32 men.
More ML magic.
ML magic? How about chess being locally deterministic, and statistics being able to figure out the statistical order that emerges from unrolling that? Since there is no measurement error, and we know the locally deterministic mechanics (the chess men's mobility, the stopping rules, etc.) are of finite dimension, one may not need to explore the full game tree to get a basis of functions that can contain the true evaluation function. How incomplete the sampling can be, that is the question, right? Should we stick to an opening repertoire and sample it to death?

As already intuited, known, or experienced in human chess pedagogy, endgame practice already contains the building blocks of that mechanics, and what is learned there can be generalized: patterns can be recognized and superposed over different contexts, and behavior learned there can, because of the self-consistency of chess and its finite topological complexity (TBD), be applied here. I am not sure I have the right words to describe exactly what is half intuition and half mathematical support*.

I would assume that the reduced-men full-game experiments did include all the piece classes, if not all the men. This is the spirit of reducing combinatorial complexity while keeping the essential complexity of the mechanics at play. With at least one man of each class, there would be enough positions to cover contexts that already contain patterns that can be transitively (?) applied, in the same fashion that endgame mechanics can be learned and then applied in other contexts with more mobile units (chess men).

* There is some dynamical-systems thinking in my understanding: a local generator of new states as time steps forward can be deterministic (in the dynamical-systems, physics sense), and yet when "recursively" applied that generator yields exponentially complex behavior that looks random at some time scale; but when looking at bundles of trajectories from the same initial conditions, a statistical order emerges. Sorry for the digression, it may appear out of place, as chess states may not share much with that in the original basic input-vector (or FEN or PGN) encodings. The embedding space(s) inside the neural nets already share more with it, though.

The lack of measurement error and the statistical-order point need a bit of work to explain how sub-sampling the whole game tree can make the statistical order equivalent to exploring the tree logically and completely. I think this is about the topological complexity of chess: the lower-dimensional manifold contained within it, which a sufficiently complex NN can embed properly after having expanded the input vector into enough of a "high-level" mapped representation. The dimension of that manifold may not have to be of the same order as the full combinatorial complexity, and the number of repeated ResNet layer motifs may be a good proxy for such "topological" complexity.

I hope I did not abuse the terminology too much, or ramble in vain.

So, pointers: can you share the data, the branch code, the executables? What kind of set-up would I need to reproduce some of those experiments?
How much work is it to use arbitrary positions as self-play training roots? I am pleased that such rational experiments have been tried,
and I would like to know more, obviously.

Dariouch Babaï

unread,
Mar 10, 2021, 12:27:11 PM3/10/21
to lcz...@googlegroups.com
I have been rethinking a bit about the root position as a hyperparameter.

If I understood correctly the work by oscardsmith using SL with tablebases as the supervising database, performance did not have to be measured through competition and could be estimated by some robust cross-validation. Having the whole universe available as a sample is great that way: no worry about whether the sample is representative of the "law" of input-output expected to apply to that universe. I am not being grandiloquent here; this is the most concise way for me to explain, without rambling (already doing it, am I?), the relationships between "universe", "sample" (yes, quotes everywhere, I am prudent), and the "law" of input-output slicing that universe. In ML that law is called probability something (law, density, distribution; why not just function, or probability measure, or measure?).

Back to the main point: do I remember correctly that the sampling was over the whole position space? No game sequence implied, just the TB target information associated with each position in the class (the material class: all legal positions with the same material content). Self-play, by contrast, is based on games and on the connector between consecutive positions called policy, which learns its preference function over moves given any position (or helps the evaluation learn through the consequences of policy optimization, given the nodes being searched and the terminal ones already found in other games during the batch; I am asking for corrections here).

So, if my attempt to use such work as a comparison basis for all kinds of chess (restricted to this mathematically bounded class of games, i.e. all the games from all the positions in a class taken as roots) is to extract the full information, we would need to find comparables. There could be two ways. The easiest: find out about SL's ability to generalize from subsets of paths, with TB-class training rooted in only one position but including sub-sampling of any legally compatible positions left in that class (perhaps having to generate legal games in order to generate position sets?).
The other: think of ways to merge different networks that have each learned from a different root position through RL training on that position. Could this be iterative? How does one RL train on top of RL training?

I don't have all the answers, and before going too far into orbit, I would like some minimum-size classes on which to try such ideas, where the design challenge is basically the same but the time to results and hypothesis testing is short enough for something like interactive thinking, being able to quickly course-correct errors of conception (I make a lot of those; I also call it imagination, so as not to feel inadequate, and because it has helped me in the past when common sense suggested it was a waste of time).

That's it: the conceptual hurdles. But that part is fun for me.




Dariouch Babaï

unread,
Mar 10, 2021, 9:00:40 PM3/10/21
to lcz...@googlegroups.com
I have been rereading. Sorry for my English; it went down the drain, or it was too early in my day. I was sure I had put in all the words. If anything needs rephrasing, let me know, anyone.
The last bit was about the two ways:
1) SL on TB data restricted to positions that are part of self-play coverage (or consider legal paths within the TB classes of positions), instead of the current sampling across all positions (which uses balancing criteria on the target space and other sound database sampling meant to represent the "universe" expected at test time after training).

2) Find a way to learn from many games AND many roots for those games; it could be meta-learning, or it could be iterative.
The problem of the initial position, combined with an initial policy based on a uniform evaluation prior, seems to make it difficult to use a net from a previous RL batch as the initial weights for another RL sequence of batches on new games from a new root position. Could there be some meta-learning about how to combine sub-problems? I guess this may have been tried for other decompositions. I also wonder how to choose the best set of initial positions so as to end up covering the same support that was found sufficient for 100% performance on the target space.

This could even have human consequences, for how to best experience chess positions to get a global overview as fast as possible, rather than assuming that many games played in one tiny corner of the space will generalize far from it. But I might be alone in thinking one can learn about teaching humans from teaching a neural network.

Have any experiments been done on combining NNs from different training sets? Maybe an augmented system built from smaller nets corresponding to sub-problems, and then a sufficiently big net trained by some Bayesian combination (or another combination model that gives the best results on the direct TB performance measure already used). I think something like this has already been done training smaller nets from bigger ones; distillation, I think it is called. But the size difference is not a requirement; it was driven by practical hardware-constraint objectives, right?

I will stop before making even more unreadable sentences; my thoughts were readable as I was writing them, I don't know what happened.
Thanks for considering my questions and ideas, and for potential critical feedback, whenever. I would be curious to hear about other experiments of the same nature.

brian.p.r...@gmail.com

unread,
Mar 11, 2021, 6:57:26 AM3/11/21
to LCZero
ML "magic" was meant to be in jest (somewhat).
The ability of nets to learn to play strong chess is astounding and does indeed seem superficially like magic relative to hand-crafted traditional engines.

Frankly, I think I understand only about 10% of your posts.
There is a lot of hand waving about what should be, but there is relatively little that is empirically known about training Leela chess nets that has been verified with actual match game testing results.  While some things are generally always better (more input sample positions from more games, more training steps, larger networks), this quickly results in excessive time required for data preparation, training, and testing for a single person trying things.  So, several individuals in addition to the main community project effort continue to try things and experiment, not necessarily in a methodical way.

Ensemble nets have been tried (search for the "steins")

Here is some, albeit high-level, info on endgame nets:

Teaching nets is nothing like teaching people, although nets can be used to help people learn, of course.
Tiered training has been tried several times.
A documented example:

The main Leela Chess sites from which to start:
https://lczero.org/  (most of the information)

There is extensive information about the project at these sites (all outdated, some redundant)
http://lczero.org/dev/  (this site is largely a copy of the above, other than the blog)

How much work is it?
A LOT, and a fairly high-end PC is needed.
Training Leela nets for chess requires many steps with exacting attention to details.
This makes it difficult in terms of practicality, even if not theoretically.
Moreover, the project continues to rapidly change, sometimes at the expense of backwards compatibility.
Here is an outline of many of the changes:
And, it is not up to date, with numerous T7x series "runs" which will culminate in a new "main" run likely T80 in a month or so.
The Leela Chess Zero Discord server is very active.  It is the place for the most current information, and people often provide assistance in the #help channel.

Here is one guide about the process (outdated, like all things related to Leela Chess).
Note: the sample yaml file link in the guide is intended to provide information and will not work without extensive editing.
Instead, I suggest using a different, simpler yaml file.

Another option is replicating the results of the "standard dataset" per

I realize that this is a "kitchen sink" post, but perhaps it will enable you to pursue some of your ideas further.







DBg

unread,
Mar 11, 2021, 12:19:18 PM3/11/21
to LCZero
Hi, I am sorry to have somewhat hijacked this thread (apologies to both the OP and arturo); I hope Brian's replies were helpful, though.
I would like it, though, if you could report some of your progress here. I am reconsidering whether scaling this self-play training pipeline down to 3-men TB-type initial positions is feasible without an overhaul.

The idea is to save time, ease hardware constraints, and reduce the intrinsic complexity of the problem by focusing on core problems that can be isolated by looking at TB land (a bit like endgame training for humans, actually). Please let me know how things evolve and whether changing the initial position could be in your respective plans. The other option, which I gave up on a while ago, seemed more feasible after Brian's post about reduced-material training experiments, which got me all hopeful, but it may still be out of my reach. I don't know if I should keep using this thread for that, but since it is about starting from scratch, it seems like the right thread in which to mention the possibility of using toy problems as proofs of concept; if that kind of process is not fast enough, I understand.
But I thought the idea here was either to learn or to go back to basics. Thanks for your consideration.

dka...@gmail.com

unread,
Mar 16, 2021, 12:01:40 AM3/16/21
to LCZero
Not sure if you’re familiar with my work with Ender v1 and v2, two endgame nets.

Dariouch Babaï

unread,
Mar 16, 2021, 3:16:59 AM3/16/21
to lcz...@googlegroups.com
Last spring, I think, upon your suggestion, I did look at it, and I also had good exchanges with oscardsmith on Discord about related experiments. That is exactly why I think TB land is the right place for much of my curiosity and my ideas about analyzing data across a range of performance regimes, not only the data at the top from net-to-net competition measures*, or from TB SL reaching 100% accuracy. That neighborhood alone gives no information about the response surface relating complexity, net size (in its various dimensions), and database sampling parameters such as coverage, density, or mesh. One needs the bad stuff to learn how to get to the good stuff; isn't that a lesson that seems scale-invariant? (lc0 had to do it, the Bad Gyal nets too, and now even I want bad engines.)

I will have another look in that spirit. I explain in another draft email why I insist on making sure RL is possible in TB land. I will send it after reviewing more of everybody's replies to me, including yours, giving myself on the order of a few weeks to a month (I am slow). Most likely I will first send it by personal email, then bounce it back here with more concise, pruned comments.

* Side note, perhaps off topic (or is it?): competitions use typical time and hardware constraints, while I can think of other constraints that could be interesting. Why keep looking for the fastest engine; what if it is fast but biased? Could we even tell? What is the point of engine competitions nowadays? A betting opportunity? (How ironic, since some people like chess precisely because it is not a betting game, but that is within the game.)
What if other aspects of engine behavior on the complete chess game tree (with positions as nodes), or on all the game trees from any legal position as root, could be compared and improved, for example per-position evaluation error? (Policy too, but the better the evaluation, the less policy is needed, right? Everybody seems to agree on that; at least that seems to have been the evolution with bigger nets. Anybody, correct me, or let's discuss that aspect.) What if there were no time control and only the number of nodes considered in the tree search were the constraint? Not fair, OK. Well, how about a constraint only on the number of nodes evaluated, however big or different the whole sub-trees are? Yes, uneven time; that would mean only the evaluation parts are competing. And perhaps some day hardware will be like our wet brains, already computing in parallel, with no bus funnelling for much of the work, and the things our brains do very fast would not have to be so slow (because they currently have to be forced into a narrow sequence). It all depends on what the purpose of engine competitions is nowadays. What does "best engine" mean?