"boosting" endgames in leela training? Simple change to learning process proposed: "forked" training games.

Skip to first unread message

Warren D Smith

Jan 2, 2019, 11:56:59 PM1/2/19
to LCZero
Leela has a PROBLEM with weakness in endgames.  
One idea to overcome that was "ender",
a new neural net artificially trained purely on endgame positions 
(extracted from human games, apparently, but then leela plays out the endgame from then on with
assistance from tablebases?)
which would be used with <=15 men on the board -- an arbitrarily-human-chosen threshhold based
on a human-chosen feature (# men) -- otherwise use the main net.

That would be a bit annoying since it in several ways disobeys the "zero" philosophy.

But a different idea, which would not need any human games as input, is
simply to "boost" the training of the main net for endgame positions.
The underlying argument (which can be made, and plausibly even is true) 
is that endgames matter MORE than middlegames and openings in the sense that if leela is
no good at endgames, then it cannot learn middlegames
and openings well either, since all knowledge during learning flows backward from game-ends.

That is ok if leela is only about learning how to checkmate in the middlegame and never reach 
an endgame, but against strong opponents that often is not possible and then you instead 
need to play the middlegame aiming to reach a winning endgame, and you need to 
know which endgames win and how to win them.

So what do I mean by "boost"?
Actually I am not sure whether "boosting" is the correct technical term for my proposal.
But anyhow here is the idea.

Either (a) increase the percentage of endgame positions that arise during training, or 
(b) increase the learning rate in a manner that is game-stage dependent (faster learn rate in 
endgame positions).
  Here (b) seems again not "zero" since a human is setting up the notion of what "game stage" means
(e.g. # men remaining on board).  

But (a) could still happily  obey the "zero" philosophy.

A way to accomplish (a) is to "fork" training games into two games (more forking then is possible
for a daughter branch and so on...) with some small chance P, e.g. P=0.03, each move you play.   That is,
in a normal game, you at your turn play one move, the one you think best.  But if we decide to fork, you  
play 2 moves, e.g. the 2 you think best, and continue on from there playing two games from then on.
In this way, games that last longer yield a greater number of training positions, and those positions
are naturally enriched in "endgame" positions.  A game with forking really is a tree rather than a linear progression.

(By the way, any training position that happens to lie inside a tablebase should use tablebase data to aid the train.
Which in no way violates "zero" philosophy.)

You have to decide which positions will actually be used to train the net, e.g. if it were 1 per game, it
would need to become 1 per fork-branch...  perhaps demand it occur at least 5 ply after the forking?

There pretty much is only 1 hyperparameter going on here, that is P.   You want to choose P
to yield the best rate of elo increase with months of training time.  To prevent occasional exponential explosions 
causing damage you might want to insist that, e.g, forking is turned off in any game branch if 7 forks already occurred.

Leela's present training method involves P=0, which I claim was totally arbitrary.

I merely am pointing out that it is plausible that the best value of P might actually not be 0, 
but rather something positive.
This would be a very easy change to make, and you might find with the optimal value of P that you get
considerably better learning than with P=0.

Warren D Smith

Jan 3, 2019, 12:11:52 AM1/3/19
to LCZero
Another way to get the effect of forked-training-games but without actually needing to fork them, is
simply to make the probability that a game position from a training game is used to train the net, be
a FUNCTION of the number of moves played before and after it in that game.

Right now, that sampling probability is UNIFORM.
I am simply suggesting that better results, i.e. faster elo increase rate vs months of train time
might be achieved by some NONUNIFORM function.

E.g, sample with chance proportional to exp(-K*(#moves from here until game-end))
for some small K>0 which are hyperparameters chosen to get best learn rate.

My point: The present choice K=0 might not be the best.
Some K>0 might work better.

This forking-without-forking way to do it is simpler and avoids the need for  artificial safety limits 
and avoids worries that you might get some pairs of too-related training positions.

It seems entirely possible that some sort of forking or pseudo-forking train scheme might, say, double
the effective learn rate.  I just made that up, but it seems entirely possible.

asterix aster

Jan 3, 2019, 1:32:27 AM1/3/19
to LCZero

   Interesting idea! Definitely has merit for games like chess where the game initially starts from a single position, exponentially diverges and finally converges again towards the endgame. First idea seems more efficient in terms of maximizing compute resources. I believe, the factor P would definitely have to be less than 1/(average number of plies in a chess game), which is ~0.008 assuming a game lasts 120 plies on average. One way to achieve this would be to assign 50% of resources to generated positions and 50% to the initial position. Whenever the condition of forking with probability P is met, the user continues with the game, while adding a position to "the queue of generated positions" on the branch that was not take. This way, we ensure that we sample all the positions hence maximizing resources.

   Care needs to be taken that the position generations does not get into runaway split, for example, really long games can generate too many positions hence biasing the training data towards those type of positions. I think you should post this to discord and see what the devs think. Better still if you can prove this technique on some small "chess-like" game, it will be more likely to be accepted for testing.

Warren D Smith

Jan 3, 2019, 1:09:25 PM1/3/19
to LCZero
Even aside from everything in this thread, i.e. if you really believed in sampling exactly 1 random-uniform position per game for training, 
because you thought that was the "correct distribution" to learn from, actually that is WRONG because the actual usage of
the neural net when leela thinks while playing a game is NOT uniform.  In fact then the net is applied to somewhat crazier than-in-real-game
hypothetical positions; and also somewhat biased into the future versus real game positions; and also longer games cause
more uses of the net, so that it was wrong to sample 1 position per game *ignoring* game-length.
Really, you should have done something like sample every 50 moves, so that 100-move games will include 2 not 1 samples.

There are other possibilities...
We could try to preferentially sample positions where leela, while playing the training game, CHANGED ITS MIND
about the eval.  The idea of that would be:
   this is the sort of position where leela did not understand the situation, but is almost
able to understand it (since it did a few ply later).  I.e. this is the sort of position where learning more
would pay off, and also where it plausibly would be feasible to learn more.

Alexey Eromenko

Jan 3, 2019, 1:41:06 PM1/3/19
to LCZero
interesting idea indeed ! 

x I'm tc

Jan 3, 2019, 2:23:45 PM1/3/19
to LCZero
I really disagree with the notion that Leela "has a problem with endgames."  Indeed, I suspect quite the opposite: that it is exceptionally strong in endgames, because it gets much more practice with them.  That is to say, Leela is likely very, very good at openings, because they all start in the same place, and very, very good at endgames because there are many, many fewer of them than midgames.  Indeed, Leela is probably weakest in midgames.

But how can that be, if AB engines outperform Leela in Endgames?  Because depth is emphasized, rather than breadth, and NNs are good at breadth, while ABs are good at depth.  That is to say, an NN is best adopted when the number of plausible possibilities is great.  So, when I say "very, very good" I don't mean compared to other engines, but rather compared to perfect.  That is, I bet Leela has less room for improvement in the opening and endgame departments than it does in the midgame, where the sky is still the limit.

One way around this might be to have some small net that evaluates the board and modifies the MCTS in a way that favors depth over breadth as the game continues--that is, that more aggressively prunes trees by favoring deep variants with high average win rate at aggressively decreasing exploration of alternate moves.

Jesse Jordache

Jan 5, 2019, 9:05:53 AM1/5/19
to LCZero
But that always goes without saying: node for node, Leela is stronger - there are plenty of possible reasons why this is, but that's academic because every metric points involving comparative node-count bears this out.  That doesn't matter if a competing engine can drown Leela in nodes. In the endgame, nuances become less important, concrete evaluation is more important, and the number of available moves drops, making min/max engines reach depths of 50 ply within 5 minutes on a fast machine.

My personal experience is that stockfish is better at endgames.  Leela is much more likely to err in "only move" situations than MinMax engines.  I think this is sort of unavoidable given the nature of the two beasts, but that doesn't mean that Leela can't get better than it is.   The forking idea is interesting, and I don't see any violations of zero in it.

Dietrich Kappe

Jan 5, 2019, 11:33:56 AM1/5/19
to LCZero
Stockfish is generally better at endgames. If I can offer up the story of Ender: https://github.com/dkappe/leela-chess-weights/wiki/Endgame-Net

There are some concrete techniques there worth trying out. But many of them are nonzero. I’m currently, with some help from people donating their gpu time, training up a 160x10-se Ender, so we’ll see if that puts it over the edge.
Reply all
Reply to author
0 new messages