Train for endgames separately.

874 views

Kostas Oreopoulos

Aug 17, 2018, 1:31:30 PM
to LCZero
It is obvious that "eventually" LC0 would play endgames much better. But I think it is very hard to quantify that; it may take exponential time to happen.

It would be nice if LC0 were also trained on positions that meet certain criteria and so warrant separate treatment.

What those criteria should be is a big discussion.

First of all, once such a separate NN is created, you can "merge" the two by creating an "or" gate and a mini NN that implements the criterion. Everything would then stay on the GPU, with no CPU involved.

TBs should be used for "game termination". Think of it as an NN approximating the TBs; after all, that is what NNs are.

Now, what would those criteria be? 

My own preference is piece count. Maybe 4 for each side.
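A toy CPU-side sketch of the piece-count criterion and the routing idea (the `main_net` / `endgame_net` callables and the threshold of 8 total men, i.e. "maybe 4 for each side", are hypothetical stand-ins; the real merge proposed above would be a small gating subnetwork inside one model so everything stays on the GPU):

```python
def piece_count(fen: str) -> int:
    """Total men on the board (both sides, kings included), read from a FEN."""
    board_part = fen.split()[0]
    return sum(1 for ch in board_part if ch.isalpha())

def evaluate(fen, main_net, endgame_net, threshold=8):
    """Dispatch to the specialist endgame net once at most `threshold` men remain.

    `main_net` and `endgame_net` are hypothetical callables taking a FEN
    and returning an evaluation.
    """
    net = endgame_net if piece_count(fen) <= threshold else main_net
    return net(fen)
```

Inside a single network, the same switch would be a learned gate: a tiny subnetwork reads the material planes, outputs a 0/1 (or soft) weight, and blends the two heads.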


Norton Freeman

Aug 17, 2018, 2:11:35 PM
to LCZero
I think endgames make up a major part of chess, so the standard training already involves a lot of them. Just be patient.

Dietrich Kappe

Aug 17, 2018, 3:01:47 PM
to LCZero

Some details about my experiments with supervised learning from EGTBs.

The conclusion? Well, perfect play isn't good enough. You need some imperfection to get a good variety of positions and moves.

I'll let you know when my net (or any Leela net) can solve KQvKR.

Kostas Oreopoulos

Aug 17, 2018, 3:51:24 PM
to LCZero
Very very very very very interesting!!!!

Michael Simkin

Aug 17, 2018, 7:36:24 PM
to LCZero
NN philosophy is all about self-learning. Instead of hand-crafting an artificial criterion, one should only decide on the architecture - there is the option of implementing a continuous "or" gate (as in an LSTM), and the net will learn to use it in the most efficient way.

Anyway, the problem with Leela's endgame play is the data. The data is so noisy, with so many mistakes, that Leela thinks every endgame is won or lost. You will very rarely see Leela give a 0 score to any endgame position. This problem shows up in the middlegame as well - Leela has a hard time grasping the concept of a draw, because draws require data with accurate play, and Leela has a lot of inaccurate self-play data.

This is why Deus plays more aggressively and draws less: it learned from real games, in which draws are much more common than in Leela's data.

The problem with supervised learning is that it converges to a local minimum. Google's article clearly shows that learning from zero is better than self-play after supervised pre-training.

Kostas Oreopoulos

Aug 17, 2018, 8:18:28 PM
to LCZero
This is correct, theoretically. But there are practical considerations.

It's like a Nash equilibrium: we know it exists, but calculating it is not practical.

This is why approximations exist.

In our case, the approximation might be to cluster the data and solve the problem for each cluster. It might be suboptimal, but it is doable.

As Dietrich pointed out above, solving KQvKR is an interesting task. Knowing it is winning is one thing; moving closer to the win is another.

It would be interesting to create an NN that solves just that. Knowing that something wins (which surely an NN can do) is one thing; finding the patterns that make the win happen is another (maybe those patterns do not exist and it is pure calculation).

The current LC0 (I do not know about A0) is extremely weak at categorizing endgames. For example, we humans "know", after seeing many thousands of games and analyzing for hours, that opposite-colored-bishop endgames are most likely drawn. LC0 understands that in the middlegame opposite-colored bishops are good for the attacker, but knows nothing about the endgame.

Maybe some ideas from GANs could be used to improve this.

Khaki Menez

Aug 17, 2018, 8:24:59 PM
to LCZero
As I said, it's not zero in LeelaZero if you use TBs.

Michael Simkin

Aug 17, 2018, 9:00:23 PM
to LCZero
TBs will not help that much - the problem is the data (Leela's data has very few drawn games). If all your life you saw only imperfect play, you would have a very hard time learning how to draw; you would think someone is always winning. Leela needs to learn to play drawing endgames (with many pieces) and to distinguish them from won ones.

This all basically comes down to generating better self-play games. Unfortunately, the current self-play strategy comes from general game theory, which explores many options by often making bad moves. You should watch some self-play games to understand what I'm talking about.

There is currently no theory of how to generate self-play games that teach drawish endgames.

Maybe we could have a "soft-tree" architecture and give Leela both self-play and CCRL games. Basically Leela would have two NNs, one from self-play and the other from CCRL. But playing that much with architecture and data can give unexpected results. I would suggest thinking about how we can generate "noisy" drawish self-play. Maybe we can use some engine that evaluates drawish moves, and teach those to Leela.

My basic claim is that we need a special case to teach Leela the concept of a draw, and it is better done by generating data through self-play, or at least with minimal intervention. TBs would teach Leela a lot of special cases; they are not what causes Leela to give +3 in a completely drawish endgame.

Brandon Harris

Aug 17, 2018, 9:10:55 PM
to LCZero
Just a random idea, which I think may skirt the 'zero' philosophy: what if a second network were created with the goal not of winning, but of reaching equilibrium / a draw - basically of preventing any advantage? It would be zero-based, it would learn from itself, and then Leela and this second network would play and learn from each other. Is it still 'zero' if two unsupervised processes learn from each other? I don't know... it's late... just a thought.

Jhor Vi

Aug 17, 2018, 11:04:50 PM
to LCZero
Training with TB is still a zero approach because our new goal is now reaching the TB position.

Gokhan M

Aug 18, 2018, 1:03:50 AM
to LCZero
> Training with TB is still a zero approach because our new goal is now reaching the TB position.

^^^
True

Remember, TBs aren't derived from human games. They are computer-generated (a minimax-optimal solution of chess).

Using TB endgame conclusions shouldn't corrupt any of the logic in L0. Rather, they would strengthen it (reducing training noise and speeding up training).

As mentioned, a second NN would be necessary once you hit one of the TB positions. But that can be handled separately.

GM




Gokhan M

Aug 18, 2018, 1:12:58 AM
to LCZero
Also remember that DeepMind's motivation was very different when they built AlphaZero (writing papers & coming up with research breakthroughs). 

They aren't motivated to build the best chess engine possible (using TB and what not). 

So we should take their example for what it is (as a major advance, but not necessarily as the optimal solution). 

Michael Simkin

Aug 18, 2018, 4:08:07 AM
to LCZero
A network that seeks a draw is a good idea - but the problem is how to generate data for such a network. In chess, when you make a small mistake you often lose the draw. When you generate self-play, you must explore unknown, untested moves to understand where they lead; this is basic game theory (if you don't explore unknown moves, you get stuck evaluating only the positions you are used to from self-play - i.e. you play great against yourself but do not understand chess positions). The only reason self-play works is that the training data contains plenty of random moves, which are averaged out by the NN. But noisy data is very undrawish data.

So maybe we can train a network to draw using existing games (with supervised learning). Once it learns to draw properly from existing data, we would add noise only to lc0's moves but not to this net's. The problem here is that lc0 would learn that no matter how bad her move was, it ends in a draw. I still think we just need to use some good engine that makes random moves which end up in a draw with very high probability, and use those games to teach Leela to draw.

As I mentioned, TBs are not the issue in my opinion - recognizing drawish positions is. I guess A0 also didn't have a clue how to do this; it knew how to win the middlegame. You can always attach TBs to any engine and it will give you some 50 Elo points. LC0 doesn't understand draws because in its training data draws are rare.

Nico van Dijk

Aug 18, 2018, 4:42:00 AM
to LCZero
I think that the idea posted in

https://groups.google.com/forum/m/?utm_medium=email&utm_source=footer#!msg/lczero/pZ2dGmwQrW8/AhL9W2xnCQAJ

would help.

For the value-head training of a specific position, one uses the average of the training-game result and the evaluation of the MCTS search. In that way one reduces the noise caused by blunders, but also gives Leela the chance to understand that she cannot make progress against accurate play.
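Nico's suggestion amounts to changing the value-head target from the raw game result z to a blend of z and the root MCTS evaluation q. A one-line sketch (the `mix` weight is a free parameter; the plain average proposed above corresponds to 0.5):

```python
def value_target(z: float, q: float, mix: float = 0.5) -> float:
    """Blend the final game result z (-1/0/+1) with the root search value q.

    mix=0.5 is the plain average proposed above; mix=0.0 recovers pure-z
    training and mix=1.0 pure-q. A later blunder then only shifts the
    target halfway, so a position the search held at a draw keeps a
    target near 0 even if the game was eventually lost.
    """
    return (1.0 - mix) * z + mix * q
```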

Michael Simkin

Aug 18, 2018, 6:53:53 AM
to LCZero
@Nico Definitely! Using q instead of z is a possible solution. My only addition is that this is what currently causes lc0 to reach way too many draws while thinking it's winning - i.e. this is the most crucial problem lc0 currently has, and without solving it, lc0 will not be able to play endgames properly, because its assessment of the position will often be over-optimistic (especially in drawish endgames where small mistakes can bring about a loss - but good engines never make them). Basically, Leela is now assessing the probability of winning under the assumption that a mistake will be made. This assumption doesn't work against strong engines, and Leela is incapable of properly assessing a position without it, since it is hidden in the training data; she often trades down from the middlegame into those drawish endgames thinking they are winning.

I really hope the Leela developers will notice this nuance. Their only problem is that they want to follow a known path; it looks like this path is well explored by some teams, so the lc0 developers could switch to it without the fear of being alone in deep water. One should also remember that A0 drew a lot of games as well; maybe they simply didn't care much about this particular feature.

Anyone who followed TCEC Division 3 could see this pattern: lc0 shows +2 or +3 in a completely drawish endgame. She traded pieces in a better middlegame thinking she was heading into a completely won endgame, which then finished as a draw.

Kostas Oreopoulos

Aug 18, 2018, 7:32:53 AM
to LCZero
I think it is not just the noisy data that does not help here.

Imagine you have a drawish position - for example, an opposite-colored bishop endgame with a few pawns.

In every position a blunder can be made, and the result would be either 0-1 or 1-0.
The problem is that this can go on for 1000000 moves (until all pawn moves are exhausted) before a draw becomes inevitable.
At every other point blunders are possible, so there is no way the engine can learn this. It is basically a horizon effect.

This is a problem that does not exist in GO.

We need to add a factor to the state-update equation to penalize lack of progress.
There is the 50-move rule, but it makes no sense for evaluation when there are many pawns around; it would take hundreds of moves to matter.
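One hypothetical form such a no-progress factor could take: decay the backed-up win value by the number of plies since the last pawn move or capture, so endless shuffling drifts the evaluation toward the draw score of 0 (the decay constant here is an illustrative guess, not a tuned value):

```python
def shaped_value(win_value: float, plies_since_progress: int,
                 decay: float = 0.995) -> float:
    """Discount a win estimate toward 0 (draw) while no progress is made.

    `plies_since_progress` counts plies since the last pawn move or
    capture - the same counter the 50-move rule uses, applied softly
    instead of as a hard cutoff at move 50.
    """
    return win_value * (decay ** plies_since_progress)
```

Unlike the hard 50-move rule, this penalty starts acting immediately, which addresses the objection that hundreds of moves can pass before the rule matters when many pawns remain.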

Nico van Dijk

Aug 18, 2018, 7:58:32 AM
to LCZero
I believe my suggestion would not have problems with that. The MCTS search (close to the 50-move rule) would give a draw value (did you notice that during TCEC Leela's evaluation slowly drops as the 50th move approaches?).

Michael Simkin

Aug 18, 2018, 8:08:19 AM
to LCZero
Let's start with the fact that 1000000 moves is not an option in chess. The 50-move rule dictates that a pawn move (or capture) must occur within every 50 moves; considering there are 16 pawns with at most 6 moves each, that alone gives 6 * 16 * 50 = 4800 moves as a rough bound on the longest possible game. Yes, it's a lot, but it's not a million. I guess that on average fewer than 250 moves would pass before a draw is reached, as neither side will want to make a pawn move at all and will prefer a draw (i.e. the score will be less than 0 if a pawn move is made).

I was thinking about the q vs. z problem. We can see that the q value is currently also problematic, not only during training but also while playing: Leela actually thinks a drawish ending is winning, giving it +3, since all continuations are also estimated at +3. So we can hope q will help, but it could also give similar results, because the current q is based on the current z - i.e. the concept of drawing will not be learned by using q when q is itself based on many non-drawing games in the first place.

I was thinking of another idea: during training, starting from some random point in the endgame, we could turn off the monkey moves. I think that playing the best moves from some random position in the late game is no worse than making, with the same probability, a monkey move that loses immediately. This would allow better estimation of the position and a more reliable z value. But what's more important is that it would teach Leela the concept of a draw, as many endings end in a draw with perfect play; this would reduce many of the +3 evaluations to 0.
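Turning off the monkey moves from a random point onward could be as simple as a per-game temperature cutoff (the ply range below is an arbitrary illustration, not a proposed setting):

```python
import random

def sample_cutoff(min_ply=30, max_ply=80, rng=None):
    """Pick, once per training game, the ply after which noise is disabled."""
    return (rng or random).randint(min_ply, max_ply)

def temperature(ply, cutoff):
    """tau=1.0 keeps the usual exploratory move sampling;
    tau=0.0 means always play the best move from there on."""
    return 1.0 if ply < cutoff else 0.0
```

Randomizing the cutoff per game means the net still sees exploration at every stage across the dataset, while individual games finish with accurate play and thus more representative z values.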

Michael Simkin

Aug 18, 2018, 8:12:46 AM
to LCZero
@Nico Yes, Leela knows that approaching move 50 makes the situation more drawish. That doesn't mean Leela will know the same thing on move 1. I can give you plenty of examples where Leela thinks it's +3 for 40 moves in a complete draw. How will q help in those cases?

Kostas Oreopoulos

Aug 18, 2018, 8:43:07 AM
to LCZero

The 1000000 was just typing a big number. I was not quantifying (I guess that was not obvious :) ).

The problem is the following. Let's agree that the horizon is 250 moves. It does not matter.

We have   
  a) A big branching factor of moves that do not blunder
  b) The drawing endpoints would be the leaves of that huge tree

The percentage of drawing lines is very, very small, and you would need to see the whole picture (tree) to extract the pattern, which is not possible at all.

So yes, I agree with you.  Exploration should be a parameter of the position. In very simple endings it should probably be zero. It is part of experimentation.

Michael Simkin

Aug 18, 2018, 8:56:37 AM
to LCZero
My claim is the contrary: we should balance exploration and accuracy. Currently exploration always comes before accuracy (which is achieved by averaging over the exploration), producing a lot of random moves. This indeed keeps Leela from converging to estimating only certain types of positions, and helps her understand the nature of mistakes much better. But it also assumes mistakes will be made, which is itself a mistaken assumption. To stop assuming mistakes, we need to reduce exploration in order to gain accuracy - i.e. even in complex positions we should sometimes reduce exploration to increase accuracy.

There is also another point: in the opening and middlegame the value of exploration is tremendous, but in the endgame it is often a matter of what is called technique. With a low number of pieces the game of chess becomes much more predictable, so by the nature of this stage the value of exploration in the endgame is much lower. I would simply stop exploring at some random point after move 25, for example, or at some random place in the endgame. I don't see how it could harm Leela, but then again, this is uncharted territory.

Kostas Oreopoulos

Aug 18, 2018, 9:02:38 AM
to LCZero
Sorry, but you misread my post. We agree.

As I said above "Exploration should be a parameter of the position. In very simple endings it should probably be zero" which is exactly the same as what you write "To stop assuming mistakes we need to reduce exploration in order to gain accuracy"

So, depending on the position, we should prefer accuracy over exploration. In basic endings: zero exploration ==> best accuracy.

Nico van Dijk

Aug 18, 2018, 9:04:34 AM
to LCZero
@MichaelSimkin
It will slowly be learned, starting from positions that occur near the 50th move. If a blunder is played early in a drawn training game, it would not help, but it would later on. And in any case, it will reduce the negative effect of blunders. I think it's worth trying.

Michael Simkin

Aug 18, 2018, 9:09:03 AM
to LCZero
We almost agree. My claim is that we should reduce exploration even in complex positions from time to time. My reason is that we don't really know how complex a position is, so randomly removing all exploration will be averaged out eventually. Another claim is that, to have the z value correlate better with the actual state of the position, we should stop exploration at a random point in the game; we would then get many more draws, and this would be more representative.

Kostas Oreopoulos

Aug 18, 2018, 9:12:29 AM
to LCZero
We agree :)

"My reason is that we don't really know how complex the position is". Exactly. This is why I wrote, "Exploration should be a parameter of the position". Because we do not know how complex the position is.

Maybe it could be a trained parameter. 

Michael Simkin

Aug 18, 2018, 9:16:51 AM
to LCZero
@Nico I'm just saying I'm not convinced it will work. Because we're using heuristics, it might happen that the q value will still remain high - i.e. the net's error will prefer to make small mistakes in representing the z value instead. For example, it will think that on move 40 it's +1, on move 30 it's +2, and on move 20 it's +3; so yes, it will not wait 50 moves and will see what is coming, yet the basic draw fallacy will remain, because it doesn't see the draw: draws are not common in the training games, and even when it learns the q value, the q value is only as good as the training games themselves. I would still try it out, but my claim is that adding many accurately played endings to the data will help much more, as it will solve the draw fallacy.

Michael Simkin

Aug 18, 2018, 9:22:45 AM
to LCZero
Training the desired noise level of a position is an interesting idea, but I think it's a complex one, and it's unclear how to estimate it - what is the criterion for a correct exploration/accuracy balance? Just stopping all exploration at a random point in the game is much simpler to implement; yes, it's less scientific, of course. My claim is that it's good enough to solve the current major issue, and I guess it's fewer than 5 lines of code.

Khaki Menez

Aug 19, 2018, 4:29:20 AM
to LCZero
Well, I prefer not to use tablebases, because Leela should learn everything from opening to endgame without intervention.

Tom Bodenmann

Aug 19, 2018, 9:28:15 AM
to LCZero
In my view, using Leela to explore endgames does not make any sense. These TB endgames are already solved. Why waste resources exploring something that's already known? Use the TBs!
The engines in TCEC work between the starting position and hitting a TB position. Give Leela a fair playing field and let her use TBs as well. She loses many games simply because she is not allowed to use existing facts (TB solutions).

Dietrich Kappe

Aug 19, 2018, 9:41:03 AM
to LCZero
Sure, if you were stopping at 6-man it would only be an interesting academic exercise.

I'm training a specialist 12-man endgame 64x6 net (called "Ender") so I can eventually run three 64x6 nets on CPU, each specializing in a different phase of the game. After batch 5 of adversarial-play endgames against SF9+TB, it is already outscoring the 192x15 net ID512 on a 149-position endgame test suite, 50% to 40%. Next stop: KQvKR.

Alexander

Aug 19, 2018, 10:46:33 AM
to LCZero
Wait a minute... don't we have a test server and a main server (which, of course, does not seem so lively)?
Can't we split things: train all these ideas with TBs etc. on the test (or main) server, while the main net (with the zero approach, i.e. no TBs and so on) is trained on the main (or test) server?

Or do the devs not really want to experiment like this?

It seems we are almost at SF9 level, and the plateau effect in strength gains will prevail; and now, IMHO, the target is not to replicate A0 but to build the strongest chess entity in the world.
Besides, we really don't know what A0 is. Or do we just wait until L0 beats SF8 28/100 in a match - is that the criterion?

I think training in parallel a net on the zero principle and a net on a semi-zero one (with TBs, tactics positions, and games of SF and K and H) would be good.
Clashing these nets sometimes to see which is stronger.

I also realize that L0 could understand everything on her own (endgames, tactics, etc.), but we may get older by a year or 2 or 10... ^_^

frakty

Aug 19, 2018, 11:52:31 AM
to LCZero
Imo, stopping exploration completely might be risky, because the net may then stop learning new ideas.
I was thinking that a simple yet potentially workable solution could be to first define simple criteria for when a game is in the technical-endgame phase (e.g. low material count / number of pieces - the type of position that A/B engines with TBs can play basically perfectly and that lc0 currently completely mis-evaluates),
and then have a certain small percentage of games (say 20%) have exploration randomly turned off once the "technical endgame" stage is reached, while the remaining ~80% work as before (with exploration), and the NN would train on both datasets. So I am wondering whether doing it like this could help the net converge faster to precise (or at least much more precise) endgame play.
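That split can be sketched in a few lines (the 20% figure comes from the post; the 12-man "technical endgame" threshold is an illustrative assumption):

```python
import random

def flag_deterministic_endgame(p=0.2, rng=None):
    """Decide once per game whether this game disables endgame exploration."""
    return (rng or random).random() < p

def use_exploration(piece_count, flagged, endgame_threshold=12):
    """Exploration stays on except in flagged games that have reached the
    'technical endgame' (piece count at or below the threshold)."""
    return not (flagged and piece_count <= endgame_threshold)
```

Because only a minority of games are flagged, most of the dataset keeps the usual exploratory character while the flagged fraction supplies accurately played, often drawn, endgame targets.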

Tom Idermark

Aug 19, 2018, 12:19:09 PM
to Alexander, LCZero
The goal of this great endeavour probably needs to be defined in a bit more detail. The "zero" approach is great (even necessary) but can mean too many things. E.g., is an "attacked pieces" input plane disqualified? Or an "attacking pieces" plane, or a "pinned pieces" plane, etc.? None of these (just like an EGTB) adds any human heuristics or "knowledge" to the game (they are just unbiased facts, as the rules of the game are), but they _may_ make the NN extract important positional features "earlier" in the network and exploit them better further down. It would be really nice to hear the developers' view on this, and also on MCTS improvement possibilities...

I, for one, have longed (since Deep Blue) to see an "engine" that plays positionally strong chess, outsmarts the tactically focused AB engines that make too many "cryptic" moves, and can teach us intermediates (via KC and others) strong and fun chess. Losing due to not having an endgame TB is just sad, in my opinion, and may leave instructive games never analysed and annotated. To be clear: an NN can likely learn an endgame well, probably just as well as an EGTB-assisted engine, given enough layers and weights. But do we want the NN (at this time) to train there, impeding good middlegame play? Unless the network is enlarged to cope with both, someone needs to decide...

BR,
Tom

PS: What do you guys think about the CCCC coming up on Aug. 31? 4x Tesla V100 is _insane_ (at USD 10,000 each). Is any LC0 version using small integer weights in development? That is probably a major undertaking, but it is needed to make full use of the V100s...

--
You received this message because you are subscribed to the Google Groups "LCZero" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lczero+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lczero/dc6a1e12-e589-4544-9790-d4f9bc9679a7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dave Whipp

Aug 19, 2018, 12:23:17 PM
to frakty, LCZero
I see a bunch of folks advocating changing the training for the endgame. It feels like the case would be strengthened with some data/analysis, such as: how many positions in a training batch are currently evaluatable using TBs, and, of those, how many are actually scored incorrectly according to the TBs?

If there's a problem with bad training data, then that's a case for doing something with TBs to correct the data (either filtering out the bad examples or changing them to be correct). If the training data is good but the NN isn't properly generalizing, then different solutions might be appropriate.
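The audit Dave describes could be run offline over sampled training positions. A sketch (the `probe_wdl` callable is a stand-in for a real tablebase probe, e.g. python-chess's Syzygy support; it is assumed to return -1/0/+1 from the side to move):

```python
def audit_positions(samples, probe_wdl, max_pieces=6):
    """Return (tb_evaluatable, tb_incorrect) counts over (fen, result) pairs.

    A position counts as TB-evaluatable when at most `max_pieces` men
    remain; it counts as incorrect when the sign of the stored game
    result disagrees with the tablebase verdict.
    """
    def men(fen):
        return sum(1 for ch in fen.split()[0] if ch.isalpha())

    evaluatable = incorrect = 0
    for fen, result in samples:
        if men(fen) > max_pieces:
            continue
        evaluatable += 1
        if (result > 0) - (result < 0) != probe_wdl(fen):
            incorrect += 1
    return evaluatable, incorrect
```

The ratio incorrect/evaluatable would directly answer Dave's question: whether the training labels themselves are bad (fix the data with TBs) or the labels are fine and the net simply fails to generalize (fix something else).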
