Does someone have any ideas how to improve the endgame play?

123

Apr 19, 2020, 12:45:40 PM
to LCZero
Does someone have any ideas how to improve the endgame play?

Maybe training with tablebases.
Or endgame rescaling.

Dietrich Kappe

Apr 19, 2020, 2:09:16 PM
to LCZero
When you say “improve,” do you mean in terms of results or aesthetics? The results are quite good. The aesthetics are not, of course.

To solve the latter, you have some options:

1) use an AB engine late in the game to drive a rapid conclusion. This is what Scorpio does when it gets down to 9 men.
2) use some small bonus eval to encourage pleasing moves, like keeping a material advantage and pushing pawns. I did a patch for the old lczero that resulted in semi-pleasing ending play.
3) the new MLH (moves left head), which is used to steer towards quicker results. This can have other benefits. T71 is using this, I think.

Again, endgame results are quite good already, the games are just ugly.
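Option 2 could be sketched as a tiny additive shaping term on top of the normal evaluation. The inputs, weights, and function name below are illustrative only, not the actual patch:

```python
def aesthetic_bonus(material_edge, pawn_advancement, w_material=0.02, w_push=0.005):
    """Small additive bonus (in win-probability units) nudging the engine
    toward keeping material and pushing pawns in won endgames.

    material_edge: our material minus the opponent's, in pawn units.
    pawn_advancement: total ranks our pawns have advanced.
    The weights are illustrative; a real patch would tune them so the
    bonus never outweighs a genuine evaluation difference.
    """
    return w_material * material_edge + w_push * pawn_advancement
```

The key design constraint is that the bonus must stay small enough to only break ties between moves the search already considers equal.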

Christopher Burton

Apr 20, 2020, 6:07:45 AM
to LCZero
I like the idea of combining reinforcement learning with a kind of temporal-based learning. I haven't implemented anything yet, but my thoughts go like this.

Score a position with a 1-node search, and then score that position searching normally, to compare the results. If the results are very different, there is a learning opportunity for the network. The positions most likely to return a very different score would tend to be more tactical: sacrifices, checks, capturing sequences, etc.

This would combine reinforcement learning and a form of supervised learning. The supervisor in this case is the result from the deep search. I believe this approach would improve the accuracy of the network in general without search, and search further reduces network errors. Another possibility is that it wouldn't really improve strength; it could simply improve the rate of learning. Either way, it is near the top of my own testing list.
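The selection step described above could be sketched like this. The two evaluators are placeholder callables (not lc0 API), and the 0.15 threshold is an arbitrary example:

```python
def select_training_positions(positions, value_head, deep_search, threshold=0.15):
    """Keep positions where the network's raw (1-node) evaluation disagrees
    with a deep-search evaluation by more than `threshold`.

    value_head / deep_search: callables returning an eval in [-1, 1].
    A large gap marks a learning opportunity for the network.
    """
    selected = []
    for pos in positions:
        gap = abs(value_head(pos) - deep_search(pos))
        if gap > threshold:
            selected.append((pos, gap))
    # Largest disagreements first: likely tactics, sacrifices, checks.
    selected.sort(key=lambda item: item[1], reverse=True)
    return selected
```

In a real pipeline the selected positions would be weighted more heavily (or replayed) in the next training batch.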

-Chris

ronnie millsap

Apr 20, 2020, 9:53:24 AM
to LCZero
Since there are things such as 'BrainFish' which use databases etc., I wish we would do a match where lc0 plays the opening and middlegame, and then we could use the Ender (I believe this is the name) endgame net that was made by someone on this forum for the endgame. I'd like this as a spectator. Dunno what you guys think.

Brian Richardson

Apr 20, 2020, 11:19:39 AM
to LCZero
Already being done; see:

M MUSTERMANN

Apr 22, 2020, 2:06:23 AM
to LCZero
Christopher Burton:
LC0 needs to be improved to play much better endgames.
1. Giving away pieces for nothing is very bad style. Even if the position is a draw, it doesn't make sense to give away all chances to win. It loses Elo.
2. LC0 doesn't really know how to win some won endgames.
3. LC0 doesn't really know how to draw some endgames.
4. Also, Stockfish is much better in endgames.
Conclusion: tablebases must be used in training.
Or automatically check all training games against 7-man tablebases online and correct the results according to the tablebases.
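The relabeling step proposed here could look like the sketch below. The WDL probe is a stub standing in for a real tablebase lookup (e.g. a Syzygy probe); none of this is working TB code:

```python
def relabel_game(result, first_tb_position, probe_wdl):
    """Override a self-play game's result label with a tablebase verdict.

    result: +1 / 0 / -1 from White's point of view.
    first_tb_position: the first position in the game with few enough men
    for the tablebase, or None if the game never got there.
    probe_wdl: callable returning +1 / 0 / -1 (win/draw/loss for White);
    here a placeholder for a real 7-man tablebase probe.
    """
    if first_tb_position is None:
        return result  # game never reached tablebase territory
    # Trust the tablebase verdict over the possibly misplayed continuation.
    return probe_wdl(first_tb_position)
```

A subtlety a real implementation would need to handle: if the side to move later blunders the tablebase result away, only the positions before the blunder should get the corrected label.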

glbchess64

Apr 22, 2020, 7:01:22 AM
to LCZero
I think all 4 points are false or partially false.

It is trite to say that lc0 plays the endgame badly. That was true before T40, but now the nets have really good endgame play. As D. Kappe explained, though, the style of play is ugly.

  1. lc0 gives up pieces in the endgame (and not only in the endgame) when she thinks it does not change the result, or will give a better one, so this cannot lower her Elo: Elo is calculated from the points at the end of the game, not from material.
  2. lc0 does not know how to win some won endgames, but neither does SF, her best challenger.
  3. Idem for drawing.
  4. It seems, finally, that lc0 is better than SF in the endgame!
The last point is a surprising new discovery and it needs more investigation. Everybody thought that Lc0 was better than SF in the opening and middlegame and worse in the endgame, but nobody experimented on that until @chad created a large corpus of positions from human play.
He created 14 books: 6-men, 8-men, 10-men, ... 32-men.
The first experiment was SV-3010 at 1 node against SF, and the result was a shock for all of us:

[image: endgames-1node.png]


So, 3010 is better in the opening and endgame and worse in the middlegame! More precisely, lc0 is worse after 5 or 6 exchanges (10 or 12 fewer pieces). How to explain this? Our current explanation is that after 5 or 6 exchanges, lines are opened and the queen often remains on the board. Nets have difficulties with tactical queen play, because there are many combinations where the queen plays anti-positional moves to win pieces or threaten the king. We know that Leela's real endgame problems come with the queen on the board. This is rare, so not such a big problem in practice (but in the SuFi she would likely have scored +10 instead of +5 without this issue; most of the games she blundered involved a decisive queen move).

I started a new test run with 1K nodes per move for SV-3010 and 1M nodes for SF. I played the 6-men to 16-men positions, which takes a lot of time since there are between 2K and 20K games to play for each book. I see exactly the same trend with a compressed curve: a 17 Elo loss instead of 40 in @chad's experiment. More experiments are needed. It may also be a bias from the book, and it is worth experimenting at higher node counts.

Now I am experimenting with a new net: SV-3200, known to have endgame issues.

There is always a danger in generalising from a few observations. A scientific study with a good model and statistics is always better. It is also important to question the model (am I measuring what I think? Is the sample biased?) and the statistics (are the samples big enough? What is the random part?).

I did not say that everything in the previous post is wrong, but the reality is far more complex than what is explained in that post.

Charles Roberson

Apr 22, 2020, 11:41:48 AM
to M MUSTERMANN, lcz...@googlegroups.com
Every chess player above 1700 Elo knows that the type of thinking for the endgame is rather different than that of the opening and middlegame. I suggest having a second neural net just for endgames. It would be quite easy to decide when to switch from the primary network to the endgame network.
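Deciding when to hand over to a second net could be as simple as a piece-count check on the FEN. A minimal sketch (the 14-men threshold borrows the figure mentioned for Scorpio's Ender switch elsewhere in this thread; `primary_net`/`endgame_net` are placeholders, not lc0 options):

```python
def men_on_board(fen):
    """Count all men (kings and pawns included) from the board field of a FEN."""
    return sum(ch.isalpha() for ch in fen.split()[0])

def pick_network(fen, primary_net, endgame_net, threshold=14):
    """Route a position to the endgame net once few enough men remain."""
    return endgame_net if men_on_board(fen) <= threshold else primary_net
```

Because the piece count only decreases during a game, the switch happens exactly once and never flips back, which keeps the engine's evaluations consistent within a search.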

--
You received this message because you are subscribed to the Google Groups "LCZero" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lczero+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lczero/8fbaae29-fdd6-47ee-bbdc-845dae176a50%40googlegroups.com.

ronnie millsap

Apr 22, 2020, 12:04:43 PM
to LCZero
Thank you! Have there been any matches between BrainFish and a0lite?

Warren D Smith

Apr 22, 2020, 1:40:24 PM
to glbchess64, LCZero
SF with tablebases plays 6-man endings 100% perfectly, so the graph showing Lc0 as "better" than SF at 6 men seems nonsense.

In the TCEC game where Lc0 gave up its queen, technically this was not a mistake (assuming it is correct that it was a draw both before and after). But if Lc0 were playing a fallible human like me, then it gave up all hope of getting any more than a draw. If it had kept its queen, I think it would have had reasonable chances to win. I could be wrong, but it looked that way.

So that may not be an "endgame" flaw, but rather an "assuming your opponent is perfect" flaw. Better is to assume your opponent might err. I.e., if you have a choice of 2 equally good moves to play versus a perfect opponent, but one of them is far superior against an imperfect opponent, pick it.

How can Lc0 learn to do that? Earlier I had posted a way: train using time-odds matches, so Lc0 gets experience playing not only versus equal-strength opponents but also vs weaker & stronger ones. Even better: add an extra input to the neural net giving the relative clock time left for both Lc0 and its opponent.
This extra input could also be used as a "contempt" setting later.
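The extra clock input could be encoded as a single bounded scalar broadcast across one input plane. The tanh-of-log form below is purely illustrative (nothing Lc0 actually implements), chosen so equal clocks map to 0 and the feature saturates for extreme odds:

```python
import math

def clock_ratio_input(own_time_s, opp_time_s, eps=1.0):
    """Encode relative clock time as a scalar in (-1, 1).

    Positive means we have more time than the opponent, 0 means equal
    clocks. eps avoids division by zero and log(0) at flag fall.
    """
    return math.tanh(math.log((own_time_s + eps) / (opp_time_s + eps)))
```

Overriding this input at play time (pretending the opponent has more or less time than it really does) is what would let it double as a contempt knob.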

--
Warren D. Smith
http://RangeVoting.org <-- add your endorsement (by clicking
"endorse" as 1st step)

glbchess64

Apr 22, 2020, 1:55:14 PM
to LCZero
I should have specified that it would be nonsense to experiment on endgames with TB, so all these experiments are without TB and without any adjudication other than mate, threefold repetition, stalemate, and insufficient material. In practice TB gives only about +10 Elo to LC0, so it is not so important (she just plays better endgames with the queen and few pieces, but these endgames are rare in practice).

Warren D Smith

Apr 22, 2020, 3:04:55 PM
to Guillaume Le Blanc, LCZero
Another question is: does "single node performance" mean much? Maybe Lc0's
problem with endgames (if it has one) is due to it being a poorer searcher
in endgames than it is in openings?

On 4/22/20, Guillaume Le Blanc <glbch...@gmail.com> wrote:
> This argument can be valid for endgames with few pieces, but it does not
> explain the curve shape that increases in a linear way from 22 men to 6 men.
> But your argument is valid in general, not only for the endgame: this is THE
> big problem of SF. There are so few chess concepts in the eval function that
> it cannot resist the NN. NNs make few tactical mistakes, less than one per
> game; SF makes a lot of positional mistakes, several per game.
>
> On Wed, Apr 22, 2020 at 8:02 PM, Warren D Smith <warre...@gmail.com> wrote:
>
>> Well, stockfish intentionally left out a lot of endgame knowledge because they had tablebases.

glbchess64

Apr 22, 2020, 3:43:29 PM
to LCZero
This is a very interesting and difficult question. Experiments show that nets do not scale in a linear way. Adding more nodes to the tree can make a big difference. This is the reason why I wrote that more experiments are needed with more nodes per move and also at TC; 1 node and 1000 nodes are few nodes.

In fact, endgame play is one of the major concerns of the dev team. But the main idea is to have play that looks less horrible, because most people think that since the play is horrible it must be inefficient (and we can see that in this thread). It will be even better if we can also gain efficiency. Previous attempts to have endgame play that looks better failed (before T40, if I remember) and caused a loss of efficiency.

Dave Whipp

Apr 22, 2020, 4:44:19 PM
to glbchess64, LCZero
Also possible that perceived weakness is a time management problem: with insufficient time at end of game, quality of play suffers.


M MUSTERMANN

Apr 22, 2020, 6:20:28 PM
to LCZero
glbchess64:

For me it is completely uninteresting and useless to see experiments at 1 node or 1000 nodes.
It's like using 0.0001% of the power of only 1 CPU core instead of all 64 cores at 100% power.

Also:
1. I get many more nodes in 1 second using the 40x512 nets.
2. I get far more nodes in 1 second using the little 10x128 nets (Chess960, training run 3, 712576): 125 kn in the first second = 125,000 nodes. That's why experiments at 1 or 1000 nodes are more or less completely useless, in lots of different ways.
It's like a 1-month-old baby trying to beat Albert Einstein.
Even after 1 minute I still have 85 kn/s.

glbchess64

Apr 22, 2020, 10:04:49 PM
to LCZero
@M MUSTERMANN: do you want to run some tests at high node counts? (You can find the books on Discord in #test-results, in a pinned post by @chad, and if you ask him he can help you configure cutechess-cli the right way.)

We have some good results at very high node counts at TCEC and CCC, where almost all victories and draws are obtained after endgame play. But there are not enough games to draw conclusions, and all the games start with a lot of pieces on the board. As it is not possible to run all positions at high node counts, you can select random ones from the books.

Dietrich Kappe

Apr 23, 2020, 1:53:46 PM
to LCZero
Hmmm, a specialized endgame net would be fascinating. :-)

https://github.com/dkappe/leela-chess-weights/wiki/Endgame-Net

I also have newer v2 Enders. With 6-man TB they are slightly worse than sf11 on a suite of imbalanced 16-man positions (48.5%, each position played twice with colors reversed).

Scorpio has used Medium Ender (192x16) for a while now, switching at 14 men, but I haven’t released them for lc0, as it doesn’t yet support multiple nets.

There does appear to be an issue with tactics from time to time. Maybe this has to do with it never seeing positions of more than 18 men.

Here, for your enjoyment, is an example where Little Ender (128x10) beats sf and holds the reverse to a draw. https://lichess.org/study/eOhEALIP

Jack Lo

Apr 24, 2020, 6:36:35 AM
to LCZero

1) You could use SF11 at the end of the game when training a network. For example, if there are few pieces on the board, check how the position is evaluated by SF11.
Sometimes in LC0-SF matches it happens that a position is won for LC0, then LC0 makes a weak move at the end and the game ends in a draw. That leads to mistakes in training: the network learns that the position is drawn while it is not.

2) Additionally, you might think about changing LC0 in such a way that at the end of the game it switches to an alpha-beta algorithm (without a network).

3) The AB algorithm could work simultaneously with the network: use the CPU power to find tactical moves while the neural network runs on the GPU.
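Idea 3 is essentially running two searches concurrently and combining their answers. In Python the dispatch could look like this, with both searchers left as placeholder callables rather than real engine bindings:

```python
from concurrent.futures import ThreadPoolExecutor

def dual_analyse(position, nn_search, ab_search):
    """Run the NN search (GPU-bound) and an AB tactic check (CPU-bound)
    at the same time, and return both answers.

    nn_search / ab_search: placeholder callables taking a position and
    returning (move, score); real code would wrap UCI engines instead.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        nn_future = pool.submit(nn_search, position)
        ab_future = pool.submit(ab_search, position)
        return nn_future.result(), ab_future.result()
```

Threads are the right fit here because both searchers spend their time waiting on external work (the GPU, or an engine process), so the GIL is not a bottleneck. The hard part, as discussed later in the thread, is the rule for choosing between the two answers.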

glbchess64

Apr 24, 2020, 8:18:57 AM
to LCZero
  1. The Stein nets (13 and 14) are trained with games re-evaluated by SF; that generalises point 1 to the middlegame and opening, and this is good but doesn't work miracles.
  2. That is LeelaFish: the results are not very good.
  3. Interesting, but nobody knows how to integrate the two parts. A simple blunder checker is not enough.

Jack Lo

Apr 24, 2020, 11:58:23 AM
to LCZero
2. In fact, Leela is not very good. The way of integration is not effective.

3. https://github.com/feldi/py-goratschin. This way seems better to me. But the project needs to be perfected.

The simple rules:
a) It seems to me that if LC0 switched to AB when the number of pieces drops below a given threshold, it would be a stronger engine than the classic LC0.
b) The Goratschin rules are also simple and logical. The centipawn margin should be properly chosen. This would allow LC0 to avoid tactical combinations carried out by opponents, and Lc0 would also be able to attack with combinations.

It's for a start, and then you can create better and more complicated rules.
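A margin-based arbiter in this spirit could be sketched as follows. This is a guess at the general idea, not the actual py-goratschin logic, and all parameter names are illustrative:

```python
def arbitrate(nn_move, ab_move, ab_score_nn_move_cp, ab_score_ab_move_cp,
              margin_cp=100):
    """Keep the NN engine's move unless the AB engine scores its own
    preference more than margin_cp centipawns higher (both scores come
    from the AB engine, so they are on the same scale).

    A large gap suggests a concrete tactic the NN search missed, which
    is exactly when overruling it is worthwhile.
    """
    if ab_move != nn_move and ab_score_ab_move_cp - ab_score_nn_move_cp > margin_cp:
        return ab_move
    return nn_move
```

Comparing both moves with the same (AB) evaluator sidesteps the scale-mismatch problem between NN win probabilities and AB centipawns, though it also means the NN's positional judgment never gets to veto the AB engine's tactics.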

Dietrich Kappe

Apr 24, 2020, 1:56:19 PM
to LCZero
Jack,

These ideas have been pondered and tried, in a variety of forms. See Scorpio for an AB engine at 9 pieces. See this idea from back in the buggy days, when I tried real-time blunder checking: https://github.com/dkappe/leela-chess-weights/wiki/Real-Time-Blunder-Checking

Also, people like to say that leela isn’t good at the endgames, when they really mean it plays in a very unattractive way. I know of only a few places where leela has really failed in the endgame, and not recently.

One of my favorite test positions is this one from a game of Scorpio against Komodo in CCCC, where Scorpio played Kxh4??. It turned out only some special nets dealt with this well: Leelenstein and Dark Queen.

8/2n5/6p1/P2Pk3/6Kp/7P/6P1/8 w - - 0 1

Last, an example of Little Ender, a 128x10 special endgame net, outplaying SFDev (pre sf11) on both sides of a 16 man endgame. https://lichess.org/study/eOhEALIP

glbchess64

Apr 24, 2020, 2:36:41 PM
to LCZero
It is difficult to say that Leela is not very good when she wins tournaments. And there is not only the result at TCEC; there is the way she obtained it, dominating almost all games, with several wins and draws missed. For SF it is not possible to say the same thing, even if the SF team invokes a bug that would appear only with many threads. In fact, when SF sometimes helped Leela to win, it seems to be because it did not evaluate the position correctly (emphasis on material vs structure) or, more likely, because of the horizon effect: a pawn sac under heavy pressure.

The Staunton gambit played and won by Leela at CCC (see the video in another thread) shows that Leela may be both a better tactician and a better positional player than SF, and shows the inefficiency of AB. The PUCT algorithm allows heavy pruning of the search tree and shows its efficiency when driven by a NN. AB pruning is far less efficient, and SF wastes its time calculating variations that have no value. When you see anti-positional moves played after several minutes of calculation with 176 threads, you know that there is a big problem.

With AB at TCEC, SF calculated billions of positions per move only to play a lot of positional blunders that an average club player can see immediately! There is a lot for Leela to do to progress, but it seems that AB is not the solution. Only selective search and knowledge. And likely new algorithms, who knows?

A blunder checker with AB for a NN has been tried; it does not work. It only marginally improves the engine, and there are other ways to get better progress.

It is very difficult to choose between the AB move and the NN move when they disagree. Look at TCEC games, look at the evaluations: there are a lot of positions where the evaluations are very different, and only 10 moves (sometimes more) later do you learn which one was good. The Goratschin rules seem very naive.

Jack Lo

Apr 25, 2020, 7:47:17 AM
to LCZero
"2. In fact, Leela is not very good. The way of integration is not effective."

Little fix: I was thinking of Leelafish, not LC0.

-----------------------

Below are 21 positions that SF solves in one minute and LC0 does not (on a GTX card). The first 10 are tactics and the next 11 are endgames. Please check whether faster graphics cards and new neural networks can pass this test.

r4rnq/3b2k1/1p2p1p1/pP1pPpNp/5P1Q/2PB3R/1P4PP/6RK w - - bm g4;
2r1r1k1/p2q1pbp/1p4p1/3pPP2/2nP2N1/3Q3R/P5PP/R1B3K1 w - - bm Rxh7;
3rnrk1/p4p1p/3p2pQ/2pNq3/1nP5/1P4PP/P3PRB1/5RK1 w - - bm Qxf8+;
r3r1k1/pbq1bppp/1p2p3/2p1P1B1/4B1Q1/8/PPP2PPP/3RR1K1 w - - bm Bxh7+;
2rq1rk1/pp3ppp/3p4/6P1/b3pP2/1N4Q1/PPP3P1/2KR3R w - - bm Rxh7;
r3r1k1/p3bppp/q1b2n2/5Q2/1p1B4/1BNR4/PPP3PP/2K2R2 w - - bm Rg3;
2r4r/1q1kb1p1/4p2p/1n1pP3/1p6/3RBP2/NPP2Q1P/1K2R3 b - - bm Qa6;
r2r1bk1/3qp2p/3pp1p1/p2n2N1/2N3Q1/BP4P1/P4PP1/2R3K1 w - - bm Nb6;
2kr3r/pRp3q1/2Pppn2/4p2p/B2bP1p1/P6P/2P2PP1/2BQ1RK1 w - - bm Bh6;
r1r3kb/1pqbpp2/p2p1npB/n7/3NP3/1BN2P2/PPPQ2P1/2KR3R w - - bm Bf8;
8/7p/1p3pp1/p2K4/Pk3PPP/8/1P6/8 b - - bm Kb3 f5;
5b2/p4B2/5B2/1bN5/8/P3r3/4k1K1/8 w - - bm Bh5+;
8/p5pq/8/p2N3p/k2P3P/8/KP3PB1/8 w - - bm Be4;
1k6/8/8/1K6/5pp1/8/4Pp1p/R7 w - - bm Kb6;
5K2/kp3P2/2p5/2Pp4/3P4/r7/p7/6R1 w - - bm Ke7;
2r3k1/6pp/3pp1P1/1pP5/1P6/P4R2/5K2/8 w - - bm c6;
r2k4/8/8/1P4p1/8/p5P1/6P1/1R3K2 w - - bm b6;
5k2/1p6/1P1p4/1K1p2p1/PB1P2P1/3pR2p/1P2p1pr/8 w - - bm Ba5;
3R3B/8/1r4b1/8/4pP2/7k/8/7K w - - bm Bd4;
8/1p5p/6p1/1p4Pp/1PpR4/2P1K1kB/6Np/7b w - - bm Rd1;
8/8/k7/n7/p1R5/p7/4r1p1/KB3R2 w - - bm Rc3;

ronnie millsap

Apr 25, 2020, 9:44:46 AM
to LCZero
Thank you for the update, D! Really appreciate your endgame work; I do find it fascinating.

Dietrich Kappe

Apr 25, 2020, 1:10:38 PM
to LCZero
Jack,

thanks for these. I would concede that sf outperforms most nets on puzzles, but in practical terms, well, see the Little Ender vs SF example above.

Jack Lo

Apr 26, 2020, 10:18:30 AM
to LCZero
Complete end game test - eigenmann:

Stockfish 10 - 97/100
Stockfish 11 - 97/100
Stockfish 8 - 94/100
Sting 14 - 94/100
Fire 7.1 - 85/100
Fire 5 - 74/100
LC0_32988+Syzygy - 68/100
Leelafish_32988+SF10 - 66/100
LC0_32988 - 64/100
Fire 4 - 46/100

There is a big difference between LC0 and SF. SF plays almost perfectly. Leelafish mechanisms do not help much.

Complete tactic test - AH_tactics:

Stockfish 11 - 246/250
Stockfish 10 - 246/250
Stockfish 8 - 246/250
Sting 14 - 245/250
Fire 7.1 - 239/250
Fire 5 - 186/250
LC0_32988 - 170/250
Fire 4 - 88/250

There is also a big difference. Hence my idea: pick out those positions that SF can handle and LC0 can't, and focus on them.

I chose these 21 positions from the two tests above. Test results for LC0_unsolved:

Stockfish 10 - 21/21
Stockfish 8 - 20/21
Goratschin_32988+SF10 (tuned) - 20/21
Goratschin_32988+SF10 (default) - 17/21
Leelafish_J13B.2-178+ASMFISH_2019-04-05 (tuned) - 12/21
Leelafish_32988+SF10 (default) - 9/21
Allie_0.5_32988 - 1/21
LC0_ender128_90l - 1/21
LC0_32988 - 0/21
LC0_32988+Syzygy - 0/20

Simple Goratschin-rules work pretty well.

Brian Richardson

Apr 26, 2020, 11:18:44 AM
to LCZero
If the tests were run with net 32988, I think that net is more than a year old.

Considerable progress has been made since then, and the devs are currently actively working on improving endgame play.
In the meantime, should you feel like testing newer/bigger nets, there are some suggestions here:
It would also be best to use one of the most recent Lc0 releases too.

Thank you for sharing your results.

Dietrich Kappe

Apr 26, 2020, 11:29:04 AM
to LCZero
Can I make a suggestion? Use a more recent net, say the SV-3010 net that just won TCEC. Since the t30 run, there has been the stronger t40 run and now the monstrous t60 run.

glbchess64

Apr 26, 2020, 11:48:47 AM
to LCZero
There are a lot of puzzles that LC0 cannot solve, and a lot of puzzles that SF cannot solve. Then there is practical play in games: SF can win before the endgame, but in almost all Leela games the endgame is reached. And we can see that Leela (with SV-3010) is able to hold inferior positions and to win a lot of superior positions. There are still exceptions, but the mean results are in Leela's favour.

The TCEC Cup 5 is also instructive. Leela with one RTX 2080 Ti sees the winning positions before the engines that play with 176 threads, and often before Crystal.

The fact is that she plays very well the endgames that reach the board with no queens. With queens SF is clearly better (about a 20 Elo difference between endgames with queens and endgames without).

Jack Lo

Apr 30, 2020, 4:40:10 AM
to LCZero

Thanks for the update suggestions, but it won't work. I do a lot of tests, mainly on positional play, and I keep up to date with the networks. There has been no progress on these (endgame/tactics) tests for months. Similarly, the new versions of the program do not make much progress. A greater difference can be observed when using an RTX card. If you have one, use the 21 positions by pasting them into an EPD file. Of course, use whatever network you want.

Anyway, take a closer look:
(a) I've used jhorthos' network (J13B.2-178, the successor of "Terminator 2"), but progress has been negligible.
(b) The solution to the problem discussed in this topic (one of the solutions) is to combine two engines. This shows a bold result.

As for LC0's practical play, I have other observations. Leela beats SF in the initial phase of the game, but even the advantage she gains can be lost at the end of the game (and the result is a draw).
 

Jack Lo

Apr 30, 2020, 5:15:51 AM
to LCZero
LC0 24.1 + 384x30-t60-3200 net

1.  AH_Tactics-250   > 60s.
2.  AH_Tactics-250   > 60s.
3.  AH_Tactics-250   > 60s.
4.  AH_Tactics-250   Solved in 3.45s/4; solved: 1
5.  AH_Tactics-250   Solved in 25.66s/9; solved: 2
6.  AH_Tactics-250   > 60s.
7.  AH_Tactics-250   Solved in 0.86s/2; solved: 3
8.  AH_Tactics-250   Solved in 20.58s/8; solved: 4
9.  AH_Tactics-250   > 60s.
10.  AH_Tactics-250   > 60s.
11. E_E_T 005 - B vs B,  eigenmann_endgame_test   > 60s.
12. E_E_T 032 - L&L&S vs T&L&L,  eigenmann_endgame_test   > 60s.
13. E_E_T 033 - L&S vs D,  eigenmann_endgame_test   Solved in 15.49s/9; solved: 5
14. E_E_T 049 - T vs B,  eigenmann_endgame_test   > 60s.
15. E_E_T 060 - T vs T,  eigenmann_endgame_test   > 60s.
16. E_E_T 062 - T vs T,  eigenmann_endgame_test   Solved in 21.44s/9; solved: 6
17. E_E_T 063 - T vs T,  eigenmann_endgame_test   > 60s.
18. E_E_T 074 - T&L vs T&B,  eigenmann_endgame_test   > 60s.
19. E_E_T 076 - T&L vs T&L,  eigenmann_endgame_test   > 60s.
20. E_E_T 081 - T&L&S vs L&B,  eigenmann_endgame_test   Solved in 6.22s/7; solved: 7
21. E_E_T 097 - T&T&L vs T&S&B,  eigenmann_endgame_test   Solved in 16.71s/6; solved: 8

Result: 8 of 21 = 38.0%. Average time = 13.80s / 6.75

OK. Looks promising, but it's not SF perfection. I'll check the full tests to make sure there was no regression in the other positions.

glbchess64

Apr 30, 2020, 5:20:03 AM
to LCZero
J13 nets are based on T40 and are totally outdated now. The main T60 nets (24-block size) are better (with @kiudee's parameters, the default now), and the SV big nets like SV-3010 (30-block size) are even better on strong hardware/long TC.

I suggest you follow this link,
which gives books for testing. One of them is a list of books to test endgame play from 6 men to 32 men (all pieces). Nobody has yet tested this at TC and published the results (only at fixed nodes). It will be interesting if you can verify what happens at, say, 1'+1". To test the endgame it is better to do no adjudication and to not allow TB, for Leela at least. It will be interesting to see the difference when SF has TB and when it does not. Pick random lines from the book and play each with both sides (there are too many lines to play them all at TC).

Depending on what hardware you have, it may also be interesting to test with a good T59 net (10-block size).

The best 20-block net seems not to be a Leela net. It appears to be Stein 14.3 (better than SV-T40-1705, the best Leela 20-block net).

glbchess64

Apr 30, 2020, 5:27:15 AM
to LCZero
Testing with problems is really difficult. The selection of the problems can introduce strong biases. There is a great difference between solving some problems and being able to play real endgames. Problems are often based on deep calculation, where AB engines are better. But practical play, where there is not just a good move to find but a good plan to apply, is better for Leela and worse for SF: SF finds combinations but is bad at finding plans, and there are also plans in endgame play.

Jack Lo

Apr 30, 2020, 6:52:55 AM
to LCZero
Thank you for the link. I understand that the puzzles give a biased result. But it would be nice to have an engine that both plays strongly and solves these puzzles.

I'll explain what I mean by an example. Look at these games (TCEC Season 17 Superfinal):

18
30
42
58
100

Leela played white. She had a big advantage, and SF agreed with this evaluation. Nevertheless, the games ended in draws. That should have been +5 points for LC0.

Also, in game 27 (played with black), by move 24 the position was better for LC0.
A good chance of winning appeared in game 49.

Leela could have won the match with an even bigger score difference.

One more observation. Compared to SF, Leela doesn't use the TB much. That's because her search isn't as deep as SF's. I don't know how much it matters in practice, but maybe that's one of the reasons for these draws.


Jack Lo

Apr 30, 2020, 1:34:21 PM
to LCZero
New results, based on LC0 24.1 and a network from the SV repository.
The margin for Goratschin is 100 cp (1 pawn); I changed the default value of 50 cp.

Eigenmann test update:

Stockfish 10 - 97/100
Stockfish 11 - 97/100
Goratschin_LC0_384x30-t60-3200+SF11 - 95/100
Stockfish 8 - 94/100
Sting 14 - 94/100
Fire 7.1 - 85/100
LC0_384x30-t60-3200 - 74/100
Fire 5 - 74/100
LC0_32988+Syzygy - 68/100
Leelafish_32988+SF10 - 66/100
LC0_32988 - 64/100
Fire 4 - 46/100

AH_tactics test update:

Stockfish 11 - 246/250
Stockfish 10 - 246/250
Stockfish 8 - 246/250
Sting 14 - 245/250
Goratschin_LC0_384x30-t60-3200+SF11 - 244/250
Fire 7.1 - 239/250
Fire 5 - 186/250
LC0_24.1_384x30-t60-3200 - 185/250
LC0_32988 - 170/250
Fire 4 - 88/250

----------------------------------------

How to prepare own Goratschin exe.

1. Copy goratschinLauncher.py and goratschinchess.py to your hard drive.
2. Go to "https://www.python.org/downloads/windows/" and download the "Windows x86-64 executable installer".
3. Install Python with default settings.
4. Run "cmd" (the Windows command line) and type "pip install python-chess pyinstaller".
5. Edit goratschinLauncher.py and select which engines you want to use. Example:

engineFolder = "./engines/"
engineFileNames = ["lc0.exe", "stockfish.exe"]

6. Copy the selected exe files (engines) to the "engines" subfolder.
7. Go to the directory containing the py files.
8. In the command line, type "pyinstaller -wF goratschinLauncher.py". The goratschinLauncher.exe file will be created.
9. Extract goratschinLauncher.exe from the "dist" directory.
10. Install the goratschinLauncher.exe engine in the GUI.

Peter Feldtmann

May 1, 2020, 6:40:17 AM
to LCZero
Jack, could you please check the log files of Goratschin? I guess most of the moves, if not all, come from SF11 by overruling lc0 - am I right? 

Jesse Jordache

May 1, 2020, 3:05:20 PM
to LCZero
I guarantee you that Leela is better at generalizing the endgame (when it begins and how to play it) than any human living or dead.

I think a lot of "Leela sucks at endgames" is osmosis from the extremely Stockfish-fan-heavy TCEC chat. Aside from open boards with queens on, when Leela's search suffers from an overabundance of reasonable moves, and the general difficulty of generalizing queen endgames, Leela is at least as strong as Stockfish.

This isn't always obvious, because the endgames that Leela attempts have a much higher level of difficulty than the ones that Stockfish winds up in. Stockfish cannot tell the difference between a won OCB endgame and a drawn one if it can't see a route to queening. Leela can. Stockfish cannot tell the difference between a won and a drawn rook endgame unless it can see a direct route to a mating or queening position. Leela can. Etc.

One area where I think Leela is weak is the late middlegame: too often she'll turn a promising position into sort of blah, through excessive swapping. I don't know how you would address that. The queen problem may also just be part of Leela's search algorithm, the way Stockfish's judgement falls apart when there are a lot of pawns on the board.

On Wednesday, April 22, 2020 at 11:41:48 AM UTC-4, Charles Roberson wrote:
Every chess player above 1700 Elo knows that the type of thinking for the endgame is rather different from that of the opening and middlegame. I suggest having a second neural net just for endgames. It would be quite easy to decide when to switch from the primary network to the endgame network.

On Wed, Apr 22, 2020 at 2:06 AM M MUSTERMANN <1ches...@gmail.com> wrote:
Christopher Burton:
I like the idea of combining reinforcement learning with a kind of temporal-based learning.  I haven't implemented anything yet, but my thoughts go like this.

Score a position with a 1 node search and then score that position searching normally to compare the results.  If the results are very different, there is a learning opportunity for the network.  The positions most likely to return a very different score would tend to be more tactical, sacrifices, checks, capturing sequences, etc.

This would combine reinforcement learning and a form of supervised learning.  The supervisor in this case is the result from the deep search.  This approach I believe would improve the accuracy of the network in general without search, and search further reduces network errors.  Another possibility is that it wouldn't really improve strength, it could simply improve the rate of learning.  Either way, it is near the top of my own testing list.

-Chris

On Sunday, April 19, 2020 at 2:09:16 PM UTC-4, Dietrich Kappe wrote:
When you say “improve,” do you mean in terms of results or aesthetics? The results are quite good. The aesthetics are not, of course.

To solve the latter, you have some options:

1) use an ab Engine late in the game to drive a rapid conclusion. This is what Scorpio does when it gets down to 9 men.
2) use some small bonus eval to encourage pleasing moves, like material advantage and pawn pushes. I did a patch for the old lczero ending that resulted in semi-pleasing play.
3) the new mlh (moves left head) that is used to steer towards quicker results. This can have other benefits. T71 is using this, I think.

Again, endgame results are quite good already, the games are just ugly.

LC0 needs to be improved to play much better endgames.
1. Giving pieces away for nothing is very bad style. Even if the position is a draw, it doesn't make sense to throw away all chances to win. It loses Elo.
2. LC0 doesn't really know how to win some won endgames.
3. LC0 doesn't really know how to draw some endgames.
4. Also, Stockfish is much better in endgames.
=> Tablebases must be used in training.
Or automatically check all training games against the 7-piece tablebases online and correct the results according to the tablebases.
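The rescoring idea above could look roughly like this. It is a hedged sketch: `probe_wdl` is a hypothetical stand-in for a real tablebase probe (e.g. an online 7-piece query), assumed to return +1/0/-1 for the side to move, or None when the position has too many men.

```python
# Hypothetical sketch of tablebase-corrected training labels.
# probe_wdl(position) is assumed to return the tablebase verdict for the
# side to move: +1 win, 0 draw, -1 loss, or None if out of range.
# Result labels here are from White's point of view: 1.0 / 0.5 / 0.0.

def corrected_result(game_result, white_to_move, probe_wdl, position):
    """Overwrite a game's result label with the tablebase truth once the
    position is inside tablebase range; otherwise keep the game result."""
    wdl = probe_wdl(position)
    if wdl is None:              # more than 7 men: not in the tablebase
        return game_result
    if wdl == 0:
        return 0.5
    side_to_move_wins = (wdl > 0)
    white_wins = side_to_move_wins if white_to_move else not side_to_move_wins
    return 1.0 if white_wins else 0.0

# A game drawn in play that was actually a tablebase win for Black to move:
print(corrected_result(0.5, False, lambda p: +1, "some 7-man position"))  # 0.0
```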


Jesse Jordache

unread,
May 1, 2020, 3:11:06 PM5/1/20
to LCZero
Speaking of the Leelafish idea, I always thought it would work better if you had Leela check SF's eval at around |2.5|.  If Stockfish matches or exceeds Leela's evaluation, let it take over.

A common feeling during the SuFi was "Stockfish could have won with this position that Leela just drew.  Of course Stockfish never would have reached this position in the first place."  Let Leela find the winning positions and let Stockfish, well, win them.
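That handoff rule can be sketched as below. The 2.5 threshold is the one from the post; everything else (names, eval convention) is an illustrative assumption.

```python
# Sketch of the proposed handoff: Leela plays until her eval reaches
# roughly |2.5| pawns; if Stockfish's eval matches or exceeds Leela's
# there (same side winning), Stockfish takes over to convert the win.

HANDOFF_THRESHOLD = 2.5  # pawns, from the post

def stockfish_takes_over(leela_eval, sf_eval, threshold=HANDOFF_THRESHOLD):
    """Both evals are in pawns from White's point of view."""
    if abs(leela_eval) < threshold:
        return False                              # still a Leela game
    same_side = (leela_eval > 0) == (sf_eval > 0)
    return same_side and abs(sf_eval) >= abs(leela_eval)

print(stockfish_takes_over(2.7, 3.1))   # True: SF confirms the win
print(stockfish_takes_over(2.7, 1.0))   # False: SF doesn't see it yet
print(stockfish_takes_over(1.5, 4.0))   # False: Leela not past threshold
```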

Warren D Smith

unread,
May 1, 2020, 3:18:24 PM5/1/20
to Jesse Jordache, LCZero
--This idea by JJ makes sense to me, based on the theory that Lc0, and
MCTS players generally, play poorly in both high-probability losing
and high-probability winning positions. They play best in near-even
positions. The threshold JJ called 2.5 could of course be tuned to
maximize performance.

--
Warren D. Smith
http://RangeVoting.org <-- add your endorsement (by clicking
"endorse" as 1st step)

Jack Lo

unread,
May 2, 2020, 4:06:06 AM5/2/20
to LCZero
Peter Feldtmann:
Jack, could you please check the log files of Goratschin? I guess most of the moves, if not all, come from SF11 by overruling lc0 - am I right? 

Hello Peter,
I didn't create a log file, but I saw it's as you say. In those positions the SF evaluation shows a big advantage and, above all, SF is faster.

When the position is even, Goratschin plays like LC0. When SF finds a chance to play a combination, that move is made. You could say the attack is strengthened. This matters in tournament play against weaker chess engines (it improves Elo). But in a match against SF itself it may not matter much: SF will not be surprised by a Stockfish combination. The way of combining the engines that you propose does not strengthen the defense, i.e. it will not allow LC0 to avoid weaker moves (LC0's evaluation will be higher than SF's, which sees the weakness of LC0's best move). A weak LC0 move will be played anyway.

The solution was to switch to SF in endgames, i.e. to make SF the boss. The boss should be whichever engine is statistically stronger in the given position.
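A minimal sketch of that "SF becomes boss in the endgame" switch, counting men directly from a FEN string with only the standard library. The 7-man cutoff is an illustrative assumption, not Goratschin's actual value.

```python
# Sketch: pick the "boss" engine from the number of men on the board.
# At or below the cutoff the position is endgame-like, and the
# statistically stronger endgame engine (SF) gets the final say.

ENDGAME_MEN = 7  # illustrative cutoff, not Goratschin's actual value

def count_men(fen):
    """Count all pieces, both sides, in the board part of a FEN."""
    board = fen.split()[0]
    return sum(c.isalpha() for c in board)

def boss_engine(fen, cutoff=ENDGAME_MEN):
    return "stockfish" if count_men(fen) <= cutoff else "lc0"

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
krk   = "8/8/4k3/8/8/4K3/4R3/8 w - - 0 1"
print(boss_engine(start))  # lc0
print(boss_engine(krk))    # stockfish
```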