Missing Mates


Joseph Ellis

Mar 23, 2018, 11:03:00 AM
to LCZero
While the progress of LCZ has been quite remarkable, it is still pretty terrible at recognizing mate & mate threats.  It often mis-evaluates simple mates in 2 and 3.

This strikes me as odd given that it has trained over hundreds of thousands of games, 85%+ of which ended in checkmate.

Although I am not sure how this could matter, I was wondering if the code here:

if (drawn || !MoveList<LEGAL>(cur).size()) {
    float score = (drawn || !cur.checkers()) ? 0.0 : (color == Color::WHITE ? -1.0 : 1.0);
    result = SearchResult::from_score(score);
} else if (m_nodes < MAX_TREE_SIZE) {
    float eval;
    auto success = node->create_children(m_nodes, bh, eval);
    if (success) {
        result = SearchResult::from_eval(eval);
    }
}


in the UCT search might be interfering with or preventing the eval from recognizing mate (since in actual mate instances, the eval is not used).
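For readers skimming the snippet, its logic can be paraphrased as: at a terminal node (draw, or no legal moves) the search assigns a fixed game-theoretic score and the network is never consulted. The following is a hypothetical Python paraphrase of that branch, not the actual lczero code; names like `is_draw` and `in_check` are illustrative.

```python
def terminal_score(is_draw: bool, has_legal_moves: bool, in_check: bool,
                   white_to_move: bool):
    """Return a fixed score at terminal nodes, or None when the position
    is not terminal and the neural net should be evaluated instead."""
    if is_draw or not has_legal_moves:
        if is_draw or not in_check:
            return 0.0                          # draw or stalemate
        return -1.0 if white_to_move else 1.0   # side to move is checkmated
    return None                                 # non-terminal: ask the network

# Checkmate with White to move: scored -1 with no net evaluation at all.
assert terminal_score(False, False, True, True) == -1.0
# Stalemate: draw.
assert terminal_score(False, False, False, True) == 0.0
# Ordinary position: defer to the network.
assert terminal_score(False, True, False, True) is None
```

The point being made in the post is exactly the `None` branch: only non-terminal positions ever produce a network evaluation, so the net itself never sees a scored checkmate.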


A separate idea I had is that it might be useful for the network to have pos.checkers() as an input bit.


Perhaps nothing is amiss here, but the results thus far strike me as incredibly odd....

jkiliani

Mar 23, 2018, 4:09:40 PM
to LCZero
I'm not exactly sure what this code does, but it has been experimentally confirmed that LCZ is capable of finding mate-in-one from Dirichlet noise even in instances where the network doesn't know about them; read about it here: https://github.com/glinscott/leela-chess/issues/105

This means that in time, the network is going to learn to recognise positions that lead to mates, and the search will then find mates much more reliably. In the cases where this doesn't happen yet, it simply means that the neural network doesn't recognise the required moves leading to the mate as good candidate moves.
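For context, the Dirichlet noise referenced here is the AlphaZero-style root noise: a Dirichlet sample is mixed into the root move priors so that even a move the net assigns ~0 probability gets some visits. A minimal sketch under those assumptions (illustrative names, not the lczero implementation):

```python
import random

def add_root_noise(priors, alpha=0.3, eps=0.25, rng=None):
    """Mix a Dirichlet(alpha) sample into the root move priors.
    A Dirichlet draw is obtained by normalising independent Gamma samples;
    the mixture keeps the priors a valid probability distribution while
    guaranteeing every root move a nonzero share of exploration."""
    rng = rng or random.Random()
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    noise = [g / total for g in gammas]
    return [(1 - eps) * p + eps * n for p, n in zip(priors, noise)]

priors = [0.98, 0.02, 0.0]             # say the net gives the mating move (index 2) zero
noisy = add_root_noise(priors, rng=random.Random(0))
assert abs(sum(noisy) - 1.0) < 1e-9    # still a probability distribution
assert noisy[2] > 0                    # the ignored move now gets explored
```

This is how a mate-in-one the policy head ignores can still be discovered during noisy training games, which then feeds the pattern back into the training data.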

That being said, it's likely that tactics will always remain the Achilles heel of LCZ, and its strength will stem mainly from superior positional evaluation compared to classic chess engines.

Joseph Ellis

Mar 23, 2018, 9:22:37 PM
to LCZero
Well, the concerning part is this is happening in games without any noise enabled.

It is almost as if it treats the king as just another piece and does not respect check at all.....

If it was just a one-off or something a bit more subtle, I wouldn't bring it up... but the mistakes are pretty glaring in comparison to its nominal quality of play.

And while I can understand higher-depth tactical weakness, these missed mates in 3, 4, 5, 6 and tactics against the king are pretty egregious.

 

jkiliani

Mar 23, 2018, 9:29:37 PM
to LCZero
I think you misunderstand me: The point of noise in training games is to fix knowledge holes like this. The network is still very new and in many cases doesn't know which positions have mate patterns. Tactics at depth 3 are already too deep when the network is missing a candidate move.

All I can say here is give it time, and wait for stronger nets. It has been tried and tested that the improvement algorithm works, but this is so far nowhere near the strength the net will have when it stalls.

Joseph Ellis

Mar 23, 2018, 9:39:00 PM
to LCZero
No I get it.. it is just these combinations are not at all difficult compared to what it is calculating elsewhere.  They are not flashy or unusual moves, not sacrifices, and they are quite shallow compared to what it is searching.....


jkiliani

Mar 23, 2018, 10:00:10 PM
to LCZero
For neural nets, it's often the simple things that are more difficult than the complex ones. Leela Zero used to put its own groups into atari (i.e. allow them to be captured) all the time in the past. Now that's extremely rare. A strong Leela Chess will probably still not be nearly as tactically strong as Stockfish, but will try to win by positional play.

Joseph Ellis

Mar 23, 2018, 10:11:10 PM
to LCZero
I'm not talking about all tactics though, just exploits against the king.....

Consider this.... LCZ has played over 700,000 games 85%+ of which have ended in checkmate.  It is the single most prevalent (not to mention important) feature in all of its training, and arguably the worst it is at identifying....

Now, I may only have a small human brain, but it tells me something is not quite there...

Lucien Grondin

Mar 24, 2018, 10:10:48 AM
to LCZero
I kind of agree there is something not quite right here.  Here are my two cents.

From what I understand, during self play LC0 finds checkmates basically by pure luck.  It makes a move and is told the game is over.  It then back-propagates (I assume) that result to the weights for all positions that the game went through to reach that checkmate.

This means that for all encountered checkmates, the number of positions whose evaluation will be influenced by that checkmate is only the number of moves in that game.
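The labelling scheme described above can be sketched as follows. This is a hypothetical simplification (the real pipeline encodes positions as input planes), but the idea of stamping the final result z onto every position of the game, from the side-to-move's perspective, is the standard AlphaZero-style value target:

```python
def training_targets(positions, result):
    """Assign the final game result z (+1 White win, 0 draw, -1 Black win)
    to every position of the game, flipped to the side-to-move's
    perspective. This is the only checkmate signal the value head sees:
    one game's worth of positions per mate, as the post notes."""
    targets = []
    for ply, pos in enumerate(positions):
        white_to_move = (ply % 2 == 0)
        z = result if white_to_move else -result
        targets.append((pos, z))
    return targets

# A short game White wins: every position is labelled with the outcome.
game = ["start", "after 1.e4", "after 1...e5", "after 2.Qh5?!"]
labels = [z for _, z in training_targets(game, +1)]
assert labels == [1, -1, 1, -1]
```

So a single found checkmate influences exactly `len(game)` training positions, which is the limitation the next paragraph proposes to widen.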

I believe a better policy is possible.

From a given checkmate found during self-play, what about generating all possible forced sequences backwards, or at least lots of them, in a way similar to how tablebases are made?  And then use this knowledge to train the network on those positions that were not played during self-play but are kind of similar and lead to the same checkmate position?

We might also consider doing some supervised training on a library of checkmate exercises, but I guess that would defeat the unsupervised training aspect of the project.

Joseph Ellis

Mar 24, 2018, 1:34:33 PM
to LCZero
I had the idea to test between master and the code here: https://github.com/jhellis3/leela-chess/commit/214414ff7e3b8c39dcf72dfb3e4b8f69c3c33b7f  at a fixed number of playouts so the slower code isn't disadvantaged.

Compiling was easy enough... then I ran into an issue with cutechess-cli, which I eventually tracked down and addressed by rebuilding the latest version of cutechess-cli. But now the match starts (AFAICS) and nothing happens. Looking at the output pgn, the best I was able to accomplish was this:

[Event "?"]
[Site "?"]
[Date "?"]
[Round "33"]
[White "?"]
[Black "?"]
[Result "?"]

[Event "?"]
[Site "?"]
[Date "2018.03.24"]
[Round "34"]
[White "Test"]
[Black "Master"]
[Result "1-0"]
[GameEndTime "2018-03-24T12:04:19.771 CDT"]
[PlyCount "0"]
[Termination "abandoned"]
[TimeControl "inf"]

{Black disconnects} 1-0 

And I am currently at a loss as to how to proceed...

If anyone else would like to run a match with the above code or send me a cutechess-cli script which works for them, I would appreciate it.  

jkiliani

Mar 24, 2018, 1:47:31 PM
to LCZero
#!/bin/bash

WDR= 

./cutechess-cli -rounds 100 -tournament gauntlet -concurrency 2 -pgnout SF5.pgn \
 -engine name=lc_id29 cmd=lczero arg="--threads=1" arg="--weights=$WDR/9fa03e" arg="--playouts=800" arg="--noponder" arg="--noise" tc=inf \
 -engine name=sf_lv5 cmd=stockfish_x86-64 option.Threads=1 option."Skill Level"=5 tc=40/1 \
 -each proto=uci

Just insert your working directory for the weights there, and export the paths for lczero and Stockfish, and you can test the two against each other. I'm using this script for my tests in https://github.com/glinscott/leela-chess/issues/109

zz4032

Mar 24, 2018, 2:03:20 PM
to LCZero
The path to the weights file must be a complete absolute path.

Joseph Ellis

Mar 24, 2018, 2:03:44 PM
to LCZero
Yeah, that is pretty much identical to what I tried... no luck.  It starts the game... neither version plays any moves... then it moves to the next game with no result until it reaches the game limit.  I give up for a while.... gotta de-tilt.

jkiliani

Mar 24, 2018, 2:09:59 PM
to LCZero
I had the same problem when I first got my cutechess-cli working. Is the export PATH for lczero working? I.e., can you call the engine from within any directory?

Joseph Ellis

Mar 24, 2018, 2:44:29 PM
to LCZero
tried full path... no joy.

Ran ./lczeromaster -w latest --full-tuner

And eventually that resulted in: 

Started OpenCL SGEMM tuner.
RNG seed: 0xbc9eb1950482e4ac (thread: 14531754873801667828)
Will try 5108 valid configurations.
(1/5108) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0694 ms (30.2 GFLOPS)
(7/5108) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0561 ms (37.4 GFLOPS)
(25/5108) KWG=16 KWI=2 MDIMA=32 MDIMC=8 MWG=32 NDIMB=16 NDIMC=16 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0513 ms (40.9 GFLOPS)
(39/5108) KWG=16 KWI=8 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=1 VWN=1 0.0443 ms (47.3 GFLOPS)
(98/5108) KWG=16 KWI=2 MDIMA=16 MDIMC=32 MWG=64 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.0414 ms (50.7 GFLOPS)
(119/5108) KWG=32 KWI=8 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=2 VWN=1 0.0354 ms (59.2 GFLOPS)
(146/5108) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=1 0.0332 ms (63.1 GFLOPS)
(151/5108) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=16 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=1 0.0331 ms (63.3 GFLOPS)
(158/5108) KWG=16 KWI=8 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=0 SB=0 STRM=0 STRN=0 VWM=4 VWN=1 0.0284 ms (73.9 GFLOPS)
(1480/5108) KWG=32 KWI=8 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=0 STRM=0 STRN=0 VWM=2 VWN=2 0.0259 ms (81.0 GFLOPS)
(1503/5108) KWG=32 KWI=8 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=0 VWM=4 VWN=2 0.0252 ms (83.4 GFLOPS)
(1840/5108) KWG=16 KWI=8 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=16 NWG=32 SA=1 SB=0 STRM=1 STRN=0 VWM=4 VWN=2 0.0247 ms (85.0 GFLOPS)
(2089/5108) KWG=32 KWI=8 MDIMA=32 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=0 STRM=0 STRN=1 VWM=1 VWN=2 0.0214 ms (98.0 GFLOPS)
Error building kernels: 

terminate called after throwing an instance of 'std::runtime_error'
  what():  Error building OpenCL kernels.

Perhaps it is an OpenCL issue on my machine.... IDK.

jkiliani

Mar 24, 2018, 2:49:31 PM
to LCZero
What's your system configuration?

If you do have OpenCL errors, you're in good company, I have them too. I'm using the CPU compile of lczero, since nothing else would work for me.

Joseph Ellis

Mar 24, 2018, 2:52:36 PM
to LCZero
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 9.0.282
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:     0
Device name:   GeForce GTX 660
Device type:   GPU
Device vendor: NVIDIA Corporation
Device driver: 384.111
Device speed:  1058 MHz
Device cores:  5 CU
Device score:  1112
Selected platform: NVIDIA CUDA
Selected device: GeForce GTX 660
with OpenCL 1.2 capability.

jkiliani

Mar 24, 2018, 3:00:04 PM
to LCZero
Weird that you're getting OpenCL errors on an Nvidia GPU. Maybe it's a driver issue, but I'm afraid I'm out of my depth there. You could still search the GitHub issues for people with similar problems, both for LCZero and Leela Zero, since the basic structure of both programs is very similar. Hope you get it working, good luck!

checkersp...@gmail.com

Mar 24, 2018, 3:09:58 PM
to LCZero
Will there ever be a CPU-only version that does not require OpenCL?

jkiliani

Mar 24, 2018, 3:25:33 PM
to LCZero
There already is. If you can compile the binary yourself, comment out #define USE_OPENCL in config.h before running make. If not, try to get the CPU only binary from the AppVeyor link, it offers binaries for both OpenCL and OpenBLAS (CPU).

Joseph Ellis

Mar 24, 2018, 3:39:52 PM
to LCZero
The OpenCL issue was a red herring... compiled without it and still get the same result.

Aron Gohr

Mar 24, 2018, 5:12:45 PM
to LCZero
Is missing mate in one really unexpected for this type of program? The rules of chess do not say that mate in one is worse than mate in ten as long as you do achieve checkmate, so I would expect that, independently of the strength it reaches, a "no human knowledge" program may not prefer one over the other.

I would not be surprised for this reason if even AlphaZero would often fail to play for mate in one in totally winning positions or to defend against mate in one in totally losing ones.

Joseph Ellis

Mar 24, 2018, 6:06:39 PM
to LCZero
Missing shorter mates in place of longer ones is not the issue.... it is just missing the fact that it is getting mated period.

Joseph Ellis

Mar 25, 2018, 12:04:28 PM
to LCZero
While I was never able to get LCZ running under cutechess, I have managed to get things going on windows using Arena.  

Playing some test games between the two versions now (both using id33), and the results are quite interesting....  

gogpue...@gmail.com

Mar 25, 2018, 1:10:33 PM
to LCZero
Can you please explain what is interesting about the results...

Joseph Ellis

Mar 25, 2018, 1:30:49 PM
to LCZero
Ok, yeah.... the network has no idea what mate is...  

Here are a few example games... notice the values just before it gets mated.


Currently running a 100 game match, will post the complete pgn when done.

Joseph Ellis

Mar 25, 2018, 8:26:01 PM
to LCZero
Ok, the tournament finished with a result of 64.5 - 35.5 in favor of master, scoreline of 60 - 31 - 9.

Although I did not watch all of the games or review the entire pgn, a few highlights, in addition to those previously provided, include games 9, 13, 20, 25, 26, 93, 95, 96, and 97.


I think the results speak for themselves.

Quite frankly, it is amazing to me that LCZ is as strong as it is without any concept of check or checkmate.

Fortunately, addressing the issue shouldn't be too difficult.  We could try not cutting off the search, let it learn for a couple of generations that way and see if it helps.

In addition to that, we could consider passing a pos.checkers() bit as an input feature, along with a bit for MoveList<LEGAL>().size() == 0 (or possibly replace the movecount plane with the movelist size).  Using those 2 bits, it should be able to figure out check, checkmate, and stalemate quite trivially.

However, as these are references to position states and not game states, some might not consider that "zero" enough.  In that case we just determine checkmate & stalemate as we currently do, but pass those as two distinct bits as input features.  As those are game states which are part of the rules of the game, there should be no objection.  Personally, I favor the former option as it is slightly simpler, and the information passed may also be beneficial for general tactical awareness.  Or perhaps someone else has another idea?
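The proposed extra inputs could look something like the sketch below. This is purely illustrative (lczero's real encoder packs bitboards into 8x8 planes, and these two planes are the post's proposal, not an existing feature):

```python
def extra_planes(in_check: bool, num_legal_moves: int, size=8):
    """Hypothetical extra input planes, as proposed above: one plane
    broadcasting pos.checkers() != 0 and one broadcasting 'no legal
    moves'. Given both, check / checkmate / stalemate are linearly
    separable, so a net could derive them trivially."""
    check_plane = [[1.0 if in_check else 0.0] * size for _ in range(size)]
    no_moves_plane = [[1.0 if num_legal_moves == 0 else 0.0] * size
                      for _ in range(size)]
    return check_plane, no_moves_plane

# In check with no legal moves == checkmate: both planes light up.
check, no_moves = extra_planes(True, 0)
assert check[0][0] == 1.0 and no_moves[0][0] == 1.0
# No check, no legal moves == stalemate: only the second plane lights up.
check, no_moves = extra_planes(False, 0)
assert check[0][0] == 0.0 and no_moves[0][0] == 1.0
```

Constant-valued planes like this mirror how AlphaZero encoded scalar inputs (e.g. castling rights) by broadcasting them across the board.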

Joseph Ellis

Mar 25, 2018, 8:41:54 PM
to LCZero
Here is a link with both exes, two copies of id33, and the dll, should anyone want to run some tests for themselves.

https://drive.google.com/open?id=1zpeUUt5kBtT1y3xcQlDgJghibFrFSf_R

Michel VAN DEN BERGH

Mar 26, 2018, 2:12:28 AM
to LCZero
It seems to me that before taking any further steps on this issue, it should be understood first why the UCT search has trouble recognizing checkmates in the current version of lczero.

The fact that the network does not know about checkmates should not prevent UCT from correctly handling them. After all UCT is just a search algorithm...

Joseph Ellis

Mar 26, 2018, 2:31:46 AM
to LCZero
There is no problem with the search.  The problem is that the search cutoff masks the defective NN values, and it does not learn.  It has played hundreds of thousands of games ending in checkmate, and id33 evaluated mated in 1 as +7 for itself without the cutoff.  The issue isn't seeing the mate at the end with the cutoff enabled; it is that with no recognition by the NN of what constitutes checkmate (and that checkmate wins the game), it obviously struggles constructing features which we might call tactical awareness or king safety.  Sure, you can always ultimately search your way to the solution if you have enough depth... which is how TBs are constructed.  But that is not always practical, so we use eval to guide search.  But if the eval cannot recognize a mate in 1 as problematic... well, that is problematic.  The eval can recognize good and bad pawn structures, material imbalances, complex material exchanges, just not checkmate...

    

Michel VAN DEN BERGH

Mar 26, 2018, 5:02:23 AM
to LCZero
Still I do not really understand the issue.

If the +7 evaluation is the effect of defective king safety then training should correct this. After all training takes into account the actual game outcome.

If the +7 evaluation is "objective" (e.g. material advantage and OK king safety) and the mate is a fluke, then it is the task of the search to detect this mate. During training, the policy network should learn the likely mating moves, which in turn should help the search to detect the mate more easily.

If there is really a problem then it seems more an issue with training. I.e. that the signal generated by mates/mating moves is too weak as the game ends right after them. If that is true then the problem will correct itself eventually.

Gary Linscott

Mar 26, 2018, 12:54:45 PM
to LCZero
It's a very interesting point.  We don't actually feed the network direct checkmate positions, because we don't do a search at terminal nodes.  The training procedure should cause it to predict the move immediately prior to the mate as a checkmate though.  It might just be it hasn't been able to learn this yet.

I'm not against adding an "in check" bit to the network, but I'm curious if it would even get used.  Definitely an interesting experiment though :).  But given AlphaZero didn't have this, and certainly seemed to learn king safety, I'd be okay seeing if the network learns this by itself.  It's just starting to learn attacking chess it seems, see the second game on this page: http://talkchess.com/forum/viewtopic.php?t=66824&start=270.

Dan

Mar 26, 2018, 1:45:29 PM
to LCZero
One thing that adds to this problem of finding the shortest mate is the averaging done in standard UCT. With minimaxing backup, however, it will have no problem choosing the child leading to the shortest mate.
I suppose you already have code to find the shortest mate by returning -mate + ply at terminal positions.
In any case, you should really really try minimaxing backups and get an instant 700 elo increment... In my engine it has even become stronger than alpha-beta rollouts.
It is still far off from the standard alpha-beta recursive searcher though.
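The difference between the two backup rules can be shown with a toy example. This is a sketch of the concept only, not any engine's code:

```python
def averaged_value(child_values, child_visits):
    """Standard UCT backup: visit-weighted average of child values."""
    total = sum(child_visits)
    return sum(v * n for v, n in zip(child_values, child_visits)) / total

def minimax_value(child_values):
    """Minimax-style backup: take the best child instead of the average,
    so a proven mate is not diluted by its non-mating siblings."""
    return max(child_values)

# One child is a proven win (1.0); two heavily-visited siblings look mediocre.
vals, visits = [1.0, 0.1, 0.0], [10, 45, 45]
assert averaged_value(vals, visits) < 0.2   # mate signal washed out
assert minimax_value(vals) == 1.0           # mate propagated intact
```

Averaging is what makes MCTS robust to noisy evaluations, but as Dan notes, it is exactly what dilutes an exact mate score on its way up the tree.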

Joseph Ellis

Mar 26, 2018, 1:46:52 PM
to LCZero
From what I have seen thus far, the network is not likely to learn what checkmate is.

I am currently compiling a test list of mate in 1s from the pgn posted previously.  I have only checked a couple thus far, but yes, the network does seem to be making some progress in some of these conditions.  However, it has not, as of id38, learned to identify what checkmate is, or possibly even check.  Perhaps the most efficient and definitely most effective method for the network to accomplish this would be for it to use a number of filters through a couple of layers to create its own checkers() condition and legal move list.  It has not done this though. Instead, it is, AFAICS, learning patterns & positional features preceding checkmate.  I am sure that with a large enough net and enough training, it can eventually learn to be right most of the time for the most common positions.  And it may also learn to avoid some tactical positions, as they are more likely to represent uncertainty in its eval.  But unless it does reconstruct checkers() & legal moves, the eval will likely never recognize mate with 100% accuracy or value it correctly.  If we feed it those 2 bits as inputs, it can determine checkmate & stalemate trivially and with 100% accuracy. And in that case it should learn what checkmate is practically overnight, as opposed to learning positions likely to precede checkmate fairly accurately most of the time over months & years...

I do not doubt that the current form can eventually learn to be good enough for all practical purposes, but I don't consider it to be a particularly logical or efficient solution.  And for 2 networks of a given size, one which understands checkmate and one which has no concept of it, I know which I would expect to be stronger.

Huragan

Mar 26, 2018, 4:15:51 PM
to LCZero
I believe that over time the net will be able to recognize winning positions more reliably. It is still quite small, young and unpractised :-)

Thanar

Mar 26, 2018, 6:40:07 PM
to LCZero
I think it makes sense to do a search at terminal nodes in order to feed the network direct checkmate positions, since it will likely make it easier for the network to learn.

There doesn't appear to be any downside to this. It isn't any less "Zero", since the network is still learning only from self-play, and is not given any special information besides the rules of the game.

Joseph Ellis

Mar 26, 2018, 8:03:32 PM
to LCZero
Just realized my tests were deeply flawed....  Re-running a fixed version....  

Joseph Ellis

Mar 27, 2018, 12:10:55 PM
to LCZero
Yeah, I am an idiot....  LCZ was blind to mates, but only because I stupidly forced it to be.  Had I bothered to actually read the block of code I was editing or look at the recent commits to the relevant files, a great deal of futile effort could have been avoided.

Nevertheless, I did fix the problem, re-ran the test, and the results may be of some academic interest.  The fixed version went 37-56-7 vs master for -67.00 Elo with ID33.  As of that point, the eval of mates still lags behind ideal quite a bit.  I also compiled a small list of a few mate in 1s to track progress over various nets, which can be viewed here:

Progress is clearly visible over the generations, though it still misses a few even with ID45...  I will continue the match testing every 10 IDs or so and hopefully we can see the Elo difference converge to 0 over time.

I do still think passing checkers() and the legal movelist size as inputs could potentially be valuable in raising LCZ's tactical awareness, but it is definitely not a critical issue.

Andy Olsen

Mar 27, 2018, 1:21:05 PM
to LCZero
Are you pasting just the FEN for the mate in 1? That might confuse the net because the net expects the last 8 moves of history:
https://github.com/glinscott/leela-chess/issues/106#issuecomment-372533537

For best results we need to use full PGN, or at least FEN + last 8 moves. I don't know how serious the problem is, the net may be able to handle it even without the history.

This is not what every other UCI chess engine does, so we have a bad user experience problem here. I have some plans to improve this but haven't had time to get to it yet.



Joseph Ellis

Mar 27, 2018, 1:35:53 PM
to LCZero
Yep, just FEN. I don't think it is reasonable to expect users to manually input 8 moves (of which they may be unaware) to evaluate one position, or to track down a pgn for every position.  Some positions are also crafted puzzles and have no preceding moves.

Andy Olsen

Mar 27, 2018, 2:20:43 PM
to LCZero
Yeah I agree. If you have time could you try testing FEN vs PGN with at least 8 moves? Maybe you can find some positions that include PGN leading up to the mate. Then we can find out how serious of a problem it is. I really don't have any idea how much worse the NN will do, maybe it will still be ok.



Joseph Ellis

Mar 27, 2018, 2:33:54 PM
to LCZero
I have the pgn for all of those positions, so I can try it... but to be honest I don't expect it to make much difference.   The progress is pretty clear over the generations at any rate, so any interference is likely relatively minor.

One question though... if this is an issue, how are the first 8 moves of every game handled?


Andy Olsen

Mar 27, 2018, 2:41:29 PM
to LCZero
For the beginning of the game I think the extra history planes are left blank. It might have been better to make copies of the opening position, but by now the network has probably learned "7 blank histories = new game". So if you give 7 blank histories and then suddenly some random middle game position, the network might be confused about whether it is the opening or the middle game.
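The blank-padded history described above can be sketched as follows (illustrative only; the real encoder packs each position into bitboard planes rather than a list):

```python
def history_stack(positions, length=8, blank=None):
    """Build the most recent `length` positions for the net's input,
    padded with blanks at the front when the game is shorter than
    `length` plies, as Andy describes for the start of a game."""
    recent = positions[-length:]
    padding = [blank] * (length - len(recent))
    return padding + recent

# Two plies into a game: six blank history slots precede the positions.
stack = history_stack(["start", "after 1.e4"])
assert len(stack) == 8
assert stack[:6] == [None] * 6
assert stack[6:] == ["start", "after 1.e4"]
```

Feeding a bare FEN corresponds to `history_stack([fen])`: seven blanks plus the position, which is the "looks like a new game" ambiguity being discussed.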




Joseph Ellis

Mar 27, 2018, 7:38:17 PM
to LCZero
Ok, I ran the tournament again with ID45, and there was progress.  Eval actually beat Master, which is interesting.  The margin was pretty slim though, so probably just noise.  At any rate future nets should tell us more.

I also ran Master with ID45 on the mate in 1 positions as a reference (no history), and it performed as expected.

Re-ran ID45 with history from the pgn, no difference in mate detection. Two different move choices where it didn't find mate, but nothing of significant note.  No big difference in the evals, but if anything I would say the evals without history were on average slightly better.

Joseph Ellis

Apr 7, 2018, 12:34:30 PM
to LCZero
Updated with nets 80 and 101 & added a chart to visually track progress.  Mate detection via eval continues to improve over time. 

https://docs.google.com/spreadsheets/d/1nPAhbqqPd_Be7sQJVBlkhjpjr-TMsrX5WY09iWVMS58/edit?usp=sharing

Balthazar Beutelwolf

Apr 9, 2018, 4:44:33 PM
to LCZero
Leela's mating weakness is really odd. I let ID103 play some weaker engines, and it now regularly beats the likes of Fairymax or Shamax. But the style is very odd, see here. (Relatively long time control of 30 moves in 30 minutes.) Up to move 50 a very strong game by Leela, as you might have expected from a human 2200 player; she missed some of the killer blows Stockfish suggested, but still very respectable. Then all of a sudden her playing strength seems to drop 1000 ELO points and it takes her ages to win this, missing mate-in-3 in almost every single position (a couple were mate-in-4 only), and even several mate-in-2.

Edward Panek

Apr 9, 2018, 8:56:49 PM
to LCZero

Joules Kin

Apr 9, 2018, 11:23:46 PM
to LCZero
It's because Leela doesn't care how she wins. If she was presented with, say, two options - one a 90% chance of winning fast and the other a 99% chance of winning slow - then Leela will choose the 99% just to raise the percentage chance of winning. I've seen games where Leela promotes two extra pawns to queens even though her opponent had a naked king sitting on the other side of the board. A win is a win, and all wins are equal in Leela's eyes.

Joseph Ellis

Apr 9, 2018, 11:36:41 PM
to LCZero
That is rather missing the forest for the trees... mate on the board is 100% and the score doesn't go higher than that....

The issue is that these relatively trivial mating patterns are not being properly expanded.

jkiliani

Apr 10, 2018, 12:57:27 AM
to LCZero
This endgame weakness is completely subjective unless Leela actually gives away a certain won position. How long exactly it takes her to find a checkmate is pretty much irrelevant, since a win is a win, regardless of how long it takes.

Joseph Ellis

Apr 10, 2018, 1:09:33 AM
to LCZero
That is not the point either (it has nothing to do with the stage of the game)... Please do not use a strawman fallacy.

Michel VAN DEN BERGH

Apr 10, 2018, 4:02:29 AM
to LCZero


On Tuesday, April 10, 2018 at 6:57:27 AM UTC+2, jkiliani wrote:
This endgame weakness is completely subjective unless Leela actually gives away a certain won position. How long exactly it takes her to find a checkmate is pretty much irrelevant, since a win is a win, regardless of how long it takes.

I really don't like such logically conflated statements. The truth is this:

It is an objective fact that currently lczero often takes many more moves than necessary to convert some wins.

It is indeed subjective whether you consider this important or not. But your opinion on this is as good as anyone else's.

 

jkiliani

Apr 10, 2018, 5:35:40 AM
to LCZero
I can accept that, and the points you raise are valid. Unfortunately, taking its sweet time to convert won games is likely one characteristic that Leela will have even when it's extremely strong, judging by the playing style of MCTS-NN Go engines (like Leela Zero or others). It's an unavoidable side effect of training purely towards maximising the score (win-draw-loss), which if changed would likely weaken the engine's strength. Elegance in play, or finding checkmates faster just doesn't feature into the training schedule.

Joules Kin

Apr 10, 2018, 5:41:02 AM
to LCZero
jkiliani (and I) was obviously responding to Balthazar's comment, specifically:

 "Up to move 50 a very strong game by Leela, as you might have expected from a human 2200 player; she missed some of the killer blows Stockfish suggested, but still very respectable. Then all of a sudden her playing strength seem to drop 1000 ELO points and it takes her ages to win this, missing mate-in-3 in almost every single position (a couple were mate-in-4 only), and even several mate-in-2.".

So this has absolutely everything to do with the endgame. It's not a strawman fallacy; it's explaining something to another person (i.e. someone who isn't you).

Joules Kin

Apr 10, 2018, 5:46:08 AM
to LCZero
This endgame inefficiency was exactly the same with AlphaGo. Look up any full commentary of games played by any version of AlphaGo and you'll find that the uniting theme of all of them is the poor endgame. AlphaGo would turn 7 point leads (an impressive advantage) into 1 point and 0.5 point wins (a borderline win).

I think the reason why we didn't see much of it in AlphaZero is that Stockfish "resigned" in most of the lost games (probably a feature introduced by the DeepMind team to save time in generating the 100 game match). My idea is that if Stockfish had its resign button off, then the AlphaZero games we have would be much longer and weirder than they are.

jkiliani

Apr 10, 2018, 5:47:23 AM
to LCZero
For what it's worth, a potential future solution that would solve or at least considerably mitigate this issue is tablebase support for Leela. If a TB hit in the UCT search is given a 1.0, 0.5 or 0.0 evaluation directly instead of using the neural net, Leela would find the quickest checkmates according to the tablebase in the future. This should not be used for training, but seems fine for playing other engines, just like adding an opening book shouldn't be a problem (as long as the book is carefully evaluated to not lead to inferior positions).
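The suggested TB-at-leaf behaviour could be sketched like so. Here `probe_tb` and `net_eval` are hypothetical callables for illustration, not the real lczero or Syzygy probing API:

```python
def leaf_value(position, probe_tb, net_eval):
    """Sketch of the suggestion above: if a tablebase probe succeeds at a
    UCT leaf, use the exact 1.0 / 0.5 / 0.0 result directly; otherwise
    fall back to the neural net's estimate."""
    wdl = probe_tb(position)          # returns 1.0, 0.5, 0.0, or None on miss
    if wdl is not None:
        return wdl                    # exact game-theoretic value
    return net_eval(position)         # NN estimate for non-TB positions

# A tablebase position uses the exact value; anything else uses the net.
assert leaf_value("KQvK",
                  lambda p: 1.0 if p == "KQvK" else None,
                  lambda p: 0.6) == 1.0
assert leaf_value("middlegame", lambda p: None, lambda p: 0.6) == 0.6
```

Because the exact values are only substituted at search time, the training data stays self-play-only, which is why the author argues this doesn't compromise the "zero" approach.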



Joules Kin

Apr 10, 2018, 6:08:57 AM
to LCZero
I don't think that's even necessary. I'm still of the mind that a win is a win and all wins are equal. And if leela has to play a grandmaster or something, her opponent will resign before the weird endgame appears anyway, so it shouldn't matter a lot.

jkiliani

Apr 10, 2018, 6:13:35 AM
to LCZero
I agree, but as Michel Van den Bergh pointed out, not everyone does. It would be nice to have optional tablebase support, so that people who get annoyed by the slow checkmates can simply load a tablebase to solve that problem for them. Others who don't care about that won't have to bother with TB.

fenchel,

Apr 10, 2018, 6:28:41 AM
to LCZero
It may also be interesting to find tweaks to the objective functions and training methods which penalize game length.
A very interesting regularization, imo.

just leaving this comment for sake of discussion, not trying to suggest anything be changed here.
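One hypothetical shape such a regularization could take: discount the training target toward a draw by the number of moves played, so shorter wins carry a stronger signal. The `gamma` value and the scheme itself are purely illustrative, not something implemented in the project:

```python
def discounted_target(outcome, num_moves, gamma=0.999):
    """Illustrative game-length penalty for the value-head target.

    outcome:   +1 win, 0 draw, -1 loss from the player's point of view
    num_moves: total moves in the game
    gamma:     per-move discount factor (hypothetical)
    """
    # A quick win trains toward a value near +1; a dragged-out win
    # trains toward a smaller value. Note the side effect: a long loss
    # is scored less negatively than a quick one, i.e. resistance in
    # lost positions is also rewarded.
    return outcome * (gamma ** num_moves)
```

Whether this distorts the learned value function in the long run is exactly the open question raised later in the thread.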

Jörg Oster

unread,
Apr 10, 2018, 7:05:02 AM4/10/18
to LCZero
This will only have any effect with DTM bases, of course.

However, I don't understand why ANY search should not be able to resolve a mate in 4 or even a mate in 2 ...
I don't think that's the job of the evaluation or NN in this case.

jkiliani

unread,
Apr 10, 2018, 7:25:07 AM4/10/18
to LCZero
That seems to be an implementation detail... do any regular chess engines use a smaller DTZ base (4-piece?) along with a larger WDL base?

As for why any search should not be able to resolve a mate in 4: well, there's simply no reason for Leela to learn how to checkmate quickly, since it is not rewarded for it by the learning algorithm (as long as it doesn't blunder the win, that is). I guess that it will at least become better at it with a larger net, but still not as efficient as an alpha-beta engine.

Юрий Павлович

unread,
Apr 10, 2018, 8:50:26 AM4/10/18
to LCZero
I think the problem is in value estimation. I noticed that when LZ plays people on OGS and it estimates its chance of victory as 100%, it starts to give away points. So my theory is that because in a winning situation all non-rubbish moves are clamped at 100%, they all have the same score, as there is no better chance to win than 100%. This prevents finding a move that would further increase or secure the advantage. When the opponent catches up, most of the moves lose their perfect 100% score and the engine again picks the strongest response. This leads to a weird homeostasis in Leela's behavior that people sometimes call "trollish". A possible remedy is to use logits instead of percentages in value estimation.


jkiliani

unread,
Apr 10, 2018, 8:58:38 AM4/10/18
to LCZero
For the case of Leela Zero, I believe changing to temperature = 1 would make the engine try to maintain a more comfortable advantage, since it's less sure of winning at a given point difference.

Юрий Павлович

unread,
Apr 10, 2018, 9:58:05 AM4/10/18
to LCZero
I think there's a confusion: temperature is a parameter of the softmax function used in the policy head, whereas I'm talking about the value head, which most likely uses a sigmoid function. The logistic sigmoid outputs values in the range [0..1], which are interpreted as probabilities, and when the input value (logit) becomes too big, then due to limited floating-point precision the sigmoid clamps at 1 and the Monte Carlo search chokes on indistinguishable values. If we remove the sigmoid activation at evaluation time and use raw logits, this would probably resolve the problem. It would still be important to keep the sigmoid activation during backpropagation, otherwise the variance of the logits would explode to infinity and nullify any improvement.
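The saturation point is easy to demonstrate in standard double precision (a small self-contained sketch, independent of any engine code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# With IEEE-754 doubles, exp(-37) ~ 8.5e-17 falls below half the spacing
# between representable numbers near 1.0 (~1.1e-16), so 1 + exp(-37)
# rounds to exactly 1.0 and the sigmoid saturates: clearly different
# logits become indistinguishable probabilities.
print(sigmoid(36.0) < 1.0)              # True: still below 1.0
print(sigmoid(37.0) == 1.0)             # True: saturated
print(sigmoid(40.0) == sigmoid(37.0))   # True: different logits, same value
```

Comparing the raw logits 37 and 40 directly would still rank the two positions correctly, which is the fix being proposed.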


jkiliani

unread,
Apr 10, 2018, 10:02:59 AM4/10/18
to LCZero
While your explanation is correct, the confusion was that there are actually TWO temperatures used in the AlphaZero approach. You just described the softmax temperature, which as you said transforms the policy head output into move probabilities. I was talking about the root move temperature, which decides how to select a move after UCT search based on a given visit count distribution.
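The distinction can be sketched in a few lines (an illustrative toy, following the standard AlphaZero formulation rather than any particular codebase):

```python
import math

def softmax(logits, temperature=1.0):
    """Policy-head softmax temperature: converts raw network logits
    into move probabilities; lower T sharpens the distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def root_move_probs(visit_counts, temperature=1.0):
    """Root move temperature: after UCT search, pick a move with
    probability proportional to N^(1/T). T -> 0 approaches always
    playing the most-visited move; T = 1 samples in proportion to
    visit counts."""
    powered = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(powered)
    return [p / total for p in powered]
```

With visit counts [800, 150, 50], T = 1 plays the top move 80% of the time, while T = 0.5 plays it over 96% of the time; the softmax temperature never sees visit counts at all.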

Юрий Павлович

unread,
Apr 10, 2018, 10:29:18 AM4/10/18
to LCZero
Ah, sorry. I totally missed those changes and now I need to think about it. The first idea that comes to mind is to make the temperature somehow trainable. I believe it was suggested somewhere on GitHub, but now it's too late to make changes to the network's output format.


Balthazar Beutelwolf

unread,
Apr 10, 2018, 12:32:07 PM4/10/18
to LCZero
If the issue were only delayed wins, then I can see the point that this is merely an annoyance from a human's POV. But Leela actually does miss wins. Today I let it (ID117) play a couple of games against lichess's Stockfish at level 6 [level 7 is still too much]. Leela got into a winning position, was up 3 pieces in the endgame, and then drew by repetition of moves. In an utterly winning position. That particular thing was (probably) an endgame weakness [unless it fails to take 3-fold repetitions into account, in which case it's a proper bug], but the missed mates-in-2 point, AFAIAC, to a bug in the evaluation function: a forced mate is a 100% win, not a 97.3% win, and so should be picked over anything that is even remotely unclear (such as not being a forced mate); which is the reason why traditional chess engines distinguish between M-in-x evaluations and numerical ones. I noticed with online Leela that even in the mating position itself the evaluation is not 100 (or 0) but some number close to it, which is odd.

Preferring faster solutions is sometimes, in programming, a simple trick to avoid non-termination, and in chess non-termination is a draw. I would hypothesize that preferring shorter wins over longer ones would increase the speed at which Leela improves, because it takes more information out of a game than one-and-a-half bits.

Peter Schmidt-Nielsen

unread,
Apr 10, 2018, 1:17:55 PM4/10/18
to Юрий Павлович, LCZero
You suggest simply not scaling outputs to win probabilities at evaluation time, and keeping them as logits to avoid saturation. I have two concerns:

1) A strategy that maximizes expected logits out of the value network doesn't necessarily maximize expected win probability, as per Jensen's inequality, and expected win probability really should be what we care about.
2) How do you propose that mated positions be scored during MCTS? Using values of +/-inf logits doesn't comport with UCT.

Overall, it's an interesting idea worth thinking about, but changing the scoring to address these issues seems really tricky to do in a way that isn't: a) very ad hoc, b) damaging the "zero"ness of lczero, or c) subtly stunting the performance in the long run. For example, fenchel's nifty idea of penalizing game length (e.g. one could simply discount value by move number at which the game ended during training) seems like it might plausibly combat these problems, but also seems to suffer from a bit of (a), a bit of (b), and plausibly also (c).

(I acknowledge that what I'm discussing is somewhat tangential to the missed mates that actually change the outcome, as Balthazar was pointing out, and Joseph Ellis and Joules Kin were discussing.)
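Point (1) can be made concrete with a small numerical example (the values are chosen purely for illustration): because the sigmoid is concave for positive inputs, a move whose logit distribution has the higher mean can still have the lower expected win probability.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Move A: the value logit is 5 half the time and 1 the other half,
# so its expected logit is 3. Move B: the logit is always 2.5.
expected_logit_A = (5 + 1) / 2          # 3.0
expected_logit_B = 2.5

expected_winprob_A = (sigmoid(5) + sigmoid(1)) / 2   # ~0.862
expected_winprob_B = sigmoid(2.5)                    # ~0.924

# A wins on expected logit, but B wins on expected win probability:
print(expected_logit_A > expected_logit_B)       # True
print(expected_winprob_B > expected_winprob_A)   # True
```

So a search that averaged raw logits would prefer A, while one that averaged probabilities (the proper objective, if expected score is what we care about) would prefer B.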

--
You received this message because you are subscribed to the Google Groups "LCZero" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lczero+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lczero/900633b6-d740-4542-847d-f9277c938835%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Robert Pope

unread,
Apr 10, 2018, 1:37:22 PM4/10/18
to LCZero
While I don't disagree with your counterpoints, I think at some point it will be very interesting to fork this project and start studying aspects that aren't core to Leela Chess Zero, e.g.
1. Training to favor shorter wins
2. Training to favor wins at the expense of some losses
3. Seeing how far we can take nets of varying sizes
4. Using some input maps that aren't pure "Zero" (e.g. attack maps, check maps)
5. A separate net that is only focused on training endgames

Юрий Павлович

unread,
Apr 10, 2018, 2:29:37 PM4/10/18
to LCZero
The sigmoid is more a squashing than a scaling function. In theory, if x1 > x2 then sigmoid(x1) > sigmoid(x2) for real numbers, but unfortunately, in practice, due to the limited precision of floating-point numbers, the sigmoid output snaps to 1 starting from approximately x = 37 even with double precision. And that leads to a problem: even if the network evaluates positions differently (one more favorable than another in logits), the imperfect sigmoid calculation can make them equal. So if raw logits are not suitable for UCT or somehow skew the optimization goal, then just continue to use the squashed values, but when it comes to comparison, use the logits. This would allow the engine to continue selecting better moves in a thoroughly winning situation. I don't see how it violates the "zero" approach, as it's more a fix for computer math limitations than something related to the chess domain. It's like storing a rational number as a numerator/denominator pair, as programmers often do, instead of the resulting quotient, which loses precision.

And I think it's still related to the missing-mates problem: if there is no better probability to win than 100%, then the engine won't bother to find a more dominating position. When chatting with jkiliani, he said the AlphaZero search changed slightly, so it might indeed be unrelated; I'll figure out the details when I get free time to read the paper. I'm just talking from my initial understanding when I started the thread.

Also it is possible that LZ will iron out this problem by itself by realizing that these long dances are useless.


Kevin Clark

unread,
Apr 10, 2018, 2:32:52 PM4/10/18
to LCZero
I think it would also be interesting to tweak the neural architecture of Leela Chess. Standard convnet architectures are pretty well-suited to Go. Although of course the whole board matters, my understanding is that the most important patterns are spatially local. But this isn't true in chess due to sliding pieces. Although certainly a deep enough network can eventually learn these, I imagine it's fairly difficult to represent with the current architecture that (for example) it's good for a rook to be on a half-open file, or that it's slightly dangerous to have your king on h1 when there is a bishop on b7 and few pawns in the way. Perhaps it would benefit the network to have some “sliding piece” convolutions, such as 1x8 or 8x1 filter sizes. It would even be possible to do “diagonal” convolutions by rotating the representations as in a rotated bitboard, applying a normal convolution, and then rotating back again.
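As a toy illustration of the idea (just a NumPy sketch, nothing from the actual codebase): an all-ones 8x1 kernel applied down a full file collapses that file to a single value, so one layer can express file-wide properties like "a rook sits somewhere on this file":

```python
import numpy as np

# Toy 8x8 input plane marking rook locations (row = rank, col = file).
rooks = np.zeros((8, 8))
rooks[0, 0] = 1.0  # a rook on a1

# An all-ones 8x1 kernel with "valid" padding reduces each file to one
# value; broadcasting it back to the board gives every square a
# "rook on my file" feature in a single layer.
file_sums = rooks.sum(axis=0)              # shape (8,), one value per file
file_feature = np.tile(file_sums, (8, 1))  # shape (8, 8)

# The same idea along ranks, i.e. a 1x8 kernel.
rank_sums = rooks.sum(axis=1)
rank_feature = np.tile(rank_sums[:, None], (1, 8))
```

A learned 8x1 filter would of course have arbitrary weights rather than all ones, but the receptive-field argument is the same: a stack of 3x3 convolutions needs several layers to cover a full file, while the long thin filter covers it in one.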

Юрий Павлович

unread,
Apr 10, 2018, 2:48:38 PM4/10/18
to LCZero
People working with convnets have concluded that multiple 3x3 convolutions are better than bigger kernels. It might be that some unusual combinations work better for chess, but the whole "zero" idea is a triumph of a general approach to multiple tasks with the same architecture. In the real world there will be no human nearby to fiddle with a robot's brain every time it encounters a new task. You could squeeze out additional performance this way in a separate project, but here the authors seem dedicated to sticking with the original idea of zero domain knowledge.

Also, in Go there is often a situation that requires attention spanning the whole board. For example, a huge group can be killed by a move on one side of the board just because you plugged an eye on the other side.


Balthazar Beutelwolf

unread,
Apr 10, 2018, 3:39:57 PM4/10/18
to LCZero
Win probability is an inaccurate concept in chess, as you have three possible outcomes, and you could try to maximise either the expected score or the win probability; maximising these two is different. At the top level in chess, the draw probability is quite high (just look at TCEC), but one can influence this as a player by going for sharp and unclear lines or positions that peter out. What I was saying before is that the thing you compare does not need to be a floating-point number (that would be rather artificial in itself), but can be something tailor-made to the game. For example, one could use two bits to distinguish the 3 situations of (i) forced mate for the player, (ii) forced mate for the opponent, (iii) neither, and then interpret the remaining bits differently. Even floating-point numbers themselves have these parts (sign, exponent, mantissa). Actually, if you use an FP number to store probabilities you only use half the exponent values available to you, so you could use the larger exponent values to represent forced mates.
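A sketch of such a tailor-made comparison type (entirely hypothetical, not from any engine): two categorical "forced mate" states that sort above and below every probabilistic score, with shorter mates preferred, the way traditional engines treat M-in-x scores.

```python
from functools import total_ordering

@total_ordering
class Score:
    """Illustrative score type: either a win probability in [0, 1] or a
    forced-mate distance (+n: we mate in n, -n: we are mated in n)."""

    def __init__(self, winprob=None, mate_in=None):
        # Exactly one of winprob / mate_in should be given.
        self.winprob = winprob
        self.mate_in = mate_in

    def _key(self):
        if self.mate_in is not None and self.mate_in > 0:
            # Our forced mates rank above everything; shorter is better.
            return (2, -self.mate_in)
        if self.mate_in is not None:
            # Being mated ranks below everything; longer is better.
            return (0, -self.mate_in)
        # Probabilistic evaluations sit in the middle band.
        return (1, self.winprob)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()
```

With this ordering, a mate in 2 beats a 99.9% evaluation outright, which is exactly the behavior the missed-mates complaint asks for.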

Balthazar Beutelwolf

unread,
Apr 10, 2018, 4:08:40 PM4/10/18
to LCZero
Actually, I forgot to say that you could also try to minimize your losing probability. Take for example the Ruy Lopez Berlin: 1.e4 e5 2.Nf3 Nc6 3.Bb5 Nf6. The 3 main moves for white are 4.0-0, 4.Nc3 and 4.d3. If you look at game results on chessgames365.com, then you can maximize your winning probability and play 4.Nc3, maximise your score and play 4.d3, or minimize your losing probability and play 4.0-0.

Peter Schmidt-Nielsen

unread,
Apr 10, 2018, 4:16:33 PM4/10/18
to Balthazar Beutelwolf, LCZero
This is a really good point, I was speaking quite sloppily. The line you've described is a great example of the differences. I meant "expected score" the whole time, but kept saying "win probability" because I'm used to thinking about Go (and LeelaZero), not Chess. If one assumes that maximizing expected score is the goal (a big assumption that one can reasonably question) then my point stands as is, because MSE training the value head to output {-1, 0, 1} is a proper scoring rule, and thus incentivizes the network to reveal its true belief of the expected _score_, and my point about Jensen's inequality on logits stands. If one's goal isn't to maximize expected score then it's more subtle, and requires careful thinking.

Thanks for pointing this out!


renouve

unread,
Apr 10, 2018, 9:15:03 PM4/10/18
to LCZero
Another argument in favor of shorter wins is that when there is time control, making too many extra moves could make you lose on time.

Balthazar Beutelwolf

unread,
Apr 12, 2018, 2:44:01 AM4/12/18
to LCZero
There is another difference between chess and Go: Go games last approximately the same number of moves, because after adding so many stones the board is just full, and beforehand it isn't. In chess this varies more (and without the 50-move rule games could even be thousands of moves long). The thing is: the longer games are, the less reliable your estimated game score is. We see this even at the highest level: Stockfish playing Houdini in TCEC 11's superfinal went in a couple of games from a winning position to what it thought was a winning endgame, but these were long lines and it got them wrong. This is the reason to favour shorter wins: if after move X there is a known score from previous games, then this tells us more about move X when the games are short - there are fewer moves contributing to the score, so less uncertainty.

BTW, in Go it would be interesting to train for maximizing expected point difference rather than win probability. This would create a different style of play, especially towards the beginning of training - later I would expect it to converge. This strategy should have a good chance of creating a better player in terms of win probability early on, because more information is gained from a single game, so it can learn faster. It would be interesting to train two AIs using these different objectives (with the same resources) and then pit them against one another.

Jeremy Zucker

unread,
Apr 12, 2018, 6:43:59 PM4/12/18
to LCZero
Hi folks,

 I am posting to this topic because I played through http://lczero.org/game/3873645 and realized it missed several mate-in-one opportunities, starting with 29. ... Bxf1 instead of 29. ... Nh3++.
It then played 30. ... Qxf2 instead of 30. ... Qg2++. It then played 31. ... Qg2+ instead of 31. ... Bg2++ or 31. ... Qg1++. Finally, it did play the correct move 32. ... Bxg2++, but only after missing 4 mate-in-1 opportunities!

I am running in CPU mode on a mac with the latest version 5 client_mac. Not sure which weights were used for this game. How would I find that out?

Sincerely,

Jeremy

Joseph Ellis

unread,
Apr 12, 2018, 7:15:52 PM4/12/18
to LCZero
If it was a training game, then it's fine, as those are played with temperature on anyway. Although it does bring to mind another idea I had recently: dynamic temperature based on the current % score. But I digress...


Anyway, the larger net is much better in this regard from what I have seen, so I will probably do one last pass with a net in the 118-121 range, then start testing on the 128x10 nets.

Joseph Ellis

unread,
Apr 15, 2018, 3:31:59 PM4/15/18
to LCZero
I have completed testing of ID118 and ID131.  Wish I could report further progress, but unfortunately, that is not the case.  In fact, there has been a small regression with the newest entries now missing 3 of 13 mates vs 2 of 13 for ID101.


3 problematic mate in 1s:
4r1k1/R5p1/1PN3K1/8/3N2Q1/1P2q3/5r2/8 b - - 3 42
4r1k1/1p4b1/6Q1/3P4/p7/P1B3Pq/1P3r1P/R5K1 w - - 0 27
r1b1k2r/1pp3pp/p1nbp3/8/2BPN2q/4P3/PP3PPP/R1BQ1RK1 b kq - 0 10
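For anyone who wants to check these themselves, a quick brute-force scan (assuming the third-party python-chess library is installed; this is not part of any lczero tooling) confirms that each of the three positions has a mate in one for the side to move:

```python
import chess  # third-party "python-chess" package, assumed installed

def mate_in_one(fen):
    """Return a mating move if the side to move has a mate in one, else None."""
    board = chess.Board(fen)
    for move in board.legal_moves:
        board.push(move)
        mate = board.is_checkmate()
        board.pop()
        if mate:
            return move
    return None

fens = [
    "4r1k1/R5p1/1PN3K1/8/3N2Q1/1P2q3/5r2/8 b - - 3 42",
    "4r1k1/1p4b1/6Q1/3P4/p7/P1B3Pq/1P3r1P/R5K1 w - - 0 27",
    "r1b1k2r/1pp3pp/p1nbp3/8/2BPN2q/4P3/PP3PPP/R1BQ1RK1 b kq - 0 10",
]
for fen in fens:
    print(fen, "->", mate_in_one(fen))
```

The same brute-force idea is what any one-ply search does, which is why the missed mates point at the evaluation rather than move generation.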

The latest executable also provides some more interesting output in how the positions are evaluated.

Should one desire to check evals for various mates, you can grab a build of the executable here:

I also noticed that while information is correctly displayed for mates in the GUI, they (the moves) simply do not appear at all in the generated log files when using -l.  Not sure if this is intended or not...