Suggestion

GK

Apr 14, 2018, 9:53:07 PM
to LCZero
In an earlier thread, a user named Sam Jukes posted this game on Lichess (LCZ vs another engine): https://lichess.org/study/fPvlEksE

If you click the computer analysis section and look at it, LCZ is at a massive advantage (approximately +9, i.e. nine pawns or about +900 cp) at about move 37, yet for at least 60 moves it is unable to convert this advantage into a win.

My proposal is thus as follows: if LCZ has measured its evaluation at a particular advantage, say past +5 pawns (+500 cp), at move 55 (or whatever move number is generally thought to indicate the endgame; I just picked one arbitrarily), but 3 moves later, at move 58, the evaluation is more or less the same (or has dropped significantly), LCZ should restart at move 55 with a different move than the one originally chosen. (This is essentially a sort of recursion, or more precisely backtracking.) It should then repeat this process as necessary. Note that I am not sure how the network weights would factor into this kind of setup.

This way, rather than counting on the same 54 prior moves eventually being played again in future games, and on LCZ eventually learning the right move from them, we can try to speed up the process of converting an advantage and learning the endgame within the current game itself.

Thoughts?

Jesse Jordache

Apr 15, 2018, 2:47:35 AM
to LCZero
1. That would take FOREVER.
2. I think the idea is that the computer's evaluation of the position may not be accurate, and/or its exploitation of the advantage may not be accurate, and as the engine gets stronger the two will meet in the middle.
3. There are too many cases where a real advantage takes a long time to convert: fortresses that need to be broken down, stalemate defenses that must be avoided, fortresses that cannot be broken down (7th rank pawn vs queen, for example), or just really long mating nets that the computer can't realistically be expected to find at this point in its learning.

Besides, I think they're trying to avoid reaching in and giving it chess-specific rules to follow.  I'm personally not entirely sure the increased temperature in the opening moves is the best approach, but we'll see.

[edit: I didn't realize when I first read this that you were referring to Stockfish's evaluation of the position. That's like... upsetting the apple cart that you've filled with oranges, and then closing the barn door on it after the horses have already gotten away.]

Balthazar Beutelwolf

Apr 15, 2018, 5:48:36 AM
to LCZero
Not converting a clear advantage quickly is less of a concern than not converting it at all, or misevaluating a position as won when it is not. One could account for this in training by starting some training games from positions whose evaluation contradicted the actual result, instead of starting every training game from the initial position. One annoyance here is that there seems to be no distinction between a 50% expected score resulting from an unclear position and one resulting from a dead draw: Leela keeps track of the expected score but not the expected win/draw rate. This means it would less often be aware that a misevaluation had taken place.
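
To make the 50% ambiguity concrete, here is a minimal Python sketch. All names in it (expected_score, misevaluated, the threshold value, the placeholder position labels) are illustrative assumptions and not taken from Leela's code; it only shows why a single expected-score number cannot tell an unclear position from a dead draw, and how one might pick out positions whose evaluation disagreed with the final result.

```python
def expected_score(win: float, draw: float, loss: float) -> float:
    """Expected score in [0, 1]: a win counts 1, a draw 0.5, a loss 0."""
    if abs(win + draw + loss - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return win + 0.5 * draw


# An unclear, double-edged position and a dead draw collapse to the same number.
unclear = expected_score(win=0.5, draw=0.0, loss=0.5)
dead_draw = expected_score(win=0.0, draw=1.0, loss=0.0)


def misevaluated(positions, threshold=0.4):
    """Return the positions whose predicted expected score differed from the
    actual game result by more than `threshold` (a hypothetical cutoff).

    Each item is a (position_label, predicted_score, actual_result) tuple,
    with actual_result being 1.0, 0.5 or 0.0 from the same side's view.
    """
    return [p for p in positions if abs(p[1] - p[2]) > threshold]
```

Positions returned by misevaluated would be the candidates for extra training games under this suggestion; how they would be weighted against normal games from the starting position is left open here.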

GK

Apr 15, 2018, 2:12:07 PM
to LCZero
I will clarify:

Past a certain point in the game (say move 50+, 75+, or 100+; whatever the number is, it should definitely count as "endgame"), and going by LCZ's own cp evaluation (not SF's; that was merely to illustrate the point and provide an objective outside view showing that LCZ was stalling): if LCZ's cp eval, at whatever value is considered an unbeatable advantage, remains constant or drops dramatically, then rather than continuing onwards, the game would go back to the move in question and try something different from what was originally played.


GK

Apr 15, 2018, 3:05:43 PM
to LCZero
Some pseudocode, if you will.
Nothing final, just getting thoughts out...

(All normal stuff happening.)
When move 100 (an arbitrary value indicating the endgame) comes around and the expected score percentage is 75% (again arbitrary); another condition could be that the cp eval is very high or low (abs > 100 or 150 or whatever, these values are again arbitrary):
    Store the board position before white moves at move 100, perhaps in FEN format.
    Move 100: white moves and white's cp (centipawn) eval is stored in array[0]. Black moves.
    Move 101: white moves and white's cp eval is stored again, this time in array[1]. Black moves.
    Move 102: white moves and white's cp eval is stored again, this time in array[2]. Black moves.
    If array[0] == array[1] == array[2] (or they are equal within a small enough margin), then:
        Restart at the FEN position stored at move 100.
        Let the normal MCTS search happen (from that same position, of course).
        Note that this means the same move could be chosen again, in which case this would of course repeat.
    Eventually, the move that is chosen will make a significant difference, whether as a "good" move (cp eval increases) or a "bad" move (cp eval decreases).
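
The steps above can be sketched in Python as follows. This is only a rough illustration under stated assumptions: play_window is a hypothetical callback standing in for "let the engine play three moves from this position and report its evals", the margin of 0.02 is arbitrary, and max_restarts is an addition of this sketch (not part of the pseudocode) so the loop cannot run forever. None of these names come from LCZero.

```python
import random
from typing import Callable, List, Tuple


def is_stalled(evals: List[float], margin: float = 0.02) -> bool:
    """True if every eval in the window lies within `margin` of the others."""
    return max(evals) - min(evals) <= margin


def play_with_restarts(
    start_position: str,
    play_window: Callable[[str, random.Random], Tuple[str, List[float]]],
    max_restarts: int = 10,
    seed: int = 0,
) -> Tuple[str, int]:
    """Replay the same window from `start_position` until the evals move.

    `play_window` (hypothetical) plays a few moves with the normal search and
    returns the resulting position plus the evals recorded after each move.
    Returns the final position and the number of restarts that were needed.
    """
    rng = random.Random(seed)
    position, evals = play_window(start_position, rng)
    restarts = 0
    while is_stalled(evals) and restarts < max_restarts:
        restarts += 1
        # Go back to the stored position; the search may pick the same
        # move again, in which case the loop simply repeats.
        position, evals = play_window(start_position, rng)
    return position, restarts
```

As written, the sketch accepts any window in which the eval changes, whether up or down, which matches the "good move or bad move" outcome described above.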

Of course, another way to do this would be to check every three, five, ten, or however many moves whether the cp eval is high enough to indicate a win but has still not changed. As others have pointed out above, though, doing this from the beginning of the game would be somewhat silly, as games are generally level and constant for a while anyway. Having said that, it might be more viable past a certain point (i.e., every three moves past move 75 or 100 or whatever).

Note that the "pseudocode" above has one major difference from what I proposed earlier: rather than storing the played move and making sure it is not played again, we simply fall back to the standard MCTS.

With enough of these sorts of games occurring, when the weights are calculated or updated every four hours on Error323's computer, there should eventually be a significant difference in the network's ability to convert its advantage into a win. We should also see considerably quicker games: rather than proceeding down paths that lead nowhere, LCZ will be able to choose more efficient routes, allowing more games to be played in the same amount of time with the same processing power (eventually, although it will likely take longer at first).

As a final point, the amount of human "interference" in this would still be negligible, as the positions that Leela would reset to would not be forced on her from the outside (i.e. by people). They would be whatever board positions LCZ had already generated and arrived at of its own accord.

evalon32

Apr 15, 2018, 3:42:01 PM
to LCZero
I don't know if you're aware, but the cp eval in Leela is simply a proxy for the expected score. That is, it doesn't have a cp eval in the traditional sense, but folks came up with a function that maps expected score percentages to "cp" numbers, because that's what various chess GUIs expect.
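
As a rough illustration of what such a mapping looks like, here is a small Python sketch. The logistic (Elo-style) formula and the scale of 400 are assumptions chosen for this example; they are not the function Leela actually uses, which is not given in this thread.

```python
import math


def score_to_cp(expected_score: float, scale: float = 400.0) -> int:
    """Map an expected score in (0, 1) to a pseudo-centipawn number.

    Illustrative logistic mapping only: 0.5 maps to 0, and the value grows
    without bound as the expected score approaches 0 or 1.
    """
    eps = 1e-6  # clamp so that 0.0 and 1.0 do not divide by zero
    p = min(max(expected_score, eps), 1.0 - eps)
    return round(scale * math.log10(p / (1.0 - p)))
```

Under these assumptions an expected score of 75% comes out at roughly +191, and 25% at roughly -191. Whatever the real mapping is, the point stands that a constant "cp" number just means a constant expected score.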

GK

Apr 15, 2018, 4:40:29 PM
to LCZero
Thank you for the correction.

My statement would then be: the restart would trigger if the expected score stays unchanged despite being 75% or higher past move 50 (again, the exact numbers are up in the air).

Also, although this idea in general might be reasonable, I am not too sure if there is a need. Because although this particular game lasted quite a few moves, LCZ's temperature=1 sort of prevents this from really happening in its self-play games (as far as I am aware).

I am also not really sure how to collect the number of moves in each game to check how often this issue occurs. If we were able to obtain this data, we could probably put some numbers to this problem and determine whether it is worth acting on or whether we can just keep training and let this issue iron itself out by virtue of simply playing more games.
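
One possible way to get at these numbers, sketched in Python with only the standard library. It assumes the games are already available as one PGN movetext string per game; where the self-play games are stored and how to fetch them is not covered here. The function names and the threshold of 100 moves are made up for illustration.

```python
import re
from statistics import mean

# A move number is a run of digits followed by a dot, e.g. "12." or "12...".
MOVE_NUMBER = re.compile(r"\b(\d+)\.(?!\d)")


def full_moves(movetext: str) -> int:
    """Highest move number found in one game's PGN movetext."""
    # Drop {...} comments first so numbers inside them are not counted.
    text = re.sub(r"\{[^}]*\}", " ", movetext)
    numbers = [int(n) for n in MOVE_NUMBER.findall(text)]
    return max(numbers, default=0)


def summarize(games, long_threshold=100):
    """Count games and report how many reached `long_threshold` moves."""
    lengths = [full_moves(g) for g in games]
    return {
        "games": len(lengths),
        "mean_moves": mean(lengths) if lengths else 0.0,
        "long_games": sum(1 for n in lengths if n >= long_threshold),
    }
```

Game length alone would not show whether the long games were stalled wins or genuinely hard positions, so this would only be a first filter before looking at the evals of the long games.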