Value calculations for win/draw/loss (purely theoretical discussion, as probably not feasible to implement)

Kevin Kirkpatrick

May 11, 2018, 4:16:40 PM
to LCZero
It's my understanding that, following AlphaZero's (AZ's) path, Leela's value head returns a single number in the range [0..1]. That value is trained by comparing it to the eventual game outcome: loss (0), draw (0.5), or win (1). I believe the AZ team adopted this approach because it was directly compatible with "pure win-loss" games like Go.


While this is self-evidently fine when it comes to choosing the best move, I don't think the same can be said with regard to the effectiveness of training. Training is done with back-propagation: the larger the discrepancy between Leela's value and the actual outcome, the larger the adjustment to her network weights. The basic principle is, "the worse the network did at understanding the position, the more its weights should be nudged". Following that line of reasoning, there might be a powerful optimization: instead of having the value head return a single number, have it return three: the probability of winning (Pw), the probability of a draw (Pd), and the probability of losing (Pl). Alternatively (keeping the architecture of 1 value head = 1 output value), have 3 independent value heads, one each for the probability of a win, a draw, and a loss, with a wrapper function that normalizes the 3 outputs so that Pw+Pd+Pl = 1.00.
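A minimal sketch of the normalizing wrapper mentioned above. The post does not say how the three outputs would be normalized; a softmax is assumed here as one common choice, and the function name `normalize_wdl` and the raw inputs are hypothetical.

```python
import math


def normalize_wdl(raw_win, raw_draw, raw_loss):
    """Map three unconstrained head outputs to (Pw, Pd, Pl) summing to 1.

    Assumption: softmax normalization. The original post only asks that
    Pw + Pd + Pl = 1.00 and does not prescribe a method.
    """
    raw = (raw_win, raw_draw, raw_loss)
    shift = max(raw)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in raw]
    total = sum(exps)
    return tuple(e / total for e in exps)


# Illustrative raw outputs (made-up numbers, not from any real network).
pw, pd, pl = normalize_wdl(0.3, 1.2, 0.7)
print(pw, pd, pl, pw + pd + pl)
```

Softmax keeps every probability strictly positive and preserves the ordering of the raw outputs, which is why it is a typical choice for this kind of head.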

The three values would be used only during training; during play (when all that matters is tree search to find the best move), they would be combined into a single value V between 0 and 1 using the formula:

V=(Pw * 1) + (Pd * 0.5) + (Pl * 0)

So from a "choose best move" perspective, a node evaluated as [win=0.2, draw = 0.5, loss = 0.3] would have the same value as [win=0.45, draw = 0.0, loss = 0.55] (both would be valued at 0.45)
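The formula above can be checked in a few lines. This is only a sketch; the function name `expected_score` is made up for illustration.

```python
def expected_score(pw, pd, pl):
    """Collapse win/draw/loss probabilities into the single value V."""
    return pw * 1.0 + pd * 0.5 + pl * 0.0


# The two example nodes from the post.
v_balanced = expected_score(0.2, 0.5, 0.3)
v_sharp = expected_score(0.45, 0.0, 0.55)
print(v_balanced, v_sharp)  # both come out at about 0.45
```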


However, from a training perspective, there would be a huge difference in the accuracy of [win=0.2, draw = 0.5, loss = 0.3] vs [win=0.45, draw = 0.0, loss = 0.55] for each outcome (where the overall MSE is the average of the squared errors of the 3 separate w/d/l predictions):

If Win occurs, result treated as [win=1,draw=0,loss=0]:
MSE of [win=0.2, draw = 0.5, loss = 0.3] = ((1-0.2)^2 + (0-0.5)^2 + (0-0.3)^2) / 3 = 0.33
MSE of [win=0.45, draw = 0.0, loss = 0.55] =  ((1-0.45)^2 + (0-0.0)^2 + (0-0.55)^2) / 3 = 0.20

If Draw occurs, result treated as [win=0,draw=1,loss=0]:
MSE of [win=0.2, draw = 0.5, loss = 0.3] = ((0-0.2)^2 + (1-0.5)^2 + (0-0.3)^2) / 3 = 0.13
MSE of [win=0.45, draw = 0.0, loss = 0.55] = ((0-0.45)^2 + (1-0.0)^2 + (0-0.55)^2) / 3 = 0.5 (horrible prediction!)

If Loss occurs, result treated as [win=0,draw=0,loss=1]:
MSE of [win=0.2, draw = 0.5, loss = 0.3] = ((0-0.2)^2 + (0-0.5)^2 + (1-0.3)^2) / 3 = 0.26
MSE of [win=0.45, draw = 0.0, loss = 0.55] = ((0-0.45)^2 + (0-0.0)^2 + (1-0.55)^2) / 3 = 0.14


With the single-value approach, the two predictions cannot be told apart: both collapse to V = 0.45, so they incur exactly the same error as each other whether the outcome is a win, a draw, or a loss.
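The comparison above can be reproduced with a short script. It is a sketch only: the helper names are hypothetical, and the scalar error is the squared difference between V and the 0 / 0.5 / 1 target described earlier in the post.

```python
# One-hot targets for the three-output scheme and scalar targets for V.
WDL_TARGET = {"win": (1.0, 0.0, 0.0), "draw": (0.0, 1.0, 0.0), "loss": (0.0, 0.0, 1.0)}
SCALAR_TARGET = {"win": 1.0, "draw": 0.5, "loss": 0.0}


def wdl_mse(pred, outcome):
    """Mean of the three squared errors against the one-hot outcome."""
    target = WDL_TARGET[outcome]
    return sum((t - p) ** 2 for p, t in zip(pred, target)) / 3


def scalar_squared_error(pred, outcome):
    """Squared error of the collapsed value V = Pw + 0.5 * Pd."""
    v = pred[0] + 0.5 * pred[1]
    return (SCALAR_TARGET[outcome] - v) ** 2


balanced = (0.2, 0.5, 0.3)
sharp = (0.45, 0.0, 0.55)

for outcome in ("win", "draw", "loss"):
    print(
        outcome,
        "wdl:", round(wdl_mse(balanced, outcome), 2), round(wdl_mse(sharp, outcome), 2),
        "scalar:", round(scalar_squared_error(balanced, outcome), 4),
        round(scalar_squared_error(sharp, outcome), 4),
    )
```

The three-output errors differ sharply between the two predictions (most of all for a draw), while the scalar errors are identical in every row.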

Conceptually, this ties to the fact that high-skill players recognize a difference between a "roughly equal" game and a "drawn game". By being limited to picking a single number from 0 to 1, Leela's neural network cannot be assessed for its ability to make such a distinction; she's only trained to recognize that neither white nor black has any real advantage. I'd argue that, all things being equal, a NN that can predict win/loss but can't tell the 50/50 of the starting position from the 50/50 of K+pawn vs K+pawn simply doesn't have the same depth of understanding of the game as a NN that can do both; and the deeper the overall understanding of the game, the better the NN will be at finding the best moves.

I also think it's interesting to consider how such values could be incorporated into play.  For example, Leela could be fed a parameter that tells her to play aggressively (choosing moves with slightly lower value but higher chance of win vs draw) or to play for a draw (treating win & draw as equal and just minimizing loss-probability).  She could also have a parameter that lets her make (and accept or decline) draw offers intelligently.  She could even provide more interesting assessments of human-human games by returning not just "who is winning" but also evaluating the overall sharpness of the position (e.g. equal likelihood of black or white winning but with low probability of draw).
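A rough sketch of how such a style parameter might work. Everything here is an assumption made for illustration: the knob `draw_value` (how much a draw is worth, 0.5 being neutral), the function names, and the definition of sharpness as the probability of a decisive result.

```python
def styled_value(pw, pd, pl, draw_value=0.5):
    """Value of a node when a draw is worth `draw_value` instead of 0.5.

    draw_value < 0.5: play aggressively (draws are discounted).
    draw_value = 1.0: play for a draw (win and draw count equally,
                      so only the loss probability is minimized).
    """
    return pw * 1.0 + pd * draw_value + pl * 0.0


def sharpness(pw, pd, pl):
    """Hypothetical sharpness measure: probability the game is decisive."""
    return pw + pl


balanced = (0.2, 0.5, 0.3)
sharp = (0.45, 0.0, 0.55)

for label, dv in (("neutral", 0.5), ("aggressive", 0.3), ("play for draw", 1.0)):
    print(label, styled_value(*balanced, draw_value=dv), styled_value(*sharp, draw_value=dv))

print("sharpness:", sharpness(*balanced), sharpness(*sharp))
```

With the neutral setting the two nodes tie, the aggressive setting prefers the node with no draw probability, and the play-for-draw setting prefers the node with the lower loss probability.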

Trevor G

May 11, 2018, 4:55:00 PM
to Kevin Kirkpatrick, LCZero
I think this idea is at least technically feasible. It would be tough to implement if you were doing some kind of temporal difference learning (training positions based on future position evals), but as it stands Leela just trains on the result of the full game. So yeah, the training data has what it needs; the only change would be a slight modification to the value head, and I'm sure a good initialization strategy could be figured out easily enough.

However... I don't know that the training data could or would be able to match actual game results very well. If Leela's going to have other objectives - other than just 1.0W + 0.5D + 0.0L - then I think the very best way to do that is to train her on those other objectives, not try to force her to do something else in-game.

Another idea could be to leave everything almost 100% as is, but just add an extra head asking Leela to predict whether or not a given training game position was drawn - it could even be thrown away during actual game-play. This is a relevant piece of information to train on, and might help Leela better evaluate positions. I take the stance that it's probably better to add inputs and outputs than to take them away - that this is the best way to get Leela to fully understand its training data... But unless this head is actually used as the objective during training, I'm not sure that it's going to really do what we want if used as an in-game objective.
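A minimal sketch of what training with such an extra draw head could look like for a single position. The loss form is an assumption, not something stated in the thread: squared error on the existing value output plus a weighted binary cross-entropy on the draw prediction. The names `combined_loss` and `draw_weight` are hypothetical.

```python
import math

VALUE_TARGET = {"win": 1.0, "draw": 0.5, "loss": 0.0}


def combined_loss(value_pred, draw_pred, outcome, draw_weight=0.5):
    """Value loss (unchanged) plus an auxiliary draw-prediction loss.

    value_pred: existing value-head output in [0, 1].
    draw_pred:  extra head's predicted probability that the game was drawn.
    draw_weight: hypothetical weighting of the auxiliary term.
    """
    value_loss = (VALUE_TARGET[outcome] - value_pred) ** 2

    draw_target = 1.0 if outcome == "draw" else 0.0
    eps = 1e-12  # keep log() away from 0
    p = min(max(draw_pred, eps), 1.0 - eps)
    draw_loss = -(draw_target * math.log(p) + (1.0 - draw_target) * math.log(1.0 - p))

    return value_loss + draw_weight * draw_loss


# Same value prediction, different draw predictions, game ended in a draw.
print(combined_loss(0.5, 0.9, "draw"), combined_loss(0.5, 0.1, "draw"))
```

At play time the draw output can simply be ignored, which matches the suggestion that it could be thrown away during actual game-play.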



PeterJaap de Bruin

May 11, 2018, 5:01:41 PM
to LCZero
Thanks for the must-read.

Quoted text:
Conceptually, this ties to the fact that high-skill players recognize a difference between a "roughly equal" game and a "drawn game". By being limited to picking a single number from 0 to 1, Leela's neural network cannot be assessed for its ability to make such a distinction; she's only trained to recognize that neither white nor black has any real advantage.
end Quote


We all try to play the best chess possible and score the most points possible.
Whether or not to go for a draw, e.g. by deliberately playing a threefold repetition, should be a basic choice while playing, training, or analyzing.
I keep thinking: "is Leela able to score half a point (and be happy)?"

Alexander Lyashuk

May 11, 2018, 5:06:19 PM
to kvnkr...@gmail.com, LCZero
This is discussed here: https://github.com/glinscott/leela-chess/issues/241
That's something that will surely be tried later.

Kevin Kirkpatrick

May 11, 2018, 5:06:32 PM
to LCZero
I totally agree, and yes, two heads (a win-vs-loss and a draw-vs-nodraw) would probably be every bit as effective, but also simpler to incorporate.
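One way to see why two heads carry the same information as three probabilities: given V = Pw + 0.5*Pd and a separate draw probability Pd, the other two follow by algebra. The sketch below assumes the first head is the existing value output V (one possible reading of "win-vs-loss"); the function name is hypothetical, and raising on inconsistent inputs is just one way to handle heads that disagree.

```python
def wdl_from_value_and_draw(v, pd):
    """Recover (Pw, Pd, Pl) from V = Pw + 0.5 * Pd and the draw probability Pd.

    Pw = V - 0.5 * Pd
    Pl = 1 - Pw - Pd = 1 - V - 0.5 * Pd
    """
    pw = v - 0.5 * pd
    pl = 1.0 - v - 0.5 * pd
    tol = 1e-9
    if pw < -tol or pl < -tol:
        # Two independently trained heads are not guaranteed to agree.
        raise ValueError("value and draw outputs are inconsistent")
    return max(pw, 0.0), pd, max(pl, 0.0)


print(wdl_from_value_and_draw(0.45, 0.5))  # close to (0.2, 0.5, 0.3)
print(wdl_from_value_and_draw(0.45, 0.0))  # close to (0.45, 0.0, 0.55)
```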

I also agree (and should've made more clear) that the multi-valued information should absolutely not play any role in competitive game play (e.g. TCEC tournaments). My trailing paragraph was more along the lines of, "apart from improving training, and while playing no role in standard play, here are some other cool features that could make for fun alternate modes of play".

Kevin Kirkpatrick

May 11, 2018, 5:08:08 PM
to LCZero
LMAO.  Okay, I swear I didn't see that first and then plagiarize the idea to sound smart :-)  Really happy to see that it's been on the radar...

PeterJaap de Bruin

May 11, 2018, 5:35:35 PM
to LCZero
Good to know for the newbies ;)