Hi everyone,
Currently, SF's NNUE code pairs each side's king position with each of that side's pieces and pawns to create a sparse set of features that are fed into the NN's first stage.
So, when a king moves, a completely separate set of weights gets triggered for that side. As a result, the NNUE functions as if it were 64 entirely parallel sets of weights.
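To make the "64 parallel sets" point concrete, here is a minimal sketch of king-conditioned feature indexing. The index layout, the piece-kind numbering and the function names are my own simplifications for illustration, not SF's actual code.

```python
NUM_SQUARES = 64
NUM_PIECE_KINDS = 10  # assumption: 5 non-king piece types x 2 colours

def feature_index(king_sq: int, piece_kind: int, piece_sq: int) -> int:
    """Flat index for one (king square, piece kind, piece square) triple."""
    return (king_sq * NUM_PIECE_KINDS + piece_kind) * NUM_SQUARES + piece_sq

def active_features(king_sq, pieces):
    """pieces is an iterable of (piece_kind, piece_sq) pairs (hypothetical encoding)."""
    return {feature_index(king_sq, kind, sq) for kind, sq in pieces}

# Same three pieces, with the king on square 4 (e1) and then on square 5 (f1).
pieces = [(0, 8), (0, 9), (3, 0)]
before = active_features(4, pieces)
after = active_features(5, pieces)

# The two index sets never overlap, so no first-stage weight is shared.
print(before & after)  # set()
```

Each king square owns its own block of NUM_PIECE_KINDS * 64 indices, which is why nothing learned with the king on one square carries over to the next.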
While this has proved effective, I can't help but wonder whether it is relatively inefficient. The same set of weights gets triggered for a king on f3 regardless of whether the position is an opening or an ending, and I would expect the number of games required to produce a good net to be determined largely by how well the training set covers king squares other than c1 / e1 / g1 (etc).
Shouldn't far more positional attributes be used to drive SF's NNUE input features?
For example, SF's HCE has a metric that is used to interpolate between mg (middlegame) and eg (endgame) evals. If that same metric were quantized into (say) ten buckets, and each bucket were then paired with each piece (and the king position too), I think that would yield ten sets of weights (rather than the current 64) that transitioned smoothly from one to the next, and which had significantly better training coverage.
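A rough sketch of the bucketing idea, assuming the phase metric is an integer running from 0 (bare endgame) to 128 (full middlegame). That range, the constants and the function names are assumptions for illustration only.

```python
NUM_SQUARES = 64
NUM_PIECE_KINDS = 12  # assumption: 6 piece types x 2 colours, kings included
NUM_BUCKETS = 10
PHASE_MAX = 128       # assumption: 0 = bare endgame, 128 = full middlegame

def phase_bucket(phase: int) -> int:
    """Quantize the phase metric into one of NUM_BUCKETS equal-width buckets."""
    clamped = max(0, min(phase, PHASE_MAX))
    return clamped * NUM_BUCKETS // (PHASE_MAX + 1)

def feature_index(phase: int, piece_kind: int, piece_sq: int) -> int:
    """The phase bucket takes the place of the king square in the index."""
    return (phase_bucket(phase) * NUM_PIECE_KINDS + piece_kind) * NUM_SQUARES + piece_sq

print([phase_bucket(p) for p in (0, 64, 128)])      # [0, 4, 9]
print(NUM_BUCKETS * NUM_PIECE_KINDS * NUM_SQUARES)  # 7680 input features in total
```

Because neighbouring buckets see very similar positions, their weights should end up close to each other, which is where I would expect the smoother behaviour and better training coverage to come from.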
But given that we are trying to get a positional evaluation out of the NN, I can't help but wonder whether high-level pawn attributes would be an even better starting point, e.g.:
- Number of a side's pawns (capped at 8 to allow tricky FENs)
- Number of pawn islands (i.e. 0 to 4)
- Number of pawns blocked by enemy pawns (e.g. to support fortress detection)
For a first attempt, I'd suggest using the capped number of pawns (0..8) with the number of pawn islands (0..4) [I think there are only 27 valid combinations], combined with each piece and pawn (and king) in turn.
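As a sanity check on the count of 27, here is a small brute-force sketch. It assumes a pawn island is a maximal run of adjacent occupied files and that a file can hold at most six of a side's pawns (ranks 2 to 7). The function names are hypothetical.

```python
def pawn_islands(pawn_files):
    """Count maximal runs of adjacent occupied files (file indices 0..7)."""
    occupied = [False] * 8
    for f in pawn_files:
        occupied[f] = True
    return sum(
        1 for f in range(8)
        if occupied[f] and (f == 0 or not occupied[f - 1])
    )

def valid_combinations(max_pawns=8):
    """Enumerate every reachable (pawn count, island count) pair."""
    combos = set()
    for mask in range(256):  # every subset of the eight files
        files = [f for f in range(8) if (mask >> f) & 1]
        islands = pawn_islands(files)
        # At least one pawn per occupied file, at most six per file,
        # and never more than the cap.
        most = min(max_pawns, 6 * len(files))
        for count in range(len(files), most + 1):
            combos.add((count, islands))
    return combos

combos = sorted(valid_combinations())
combo_index = {pair: i for i, pair in enumerate(combos)}  # dense bucket ids
print(len(combos))  # 27
```

The combo_index mapping gives each valid pair a dense id from 0 to 26, which could then stand in for the king square when building the feature index.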
Is this something people have discussed already?
Cheers, Nick