Shouldn't positional attributes drive SF's NNUE input features (rather than king position)?


Nick Pelling

Jan 10, 2021, 5:01:20 AM
to FishCooking
Hi everyone,

Currently, SF's NNUE code pairs each side's king position with each of that side's pieces and pawns to create a sparse set of features that are fed into the NN's first layer.

So, when a king moves, a completely separate set of weights gets triggered for that side. As a result, the NNUE effectively behaves as 64 entirely parallel sets of weights.
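
To make that concrete, the feature indexing works roughly like this. This is a simplified sketch of a HalfKP-style index, not the actual Stockfish code:

```cpp
// Simplified HalfKP-style feature index (illustrative, not the real SF implementation).
// Each active input is identified by (our king square, piece kind, piece square), so
// every king move switches the whole set of active features for that perspective.
#include <cstdint>

constexpr int NUM_SQUARES     = 64;
constexpr int NUM_PIECE_KINDS = 10;   // pawn..queen for each colour, kings excluded

constexpr std::uint32_t feature_index(int king_sq, int piece_kind, int piece_sq) {
    return (static_cast<std::uint32_t>(king_sq) * NUM_PIECE_KINDS + piece_kind) * NUM_SQUARES
         + piece_sq;
}
// Feature space: 64 * 10 * 64 = 40960 sparse binary inputs per perspective, i.e. one
// independent block of first-layer weights per king square.
```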

While this has proved effective, I can't help but wonder whether it is relatively inefficient. The same set of weights gets triggered for a king on f3 regardless of whether the position is an opening or an ending: and I would expect the number of games required to produce a good net to be determined more by how well the training set covers king squares other than c1 / e1 / g1 (etc.).

Shouldn't far more positional attributes be used to drive SF's NNUE input features?

For example, SF's HCE has a metric that is used to interpolate between mg (middlegame) and eg (endgame) evals. If that same metric were quantized into (say) ten buckets, and then paired with each piece (and the king position) too, I think that would yield ten sets of weights (rather than the current 64) that blended smoothly from one to the next, and which had significantly better training coverage.
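
As a rough sketch of what I mean (the material values and bucket count here are just illustrative, not SF's actual phase formula):

```cpp
// Illustrative "phase bucket" feature driver: quantize a middlegame/endgame metric
// into a handful of buckets that would replace the 64 king squares as the outer
// feature dimension. Material values and bucket count are placeholders.
#include <algorithm>

constexpr int KNIGHT = 3, BISHOP = 3, ROOK = 5, QUEEN = 9;
constexpr int MAX_NON_PAWN = 2 * (2 * KNIGHT + 2 * BISHOP + 2 * ROOK + QUEEN);  // = 62
constexpr int NUM_BUCKETS  = 10;

int phase_bucket(int non_pawn_material) {
    int m = std::clamp(non_pawn_material, 0, MAX_NON_PAWN);
    return m * NUM_BUCKETS / (MAX_NON_PAWN + 1);   // 0 (bare endgame) .. 9 (full board)
}
```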

But given that we are trying to get a positional evaluation out of the NN, I can't help but wonder whether high-level pawn attributes would be an even better starting point, e.g.:
  • Number of a side's pawns (capped at 8 to allow tricky FENs)
  • Number of pawn islands (i.e. 0 to 4)
  • Number of pawns blocked by enemy pawns (i.e. to support fortress detection)
For a first attempt, I'd suggest using the capped number of pawns (0..8) with the number of pawn islands (0..4) [I think there are only 27 valid combinations], combined with each piece and pawn (and king) in turn.
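
Here is a quick sketch of how those two drivers could be computed from a side's pawn bitboard (the helpers are my own illustration, nothing that exists in SF today):

```cpp
// Computing the two proposed feature drivers from one side's pawn bitboard.
// Purely illustrative helpers, not existing Stockfish code. Requires C++20 for <bit>.
#include <algorithm>
#include <bit>
#include <cstdint>

// Number of this side's pawns, capped at 8 so hand-made FENs with extra pawns stay in range.
int capped_pawn_count(std::uint64_t pawns) {
    return std::min(std::popcount(pawns), 8);
}

// A pawn island is a maximal run of adjacent occupied files, so a side has 0 to 4 islands.
int pawn_islands(std::uint64_t pawns) {
    int islands = 0;
    bool prev = false;
    for (int file = 0; file < 8; ++file) {
        bool occupied = (pawns & (0x0101010101010101ULL << file)) != 0;
        if (occupied && !prev)
            ++islands;
        prev = occupied;
    }
    return islands;
}

// Fold both drivers into one bucket index: 9 * 5 = 45 raw slots, of which only the
// ~27 reachable (count, islands) combinations would ever be active.
int pawn_bucket(std::uint64_t pawns) {
    return capped_pawn_count(pawns) * 5 + pawn_islands(pawns);
}
```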

Is this something people have discussed already?

Cheers, Nick

Joost VandeVondele

Jan 10, 2021, 6:49:12 AM
to FishCooking
There has been some discussion (on Discord) and first attempts to change the features used. It is a pretty non-trivial exercise, as it requires training a new net from scratch and quite some implementation work. Currently, just reproducing the master net to within 10-20 Elo is non-trivial, so new features would need to bring, say, 20 Elo to beat master. To achieve that, they will need to be pretty generic, I think. Having said that, changing the net architecture, including its inputs, is the most promising way to improve eval right now, so these ideas are certainly welcome.

Dieter Dobbelaere

Jan 10, 2021, 6:57:52 AM
to FishCooking
Interesting thoughts! However, I fear that some of your suggestions might already be discovered by the training process (if it decides they are useful):

- Number of a side's pawns: this fits nicely into the linearity (before activation) of the first affine transformation, so training is perfectly able to dedicate a hidden feature to this quantity (see the sketch after this list).
- Number of pawn islands: something similar can be imagined (training can boost connected-pawn weights via "constructive interference").
- Number of pawns blocked by enemy pawns: same...
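
For the first point, here is a toy example of what I mean by "fits into the linearity" (an illustration, not SF code): if training gives the same first-layer weight to every pawn-square input, that hidden unit's pre-activation is exactly the pawn count.

```cpp
// Toy illustration: because the first layer is affine, one hidden unit can recover
// "number of our pawns" by assigning the same weight to every pawn-square feature.
#include <cstdint>

int hidden_unit_preactivation(std::uint64_t our_pawns) {
    int sum = 0;
    for (int sq = 0; sq < 64; ++sq) {
        int input  = static_cast<int>((our_pawns >> sq) & 1);  // sparse binary input: pawn on sq?
        int weight = 1;                                         // same learned weight everywhere
        sum += weight * input;
    }
    return sum;  // equals the pawn count, before the activation clips it
}
```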

Let's not underestimate the power of neural network training (and the things/patterns it can figure out on its own). Also, the addition of input features would require more dedicated CPU cycles...

On Sunday 10 January 2021 at 12:49:12 UTC+1, Joost VandeVondele wrote:

Nick Pelling

Jan 10, 2021, 8:23:37 AM
to FishCooking
Hi Joost,

Glad to hear that people are discussing this. Given that the basic net topology is largely a given for NNUE, it would seem sensible to test whether other (more chess-y than Shogi-y) ways of specifying the input features can yield stronger nets. I'll head over to the Discord and see what's going on there. :-)

Cheers, Nick

Nick Pelling

Jan 10, 2021, 8:33:15 AM
to FishCooking
Hi Dieter,

All these things (and many more) are no doubt discovered by the training process already: however, because the input features are currently created by combining a king square with a piece/pawn square, my strong suspicion is that this particular choice of input features (which effectively trains a separate net for each king position) is not making the best use of the NNUE. A better choice of input features would train faster, play better, and have fewer blind spots, so why not try it?

Cheers, Nick

Jörg Oster

Jan 10, 2021, 9:20:18 AM
to FishCooking
But isn't it exactly this choice of input features (king and piece placement) that makes the Neural Net Efficiently Updatable?

Dieter Dobbelaere

Jan 10, 2021, 9:24:59 AM
to FishCooking
Hi Nick,

I agree that the view that "a separate net is trained for each king position" is a somewhat close representation of reality (although you have to take into account that all those little nets are intertwined by construction, i.e. they share the same final layers).

On the other hand, I have a feeling that a sparse representation like this (with a highly overdimensioned input feature space) is what "makes it work", as we only have a limited number of layers at our disposal (three fully-connected layers, to be exact) and we really need the high-dimensional first layer (~10M weights) to distinguish lots of different features/patterns.
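
For reference, the rough shape of the current HalfKP net as I understand it (sizes from memory, so treat them as approximate):

```cpp
// Approximate layer sizes of the HalfKP net under discussion (from memory, not checked
// against the source; treat as approximate rather than authoritative).
constexpr int INPUTS_PER_PERSPECTIVE = 41024;  // 64 king squares x 641 piece-square slots
constexpr int ACCUMULATOR            = 256;    // per perspective, incrementally updated
constexpr int HIDDEN_1               = 32;     // first fully-connected layer after the accumulator
constexpr int HIDDEN_2               = 32;     // second fully-connected layer
constexpr int OUTPUT                 = 1;      // single evaluation output

// First-layer weight count: 41024 * 256 ~ 10.5M, which is where the "~10M weights" comes from.
constexpr long long FIRST_LAYER_WEIGHTS =
    static_cast<long long>(INPUTS_PER_PERSPECTIVE) * ACCUMULATOR;
```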

Compare this with LeelaChessZero, where a non-sparse representation is used. If we ignore the history planes for a moment, LCZero's input features are piece bitboards...
However, LCZero nets are very deep (~60 layers: 30 resnet blocks plus output heads) and have convolutional filters (to extract piece movements) that make up for that...

The proof of the pudding is in the eating though, and every proposition is definitely worth investigating...

Best regards
Dieter

On Sunday 10 January 2021 at 14:33:15 UTC+1, nickpe...@gmail.com wrote:

Dieter Dobbelaere

Jan 10, 2021, 9:32:58 AM
to FishCooking
Hi Jörg,

Not necessarily, I guess.

Let's say the input features are piece bitboards along with a side-to-move binary input (I'm not saying this is a great idea though, see above...).
Then a make/unmake move will only change a few input features, making it still "efficiently updatable".
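
A minimal sketch of what I mean by the update (sizes and layout are illustrative):

```cpp
// Minimal sketch of the incremental accumulator update: when a move only flips a few
// sparse binary inputs, the first-layer sums are patched by subtracting/adding just
// those features' weight columns. Sizes and memory layout are illustrative.
#include <vector>

constexpr int ACC_SIZE = 256;  // width of the first-layer accumulator

void update_accumulator(std::vector<int>&       acc,      // [ACC_SIZE]
                        const std::vector<int>& weights,  // [num_features][ACC_SIZE], flattened
                        const std::vector<int>& removed,  // features switched off by the move
                        const std::vector<int>& added) {  // features switched on by the move
    for (int f : removed)
        for (int i = 0; i < ACC_SIZE; ++i)
            acc[i] -= weights[f * ACC_SIZE + i];
    for (int f : added)
        for (int i = 0; i < ACC_SIZE; ++i)
            acc[i] += weights[f * ACC_SIZE + i];
}
// With plain piece-bitboard inputs, a quiet move removes one feature and adds one,
// so the work is proportional to the handful of changed features, not the whole board.
```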

Best regards
Dieter

On Sunday 10 January 2021 at 15:20:18 UTC+1, Jörg Oster wrote:

Nick Pelling

Jan 11, 2021, 4:05:04 PM
to FishCooking
A feature-driver metric based on the number of pawns and/or the number of pawn islands would only change when a pawn is captured (always), makes a capture (sometimes), or promotes. That's just about as easy to update as king moves, as far as I can see.

Nickolas

Jan 17, 2021, 3:15:01 PM
to FishCooking
It's really hard to guess a priori what input features "should" benefit the NNUE evaluation. One of its great strengths is that the input layer is heavily overparameterized, which is not only what makes it efficiently updatable, but also what ensures that many of the calculated features used by the classical evaluation -- things perceived as important by humans -- are somehow encoded in the network during training, in a roundabout fashion. It's certainly not obvious that spending cycles pre-calculating fancier inputs would yield better results than, say, deepening or widening the hidden layers (or otherwise changing their architecture). No doubt some combination of all of the above is best, but it will require lots of human effort and electricity to inch our way closer to the ideal.