The current encoding (see below for reference) of the input and output planes is more or less replicated from AlphaZero, which was a "quick" shot at covering go, chess, and shogi with one architecture. This obviously works quite well, but there is always room for improvement ...
Questions/potential drawbacks:
- The policy head is quite broad with 4672 outputs, compared to an average of only 26-27 legal moves per position
- The history planes are heavily correlated, as only one (or two) pieces change their position per ply
- Aggregate planes (e.g. squares dominated/attacked) may help to reduce the network size
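To put the "broad policy head" point in numbers: the 4672 outputs come from an 8x8 grid of from-squares times 73 move-type planes (56 "queen" moves, 8 knight moves, 9 underpromotions). A quick sketch of that layout (the exact ordering of the 73 planes below is my own assumption for illustration, not any engine's actual layout):

```python
# AlphaZero-style policy layout: 8x8 from-squares x 73 move-type planes.
N_QUEEN_PLANES = 8 * 7        # 8 directions x up to 7 squares = 56
N_KNIGHT_PLANES = 8           # 8 knight jumps
N_UNDERPROMO_PLANES = 3 * 3   # N/B/R x {capture-left, push, capture-right}
PLANES = N_QUEEN_PLANES + N_KNIGHT_PLANES + N_UNDERPROMO_PLANES  # 73

def policy_index(from_file: int, from_rank: int, plane: int) -> int:
    """Flatten (from-square, move-type plane) into one of the 4672 outputs.

    The square-major ordering here is an illustrative assumption.
    """
    square = from_rank * 8 + from_file
    return square * PLANES + plane

print(PLANES)              # 73
print(8 * 8 * PLANES)      # 4672
print(policy_index(4, 0, 0))  # e1, first queen-move plane -> 292
```

With roughly 26-27 legal moves per position on average, well under 1% of those 4672 outputs are meaningful in any given position, which is the sparsity concern raised above.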
Initial thoughts on optimization (they might not all be compatible with each other):
- Encode pieces not by their position, but by their legal moves (including the option not to move)
- The encoding would need the values 0, 1, 2 in this case (rook, knight, pawn)
- Castling planes can be removed
- Use a delta encoding for the history planes (0 = no change, +1 = piece moved to the square, -1 = piece moved away from the square)
- Reduce the number of history planes
- Replace the value head by a (fast) evaluation of the most probable move among all legal moves
  - During training one would need to replicate a position for all legal moves; the chosen move would be labelled 1, all others 0
- ...
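The delta-encoding idea above is easy to sketch: instead of storing full occupancy planes per history step, store only the difference between consecutive steps. A minimal illustration with one 8x8 occupancy plane (per-piece-type planes would work the same way; the function name is mine):

```python
import numpy as np

# Delta encoding for a history plane:
#  0 = square unchanged, +1 = piece arrived on the square,
# -1 = piece left the square.
def delta_plane(before: np.ndarray, after: np.ndarray) -> np.ndarray:
    """Both inputs are 8x8 0/1 occupancy planes for one piece type."""
    return after.astype(np.int8) - before.astype(np.int8)

before = np.zeros((8, 8), dtype=np.int8)
after = np.zeros((8, 8), dtype=np.int8)
before[1, 4] = 1   # pawn on e2 (rank index 1, file index 4)
after[3, 4] = 1    # pawn moved to e4
d = delta_plane(before, after)
print(d[1, 4], d[3, 4])   # -1 1
```

Since at most one or two pieces move per ply, almost every entry of such a delta plane is zero, which is exactly the correlation the proposal tries to exploit.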
I hope you have crazy good ideas! This is just meant to stimulate discussion.
(With legal-move encoding, a queen could be encoded as bishop plus rook, and a king as a queen with "limited mobility".)
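The queen-as-bishop-plus-rook observation can be checked directly: on an empty board, the queen's target squares are exactly the union of the rook and bishop rays, and the king is the same union capped at distance 1. A pure-Python sketch (function and constant names are my own, not a spec):

```python
# Ray-based mobility on an empty 8x8 board; (file, rank) coordinates.
ROOK_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
BISHOP_DIRS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def ray_targets(file, rank, dirs, max_dist=7):
    """All squares reachable along the given directions, ignoring blockers."""
    out = set()
    for df, dr in dirs:
        for d in range(1, max_dist + 1):
            f, r = file + df * d, rank + dr * d
            if 0 <= f < 8 and 0 <= r < 8:
                out.add((f, r))
    return out

rook = ray_targets(3, 3, ROOK_DIRS)                              # from d4
bishop = ray_targets(3, 3, BISHOP_DIRS)
queen = ray_targets(3, 3, ROOK_DIRS + BISHOP_DIRS)
king = ray_targets(3, 3, ROOK_DIRS + BISHOP_DIRS, max_dist=1)    # "limited queen"
print(queen == rook | bishop)   # True
print(len(king))                # 8
```

So a mobility-based input encoding would only need sliding, knight, and pawn primitives; queen and king planes fall out as combinations.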
----------------------------
AlphaZero encoding:
Input:
- The position is encoded with 8x8 planes (current position plus 7 history positions, 14 planes each: 14*8 = 112 planes)
- 6 planes: player 1 pieces: pawn, rook, knight, bishop, queen, king
- 6 planes: player 2 pieces: pawn, rook, knight, bishop, queen, king
- 2 planes: repetition
- Additional planes are:
- colour (1)
- move count (1)
- player 1 castling (2)
- player 2 castling (2)
- no progress count (1)
Output:
- Value: game result (-1, 0, 1)
- Policy: vector over the 4672 possible moves (8x8 from-squares x 73 move types)
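For orientation, the input stack described above totals 112 + 7 = 119 planes of 8x8. A shapes-only sketch of assembling it, with the piece/repetition planes stubbed as zeros and the scalar features broadcast over the board (the function name and scalar ordering are my own assumptions):

```python
import numpy as np

HISTORY = 8          # current position + 7 history steps
PER_STEP = 12 + 2    # 6 piece planes per side + 2 repetition planes

def input_stack(history_planes: np.ndarray, scalars: dict) -> np.ndarray:
    """Stack history planes with broadcast scalar planes.

    history_planes: (HISTORY, PER_STEP, 8, 8) binary planes.
    scalars: colour, move count, 2+2 castling rights, no-progress count.
    """
    scalar_order = ["colour", "move_count",
                    "p1_castle_k", "p1_castle_q",
                    "p2_castle_k", "p2_castle_q",
                    "no_progress"]
    scalar_planes = np.stack(
        [np.full((8, 8), scalars[k], dtype=np.float32) for k in scalar_order])
    return np.concatenate(
        [history_planes.reshape(-1, 8, 8).astype(np.float32), scalar_planes])

hist = np.zeros((HISTORY, PER_STEP, 8, 8))
scalars = dict(colour=1, move_count=0.0, p1_castle_k=1, p1_castle_q=1,
               p2_castle_k=1, p2_castle_q=1, no_progress=0.0)
print(input_stack(hist, scalars).shape)   # (119, 8, 8)
```

This is the baseline that the delta-encoding and mobility-encoding proposals above would shrink or restructure.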