Learning first with fewer pieces?


ML proof

Aug 13, 2022, 2:39:04 PM
to LCZero
Let's say we have a pool of strong, fast engines that were all trained from scratch. By making these engines play many games against each other, we get positions that are likely to arise in play, not just random positions. The purpose of having different engines is to have different playing styles, which might allow for better learning about likely positions; however, a single engine could be enough.
Could we make the computer learn which positions are likely, and then transfer this knowledge to better train a new engine?
Below are two ways to apply this idea.
1. LCZero would also learn through self-play from likely positions with fewer pieces, instead of self-playing only from the standard starting position. Over time, the proportion of games started from the standard starting position would be gradually increased.
2. LCZero or another program would learn how to play completely backwards. The program would first learn through self-play in likely positions with N or fewer pieces. At the end of learning phase N, it would start self-playing from likely positions with N+1 pieces to learn how to play those positions (without changing the way it plays positions with N or fewer pieces). It would start with N=2, then N=3, 4, 5, ... and so work backwards to the standard starting position with N=32 pieces.
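To make the two options concrete, here is a minimal Python sketch of the scheduling part only. Everything in it is an assumption for illustration: the linear ramp, the initial 20% share, and the function names are hypothetical and are not taken from Lc0 or any existing training code.

```python
import random

# Standard chess starting position in FEN.
STANDARD_START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


def standard_start_fraction(step, total_steps, initial=0.2):
    """Option 1: share of self-play games begun from the standard position.

    Assumed schedule: a linear ramp from `initial` up to 1.0.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return initial + (1.0 - initial) * progress


def pick_start_option1(step, total_steps, likely_positions, rng):
    """Pick the standard start with the scheduled probability,
    otherwise pick one of the likely reduced-piece positions."""
    if rng.random() < standard_start_fraction(step, total_steps):
        return STANDARD_START
    return rng.choice(likely_positions)


def phase_piece_counts(first=2, last=32):
    """Option 2: one learning phase per piece count, N = 2, 3, ..., 32."""
    return list(range(first, last + 1))
```

For example, pick_start_option1(step, total_steps, positions, random.Random(0)) would be called once per self-play game, while phase_piece_counts() gives the order of the phases in option 2. How the earlier phases are kept unchanged in option 2 is left open here.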

Has any similar idea been explored? What were the results?

esch...@gmail.com

Aug 15, 2022, 10:45:57 AM
to LCZero
As far as I know, the only similar attempt is dkappe's endgame nets. These are nets that were trained only on positions with 12 or fewer pieces, and they played reasonably well in opening positions too. I don't think they were as strong as the Lc0 team's nets, but that doesn't prove anything one way or the other, since the endgame nets were not trained to the same extent.
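For illustration, restricting training data to positions with 12 or fewer pieces could look something like the Python sketch below. This is not dkappe's actual pipeline, which I haven't seen; the function names are made up, and it only assumes that positions are stored as FEN strings.

```python
def piece_count(fen):
    """Count all pieces (kings included) in the board field of a FEN string."""
    board = fen.split()[0]
    # In the board field, letters are pieces; digits and '/' are not.
    return sum(ch.isalpha() for ch in board)


def endgame_only(fens, max_pieces=12):
    """Keep only positions with at most `max_pieces` pieces on the board."""
    return [fen for fen in fens if piece_count(fen) <= max_pieces]
```

The same piece_count helper could also be used to group likely positions by N for the phase-by-phase scheme in the original post.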

I think it's an idea worth investigating, and I hope to do something similar once I get some more capable hardware.  IMO, it's hard for anyone besides the Lc0 team to be able to toss enough resources at it to tell if it is a better or worse approach, though.
