Understanding CPuct settings in Lc0 v0.19.1

Jupiter

Dec 20, 2018, 3:58:51 PM
to LCZero

I plotted the default and some alternative cpuct settings. Each setting consists of three values: the initial cpuct value (CPuct), the base (CPuctBase) and the factor (CPuctFactor). The Y-axis shows the final_cpuct value that is applied in MCTS, after it has been calculated from these 3 settings.


option name CPuct type string default 3.000000

option name CPuctBase type string default 19652.000000

option name CPuctFactor type string default 2.000000


As the node count increases, especially at long TC, the final_cpuct value also increases.
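To make the relationship between the three settings concrete, here is a minimal sketch. It assumes the AlphaZero-style log growth, final_cpuct = CPuct + CPuctFactor * ln((nodes + CPuctBase) / CPuctBase); the formula is not stated in this post, so treat it as an assumption and check it against the Lc0 source for your version. The function name final_cpuct and the dictionary labels are only illustrative, and the two non-default settings are the ones tested later in this thread.

```python
import math

# (CPuct, CPuctBase, CPuctFactor) for the settings discussed in this thread.
SETTINGS = {
    "default": (3.0, 19652.0, 2.0),
    "orange": (3.0, 49652.0, 2.0),
    "yellow": (3.0, 79652.0, 1.5),
}


def final_cpuct(nodes, cpuct, base, factor):
    """Assumed formula: cpuct + factor * ln((nodes + base) / base)."""
    return cpuct + factor * math.log((nodes + base) / base)


# Print the value of each setting at a few node counts.
for nodes in (0, 800, 10_000, 100_000, 1_000_000):
    row = ", ".join(
        f"{name}={final_cpuct(nodes, *params):.3f}"
        for name, params in SETTINGS.items()
    )
    print(f"{nodes:>9} nodes: {row}")
```

Under this assumption all three settings start at 3.0 with zero nodes, and a larger base or a smaller factor makes the curve rise more slowly.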




I ran 2 matches using net id 32106, with cutechess-cli: no TB, no win adjudication, but with draw adjudication.


1. TC 15s + 0.1s

Default in blue vs new setting in orange


Orange:

CPuct = 3.0

CPuctBase = 49652

CPuctFactor = 2.0


Result

   # PLAYER                                   :  RATING  ERROR  POINTS  PLAYED   (%)
   1 Lc0 v0.19.1 32106 cpuct_3.0_49652_2.0    :    22.4   34.4    18.0      32    56
   2 Lc0 v0.19.1 32106 cpuct_def              :   -22.4   34.4    14.0      32    44


TC 15s games



2. TC 300s + 2s (5m + 2s)

Default in blue vs new setting in orange


This match took a lot of time because Lc0 was trolling. To save time, win adjudication should be enabled.


Result

   # PLAYER                                   :  RATING  ERROR  POINTS  PLAYED   (%)
   1 Lc0 v0.19.1 32106 cpuct_3.0_49652_2.0    :    13.6   26.4    14.0      26    54
   2 Lc0 v0.19.1 32106 cpuct_def              :   -13.6   26.4    12.0      26    46


TC 300s games



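From these small samples at both short and longer TC, the new setting looks slightly better, although in both tables the error margin (34.4 and 26.4) is larger than the rating itself. The difference from the default is that the new setting has a lower final_cpuct value within the node boundaries (green and violet) at the different TCs.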



Match games conditions:

Start pgn: 2moves_LT_1000.pgn

Each opening is played twice, with sides reversed.
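For anyone who wants to repeat a match like this, here is a rough sketch that builds a cutechess-cli command line for the 15s + 0.1s test. Everything not listed above is an assumption: the engine command lc0, the short engine names, the draw adjudication thresholds, the output file name games.pgn and the helper names are all made up for illustration, and the network file still has to be configured separately. It only prints the command; it does not run it.

```python
import shlex


def engine_args(name, cpuct, base, factor, cmd="lc0"):
    """cutechess-cli arguments for one engine; cmd is a hypothetical path."""
    return [
        "-engine", f"cmd={cmd}", f"name={name}",
        f"option.CPuct={cpuct}",
        f"option.CPuctBase={base}",
        f"option.CPuctFactor={factor}",
    ]


def build_match_args(tc="15+0.1", rounds=16):
    """Argument list for a match of 2 * rounds games with reversed sides."""
    args = ["cutechess-cli"]
    args += engine_args("cpuct_3.0_49652_2.0", 3.0, 49652, 2.0)
    args += engine_args("cpuct_def", 3.0, 19652, 2.0)
    args += ["-each", "proto=uci", f"tc={tc}"]
    args += ["-openings", "file=2moves_LT_1000.pgn", "format=pgn"]
    # Two games per opening, second game with colours swapped.
    args += ["-games", "2", "-rounds", str(rounds), "-repeat"]
    # Draw adjudication thresholds are assumed, not the ones used in the post.
    args += ["-draw", "movenumber=40", "movecount=8", "score=10"]
    # No -resign option, which matches "no win adjudication".
    args += ["-pgnout", "games.pgn"]
    return args


print(shlex.join(build_match_args()))
```

For the longer test, build_match_args(tc="300+2", rounds=13) would give the 26 games reported above, again only as a sketch.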


It would be interesting to test the yellow line in the plot, at [3.0, 79652, 1.5].



This is the frequency distribution of node counts at the two TCs used. It shows which node counts dominated the search on the GPU used.



John D

Dec 20, 2018, 4:59:11 PM
to LCZero
So for LTC/high nodecount games, should cpuct be reduced to A0's 2.5?

LuckyDay

Dec 20, 2018, 9:39:37 PM
to LCZero
If anything, I think LTC/higher node count games seem to do better with a higher cpuct, e.g. 3.0-3.4.
In all my testing of 800 nodecount games A0's 2.5 cpuct base setting has always seemed to perform best with those nodes. Imo, ideally the initial cpuct should be kept constant at 2.5 and then the scaling parameters should be optimised to fit the best cpuct curve with increasing nodes. This might require more than just experimenting with different cpuct base or factor settings: a different log (or other curve) instead of just ln might be necessary as well.
Clopping the settings for each individual time control will definitely shed light on what the optimal curve should look like, and then hopefully we can derive a mathematical equation that best fits that curve.

Jupiter

Dec 20, 2018, 10:49:00 PM
to LCZero
LuckyDay wrote: "in all my testing of 800 nodecount games A0's 2.5 cpuct base setting has always seemed to perform best with those nodes."

A couple of days ago I also found that cpuct values of 1.5, 2 and 2.5 are good for fixed-node searches, even up to 3200 nodes. But when a TC is used, this is no longer the case: 3.2 is better. That was with v0.18.1 and net id 11258, though.




Clop requires a lot of games to converge, and after the clop session you also need to test the resulting parameters in actual games. Still, this kind of optimization is more worthwhile when the default has not been properly tested.

I will continue testing with initial cpuct = 3, further reducing the factor and increasing the base so that the final_cpuct will be closer to 4. But I will start with the yellow line in the plot in my first post: cpuct = [3.0, 79652, 1.5].

Jupiter

Dec 21, 2018, 9:28:22 AM
to LCZero
I ran more tests at TC 15s+0.1s, extending the previous test (orange) from 32 games to 100 games, and created a new test (yellow) with new cpuct settings.

Orange is good at 54% vs the default setting after 100 games.

    Figure 3


Match results

Summary table:

   # PLAYER                                   :  RATING  ERROR  POINTS  PLAYED   (%)
   1 Lc0 v0.19.1 32106 cpuct_3.0_49652_2.0    :    21.2   21.8    54.0     100    54
   2 Lc0 v0.19.1 32106 cpuct_def              :    -7.1   14.6    97.0     200    49
   3 Lc0 v0.19.1 32106 cpuct_3.0_79652_1.5    :   -14.1   21.3    49.0     100    49

Head to head statistics:

1) Lc0 v0.19.1 32106 cpuct_3.0_49652_2.0  21.2 :    100 (+17,=74,-9),  54.0 %

   vs.                                          :  games (  +,  =, -),   (%) :    Diff,    SD, CFS (%)
   Lc0 v0.19.1 32106 cpuct_def                  :    100 ( 17, 74, 9),  54.0 :   +28.3,  15.5,   96.6

2) Lc0 v0.19.1 32106 cpuct_def            -7.1 :    200 (+17,=160,-23),  48.5 %

   vs.                                          :  games (  +,   =,  -),   (%) :    Diff,    SD, CFS (%)
   Lc0 v0.19.1 32106 cpuct_3.0_49652_2.0        :    100 (  9,  74, 17),  46.0 :   -28.3,  15.5,    3.4
   Lc0 v0.19.1 32106 cpuct_3.0_79652_1.5        :    100 (  8,  86,  6),  51.0 :    +7.1,  14.9,   68.2

3) Lc0 v0.19.1 32106 cpuct_3.0_79652_1.5 -14.1 :    100 (+6,=86,-8),  49.0 %

   vs.                                          :  games ( +,  =, -),   (%) :    Diff,    SD, CFS (%)
   Lc0 v0.19.1 32106 cpuct_def                  :    100 ( 6, 86, 8),  49.0 :    -7.1,  14.9,   31.8
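
As a rough cross-check of the head to head numbers, the sketch below converts a win/draw/loss record into a score, an Elo difference and an approximate standard error. It uses the standard logistic Elo formula and a simple per-game variance estimate, which is an assumption on my part: the rating tool that produced the tables fits all games together, so its figures will not match these exactly. The function names are only illustrative.

```python
import math


def score_from_wdl(wins, draws, losses):
    """Fraction of points scored, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)


def elo_diff(score):
    """Elo difference implied by a score under the logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_std_error(wins, draws, losses):
    """Approximate standard error of the Elo difference (delta method)."""
    n = wins + draws + losses
    s = score_from_wdl(wins, draws, losses)
    # Variance of a single game result around the mean score.
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se_score = math.sqrt(var / n)
    # Slope of the Elo curve at the observed score.
    slope = 400.0 / (math.log(10) * s * (1 - s))
    return slope * se_score


# Orange vs default at TC 15s+0.1s: +17 =74 -9
w, d, l = 17, 74, 9
print(f"score {score_from_wdl(w, d, l):.3f}, "
      f"Elo diff {elo_diff(score_from_wdl(w, d, l)):+.1f}, "
      f"std error about {elo_std_error(w, d, l):.1f}")
```

With so many draws, the standard error stays large relative to the difference even at 100 games, which is why more games are needed before drawing firm conclusions.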


Engine naming format:

Lc0 v0.19.1 32106 cpuct_3.0_49652_2.0

32106 = net id
3.0 = CPuct
49652 = CPuctBase
2.0 = CPuctFactor

Get the games from link below.

