Nice info, thank you. In DeepMind's algorithm it is like hand-tuning: modify the param value, play matches vs the default, feed the match result into the optimizer, and the optimizer suggests the next param value to try. Creating matches (takes time ***) and calculating results is easy. Now it is time to look at the input and output of that optimizer. There are examples on the site you quoted. It is interesting indeed to look deeper into it.

*** DeepMind uses 50 games(?) and sends the result to the optimizer to get the next suggested param value to try.
Got it now, based on the sample Jupyter notebook on that site. Basically just revise black_box_function() to return the match result. Relevant changes below.

def black_box_function(CPuct, CPuctBase, CPuctFactor):
    """Function with unknown internals we wish to maximize.

    This is just serving as an example; for all intents and
    purposes think of the internals of this function, i.e. the process
    which generates its output values, as unknown.
    """
    # Comment out the function from the sample
    # return -x ** 2 - (y - 1) ** 2 + 1

    # Instead return the perf, (wins + draws/2) / (wins + draws + losses),
    # from the result of playing the test engine vs the default engine
    num_games = 50

    # Run game matches between the test engine with the new params
    # and the default engine with its default params
    perf = match_engine(test_engine, CPuct, CPuctBase, CPuctFactor,
                        default_engine, max_games=num_games)

    # The Bayesian optimizer will maximize the perf
    return perf

Create a new function that will run the engine match:

def match_engine(test_engine, CPuct, CPuctBase, CPuctFactor,
                 default_engine, max_games=50):
    """Return the perf from the point of view of the test engine.

    Cutechess can be called to create the matches and parse the results.
    """
    # Score of test_engine vs default_engine: 12 - 1 - 19 [0.672] 32
    # Example: just return the score rate of the test engine
    perf = 0.672
    return perf

Then also revise the bounds:

# Bounded region of parameter space
pbounds = {'CPuct': (2, 5), 'CPuctBase': (1000, 100000), 'CPuctFactor': (1, 3)}
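For completeness, here is a minimal sketch of how that revised function would be wired into the optimizer, following the bayes_opt sample notebook (the maximize arguments shown are just the ones used in the run reported later in the thread):

from bayes_opt import BayesianOptimization

# Wire the revised black_box_function() and pbounds into the optimizer.
optimizer = BayesianOptimization(
    f=black_box_function,   # returns the match perf for the suggested params
    pbounds=pbounds,        # the bounded region of parameter space above
    random_state=1,
)

# One random init point, then the guided iterations using
# the Expected Improvement acquisition function.
optimizer.maximize(init_points=1, n_iter=20, acq="ei", xi=0.05)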
On Saturday, December 22, 2018 at 3:57:42 PM UTC+8, Jupiter wrote:
On Saturday, December 22, 2018 at 4:58:25 AM UTC+8, Lyudmil Antonov wrote:
Finally tested this optimization on Google Cloud using a Tesla V100 GPU.

I used:

optimizer.maximize(init_points=1, n_iter=opti_iter, acq="ei", xi=0.05)

acq="ei" is the acquisition function "Expected Improvement".

Max iteration: 20
Param bounds: {'CPuctFactor': (0.5, 3.0), 'CPuct': (2.5, 4.5)}
Best of 32 games per iteration, net id 32182. (DeepMind used 50 games.)
Games are run at TC 3s+0.05s.

After around 1 hr of tuning, I get this.

| iter | target | CPuct | CPuctF... |
-------------------------------------------------
...
| 14 | 0.438 | 4.5 | 1.326 |
try parameters = CPuct 2.5 CPuctFactor 2.1631013357249707
processing ...
score: 0.422
| 15 | 0.422 | 2.5 | 2.163 |
try parameters = CPuct 3.0930171934216246 CPuctFactor 3.0
processing ...
score: 0.594
| 16 | 0.594 | 3.093 | 3.0 |
try parameters = CPuct 3.106287110348023 CPuctFactor 1.7459954705321463
processing ...
score: 0.5
| 17 | 0.5 | 3.106 | 1.746 |
try parameters = CPuct 4.498966652249406 CPuctFactor 2.276217104711355
processing ...
score: 0.453
| 18 | 0.453 | 4.499 | 2.276 |
try parameters = CPuct 4.0449655865241585 CPuctFactor 1.7403143498944391
processing ...
score: 0.5
| 19 | 0.5 | 4.045 | 1.74 |
try parameters = CPuct 3.007779360262849 CPuctFactor 2.717663047532658
processing ...
score: 0.391
| 20 | 0.391 | 3.008 | 2.718 |
try parameters = CPuct 3.5571016382646956 CPuctFactor 1.8782503551645444
processing ...
score: 0.516
| 21 | 0.516 | 3.557 | 1.878 |
=================================================
Done!!
best param: {'params': {'CPuctFactor': 3.0, 'CPuct': 3.0930171934216246}, 'target': 0.594}

The target 0.594 is the best score rate found after 21 iterations. The count goes to 21 because the first iteration was a random init point, done so that the tuner has an idea of which param values to try next. The param bounds {'CPuctFactor': (0.5, 3.0), 'CPuct': (2.5, 4.5)} ensure that the optimizer only tries values within those ranges.

Then I ran a real 50-game match at TC 15s+0.1s between the default and the tuned values.

Default: CPuct = 3.0, CPuctFactor = 2.0
Tuned: CPuct = 3.09, CPuctFactor = 3.0
Net id: 32182

Result:
Score of Lc0 v0.19.1 32182 cpuct_def vs Lc0 v0.19.1 32182 cpuct_3.09_19652_3.0: 5 - 7 - 38 [0.480] 50
Elo difference: -13.90 +/- 47.32
Finished match

The tuned values lead by 2 points.

It is important to note that the tuning was done at TC 3s+0.05s while the match above was played at TC 15s+0.1s. Perhaps if the tuning were done closer to TC 15s+0.1s, with a higher number of iterations and more tuning games per iteration, it might find much better param values.

I used the script to run engine matches from this repo, https://github.com/snicolet/spsa, which is also based on cutechess-cli.
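For reference, the per-iteration history and the "best param" line at the end are simply what the bayes_opt optimizer exposes once maximize() has finished; a small sketch:

# After optimizer.maximize(...) has finished:
for i, res in enumerate(optimizer.res):
    # each entry holds the score rate and the params tried at that iteration
    print("iter {}: target {:.3f}, params {}".format(i + 1, res["target"], res["params"]))

# Best result found so far, e.g. {'params': {...}, 'target': 0.594}
print("best param:", optimizer.max)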
On Friday, December 21, 2018 at 8:15:27 PM UTC+8, Lyudmil Antonov wrote:
The tuned CPuct and CPuctFactor are indeed very dependent on the TC, since the final cpuct is also a function of nodes, and the node count of Lc0 varies a lot, usually higher at long thinking times.

final_cpuct = CPuct + CPuctFactor * natural_log((nodes + CPuctBase) / CPuctBase)

I just keep CPuctBase constant.

TC 15s+0.1s on a Tesla V100 is only around 8K average nodes per move, and the optimization was only done at TC 3s+0.05s, so basically I am only testing the surface. At deeper depths or higher node counts, anything can happen.

I will try this on A/B engines. Will share the script later; it is not user friendly at the moment. I will refactor it and create a settings file for user convenience.
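As a quick sanity check of that formula at the ~8K nodes mentioned above, assuming the v0.19 defaults CPuct = 3.0, CPuctFactor = 2.0 and CPuctBase = 19652:

import math

cpuct, cpuct_factor, cpuct_base = 3.0, 2.0, 19652
nodes = 8000

final_cpuct = cpuct + cpuct_factor * math.log((nodes + cpuct_base) / cpuct_base)
print(round(final_cpuct, 2))  # ~3.68, already well above the base CPuct at 8K nodes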
More games per iteration is indeed better. The score rate is more accurate, which helps the optimizer converge on values faster.

BTW, do you have an SF exe compile where the piece values of N, B, R and Q are exposed for tuning? Not necessarily the latest SF code. I plan to tune it with my script to see if the script works.
Oh no, this is a bmi compile; could you try sandybridge or popcnt, or just an ordinary 64-bit? Thanks and Merry Christmas!
Results with CLOP on Stockfish piece values.

This is the plot of how the win rate progresses after every game for 6000 games (around 6 hours of tuning) with training matches at TC 5s+0.1s. The piece values are also shown, slowly converging as the number of training games increases. And this is the image from the CLOP interface.
The piece values on the max tab, shown at right, are used against the default values in a verification match played at TC 30s+0.1s for 100 games.
KnightValueMg 749 etc.
The verification games I did for Bayesian Optimization (BO) were actually played at TC 30s+0.1s (and not TC 15s+0.1s), so the CLOP vs default verification should also be at TC 30s+0.1s.
Here is the result of the CLOP vs default verification match, TC 30s+0.1s.
Score of Stockfish 251218 CLOP vs Stockfish 251218 default: 18 - 23 - 59 [0.475]
Elo difference: -17.39 +/- 43.75, LOS: 21.74 %, DrawRatio: 59.0 %
100 of 100 games finished.
So compared with BO, CLOP is also good, but perhaps it needs more games to reach good optimized values. Note that in BO the best optimized values were already found at iteration 45, which means only 45 x 50 = 2250 games. We need more experiments on the BO though.
At some point a stopping condition will be implemented in BO: when it has already found a score rate of 60% or more, tuning can be stopped to save time. Perhaps the saved time can be used to increase the training matches from 50 to something like 100 games per iteration.
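One way to get that stopping condition with the same bayes_opt package is to replace maximize() with its suggest/register loop and break once the target reaches the threshold; a minimal sketch, reusing black_box_function() and pbounds from the earlier post:

from bayes_opt import BayesianOptimization, UtilityFunction

optimizer = BayesianOptimization(f=None, pbounds=pbounds, random_state=1)
utility = UtilityFunction(kind="ei", kappa=2.576, xi=0.05)

stop_score = 0.60   # stop once a trial scores 60% or better
max_iter = 100

for i in range(max_iter):
    next_point = optimizer.suggest(utility)      # params to try next
    target = black_box_function(**next_point)    # run the 50-game match
    optimizer.register(params=next_point, target=target)
    if target >= stop_score:
        break

print(optimizer.max)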
On Wednesday, December 26, 2018 at 3:47:34 AM UTC+8, Jupiter wrote:

Alright, here is my first attempt result for SF. I set the iteration to 100 with 50 games per iteration; TC 5s+0.1s is used during BO (Bayesian Optimization), trying to optimize the piece values of B, N, Q and R.

parameters.cfg file:

[parameters]
# Parameters to be optimized
# Format: <param> = <min>,<max>
KnightValueMg = 482,1564
BishopValueMg = 530,1660
RookValueMg = 889,2578
QueenValueMg = 2029,5058

[cutechess-cli]
path = C:/chess/CuteChess-CLI/cutechess-cli.exe
concurrency = 6
tc = 0/5+0.1
openings = file=2moves.pgn format=pgn order=random
resign = movecount=3 score=400
draw = movenumber=30 movecount=6 score=0
repeat = 2
rounds = 50

[base]
path = C:/chess/BO/stockfish.exe
name = base
Threads = 1
Hash = 128

[candidate]
# Engine to be optimized
path = C:/chess/BO/stockfish.exe
name = candidate
Threads = 1
Hash = 128

The engine defaults:

option name KnightValueMg type spin default 782 min 0 max 1564
option name BishopValueMg type spin default 830 min 0 max 1660
option name RookValueMg type spin default 1289 min 0 max 2578
option name QueenValueMg type spin default 2529 min 0 max 5058

In the parameters to be optimized I followed the maximum value from the engine defaults, but not the minimum value.

I also normalized each bound so that it lies within 0 to 1; see the normalized param bounds below. When a value is given to the engine, I restore it to its original scale.

Example for the Knight, min = 482 and max = 1564 (see the parameters.cfg file):

param_div_knight = max = 1564
param_norm_min_knight = 482/param_div_knight = 482/1564 = 0.30818
param_norm_max_knight = 1564/param_div_knight = 1564/1564 = 1.0

Similar calculations were done for the other params. So when BO suggests a Knight at 0.8065 (see iter 1), I restore the value and give the engine the restored value:

value = 0.8065 x param_div_knight = 0.8065 x 1564 = 1261
setoption name KnightValueMg value 1261

Note that BO sorts the params in alphabetical order; that is why the table header shows iter / target / Bish... / Knig... / ...

Starting BO
Max iteration: 100
Normalized param bounds: {'KnightValueMg': (0.30818414322250637, 1.0), 'BishopValueMg': (0.3192771084337349, 1.0), 'RookValueMg': (0.3448409619860357, 1.0), 'QueenValueMg': (0.4011466982997232, 1.0)}
Best of 50 games per iteration

| iter | target | Bishop... | Knight... | QueenV... | RookVa... |
-------------------------------------------------------------------------
try parameters = BishopValueMg 1001 KnightValueMg 1261 QueenValueMg 2029 RookValueMg 1399
playing game matches ...
score: 0.16
| 1 | 0.16 | 0.6032 | 0.8065 | 0.4012 | 0.5429 |
try parameters = BishopValueMg 1660 KnightValueMg 481 QueenValueMg 5058 RookValueMg 2578
playing game matches ...
score: 0.02
| 2 | 0.02 | 1.0 | 0.3082 | 1.0 | 1.0 |
try parameters = BishopValueMg 530 KnightValueMg 1564 QueenValueMg 5058 RookValueMg 889
playing game matches ...
score: 0.0
| 3 | 0.0 | 0.3193 | 1.0 | 1.0 | 0.3448 |
try parameters = BishopValueMg 1660 KnightValueMg 1564 QueenValueMg 2029 RookValueMg 2578
playing game matches ...
score: 0.01
| 4 | 0.01 | 1.0 | 1.0 | 0.4011 | 1.0 |
...
try parameters = BishopValueMg 1322 KnightValueMg 1564 QueenValueMg 2029 RookValueMg 1507
playing game matches ...
score: 0.06
| 99 | 0.06 | 0.7967 | 1.0 | 0.4011 | 0.5846 |
try parameters = BishopValueMg 1660 KnightValueMg 481 QueenValueMg 3726 RookValueMg 2299
playing game matches ...
score: 0.0
| 100 | 0.0 | 1.0 | 0.3082 | 0.7368 | 0.8919 |
try parameters = BishopValueMg 530 KnightValueMg 481 QueenValueMg 4122 RookValueMg 1287
playing game matches ...
score: 0.21
| 101 | 0.21 | 0.3193 | 0.3082 | 0.8151 | 0.4995 |
=========================================================================
Best Parameters:
target: 0.61
BishopValueMg: 820
KnightValueMg: 734
QueenValueMg: 2726
RookValueMg: 1173
Elapsed: 275.3 minutes

So this run took 275.3 minutes, or about 4.6 hours. It found the best params, with a score rate (target) of 0.61 or 61%, at iteration 45:

try parameters = BishopValueMg 820 KnightValueMg 734 QueenValueMg 2726 RookValueMg 1173
playing game matches ...
score: 0.61
| 45 | 0.61 | 0.4943 | 0.4694 | 0.5391 | 0.4551 |

In comparison, the defaults are:

option name BishopValueMg type spin default 830 min 0 max 1660
option name KnightValueMg type spin default 782 min 0 max 1564
option name QueenValueMg type spin default 2529 min 0 max 5058
option name RookValueMg type spin default 1289 min 0 max 2578

To verify the BO result through actual game matches, I ran a match at TC 15s+0.1s for 100 games using 50 positions from start_opening.pgn with sides reversed. The engine Stockfish 251218 BO used the best parameters found by BO, while Stockfish 251218 default used its default values.

Result:
Score of Stockfish 251218 BO vs Stockfish 251218 default: 21 - 27 - 52 [0.470]
Elo difference: -20.87 +/- 47.40, LOS: 19.32 %, DrawRatio: 52.0 %
100 of 100 games finished.
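The score rate fed back to the optimizer can be pulled straight out of a summary line like the one above; a minimal sketch (the helper name and regex are just illustrative):

import re

def parse_perf(line):
    # "Score of A vs B: 21 - 27 - 52 [0.470]" -> wins, losses, draws of engine A
    m = re.search(r"Score of .+ vs .+: (\d+) - (\d+) - (\d+)", line)
    wins, losses, draws = map(int, m.groups())
    return (wins + draws / 2.0) / (wins + losses + draws)

line = "Score of Stockfish 251218 BO vs Stockfish 251218 default: 21 - 27 - 52 [0.470]"
print(parse_perf(line))  # 0.47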
So the BO version loses by 6 points (Elo difference: -20.87 +/- 47.40, LOS: 19.32 %, DrawRatio: 52.0 %).
Total games played in BO at TC 5s+0.1s: 101 x 50 = 5050 games.
Although the BO result loses (still within the margin of error), I think it is generally still able to optimize given the range of values it has to explore.
[parameters]
# Parameters to be optimized
# Format: <param> = <min>,<max>
KnightValueMg = 482,1564
BishopValueMg = 530,1660
RookValueMg = 889,2578
QueenValueMg = 2029,5058

range_n = 1564 - 482 = 1082
range_b = 1660 - 530 = 1130
range_r = 2578 - 889 = 1689
range_q = 5058 - 2029 = 3029

param_values_possibilities = 1082 x 1130 x 1689 x 3029 = 6,255,105,329,460, or around 6 trillion.

It took only 5050 games in around 6 hours of tuning and is still able to play decently against the highly optimized SF default values. The BO was done at TC 5s+0.1s and the verification game match was done at TC 15s+0.1s, so there is an issue of scaling, and we know that the SF defaults are optimized to look good at different TCs.

I will run this in CLOP and compare its results with the BO, under the same conditions: TC 5s+0.1s, single opponent SF default, and also around 6 hours of optimization session.
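For reference, here is a small sketch of the normalize/restore step described in the earlier post (helper names are just illustrative): each bound is divided by its max so the optimizer works in [min/max, 1.0], and a suggested value is scaled back before being sent to the engine with setoption.

# <min>, <max> as in the [parameters] section above
params = {
    'KnightValueMg': (482, 1564),
    'BishopValueMg': (530, 1660),
    'RookValueMg':   (889, 2578),
    'QueenValueMg':  (2029, 5058),
}

# Normalized bounds handed to the optimizer: (min/max, 1.0)
norm_bounds = {name: (lo / hi, 1.0) for name, (lo, hi) in params.items()}
print(norm_bounds['KnightValueMg'])   # (0.30818..., 1.0)

def restore(name, norm_value):
    """Scale a suggested value back to the engine's integer range."""
    return int(round(norm_value * params[name][1]))

print(restore('KnightValueMg', 0.8065))   # 1261 -> setoption name KnightValueMg value 1261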
I am running another BO session on SF piece values, but this time using TC 30s+0.05s as the tuning TC at 50 games per iteration. It is still in progress. I use a setting so that BO prefers exploration (exploration factor is 1.0), and the piece value bounds start at default +/- 300. That way it at least gets a hint of where the optimal param space is located, but I also use more exploration instead of exploitation so that it still visits the extreme (min/max) values; see BO.cfg below. I am close to releasing this optimizer.

After 9 iterations it still could not find a setting that performs at 50% or more. At around 9 minutes per iteration, that is already 1.35 hours.

Initial point: 1
Max iteration: 100
Games per iteration: 50
Acquisition function: ei
Exploration factor: 1.00
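For what it's worth, here is a sketch of what the default +/- 300 bounds and an exploration-heavy EI setting could look like with the bayes_opt package; mapping the "exploration factor" onto the xi parameter of Expected Improvement is my assumption, not necessarily what the script does:

from bayes_opt import BayesianOptimization, UtilityFunction

defaults = {'KnightValueMg': 782, 'BishopValueMg': 830,
            'RookValueMg': 1289, 'QueenValueMg': 2529}

# Piece value bounds start at default +/- 300
pbounds = {name: (value - 300, value + 300) for name, value in defaults.items()}

optimizer = BayesianOptimization(f=None, pbounds=pbounds, random_state=1)

# Expected Improvement; a larger xi biases the search toward exploration
utility = UtilityFunction(kind="ei", kappa=2.576, xi=1.0)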