Nice info, thank you. In DeepMind's algorithm it is like hand-tuning: modify the param value, play matches vs the default, feed the match result into the optimizer, and the optimizer suggests the next param value to try. Creating matches (takes time ***) and calculating results is easy. Now it is time to look at the input and output of that optimizer. There are examples on the site you quoted. It is interesting indeed to look deeper into it.

*** DeepMind uses 50 games(?) and sends the result to the optimizer to get the next suggested param value to try.
Got it now, based on the sample Jupyter notebook on that site. Basically just revise black_box_function() to return the match result. Relevant changes below.

def black_box_function(CPuct, CPuctBase, CPuctFactor):
    """Function with unknown internals we wish to maximize.

    This is just serving as an example; for all intents and
    purposes think of the internals of this function, i.e. the process
    which generates its output values, as unknown.
    """
    # Comment out the function from the sample
    # return -x ** 2 - (y - 1) ** 2 + 1

    # Instead return the perf, (wins + draws/2) / (wins + draws + losses),
    # from the result of playing the test engine vs the default engine
    num_games = 50

    # Run game matches between the test engine with the new params
    # and the default engine with its default params
    perf = match_engine(test_engine, CPuct, CPuctBase, CPuctFactor,
                        default_engine, max_games=num_games)

    # The Bayesian optimizer will maximize the perf
    return perf

Create a new function that will run the engine match:

def match_engine(test_engine, CPuct, CPuctBase, CPuctFactor,
                 default_engine, max_games=50):
    """Return the perf from the point of view of the test engine.

    Cutechess can be called to create the matches and parse the results.
    """
    # Score of test_engine vs default_engine: 12 - 1 - 19 [0.672] 32
    # Example: just return the score rate of the test engine
    perf = 0.672
    return perf

Then also revise the bounds:

# Bounded region of parameter space
pbounds = {'CPuct': (2, 5), 'CPuctBase': (1000, 100000), 'CPuctFactor': (1, 3)}
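For completeness, here is a minimal sketch of how that revised function would be wired into the optimizer, following the bayes_opt sample notebook (the maximize arguments shown are just the ones used in the run reported later in the thread):

from bayes_opt import BayesianOptimization

# Wire the revised black_box_function() and pbounds into the optimizer.
optimizer = BayesianOptimization(
    f=black_box_function,   # returns the match perf for the suggested params
    pbounds=pbounds,        # the bounded region of parameter space above
    random_state=1,
)

# One random init point, then the guided iterations using
# the Expected Improvement acquisition function.
optimizer.maximize(init_points=1, n_iter=20, acq="ei", xi=0.05)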
On Saturday, December 22, 2018 at 3:57:42 PM UTC+8, Jupiter wrote:
On Saturday, December 22, 2018 at 4:58:25 AM UTC+8, Lyudmil Antonov wrote:
Finally tested this optimization on Google Cloud using a Tesla V100 GPU.

I used:

optimizer.maximize(init_points=1, n_iter=opti_iter, acq="ei", xi=0.05)

acq="ei" is the acquisition function "Expected Improvement".

Max iteration: 20
Param bounds: {'CPuctFactor': (0.5, 3.0), 'CPuct': (2.5, 4.5)}
Best of 32 games per iteration, net id 32182. (DeepMind used 50 games.)
Games are run at TC 3s+0.05s.

After around 1 hr of tuning, I get this.

| iter | target | CPuct | CPuctF... |
-------------------------------------------------
...
| 14 | 0.438 | 4.5 | 1.326 |
try parameters = CPuct 2.5 CPuctFactor 2.1631013357249707
processing ...
score: 0.422
| 15 | 0.422 | 2.5 | 2.163 |
try parameters = CPuct 3.0930171934216246 CPuctFactor 3.0
processing ...
score: 0.594
| 16 | 0.594 | 3.093 | 3.0 |
try parameters = CPuct 3.106287110348023 CPuctFactor 1.7459954705321463
processing ...
score: 0.5
| 17 | 0.5 | 3.106 | 1.746 |
try parameters = CPuct 4.498966652249406 CPuctFactor 2.276217104711355
processing ...
score: 0.453
| 18 | 0.453 | 4.499 | 2.276 |
try parameters = CPuct 4.0449655865241585 CPuctFactor 1.7403143498944391
processing ...
score: 0.5
| 19 | 0.5 | 4.045 | 1.74 |
try parameters = CPuct 3.007779360262849 CPuctFactor 2.717663047532658
processing ...
score: 0.391
| 20 | 0.391 | 3.008 | 2.718 |
try parameters = CPuct 3.5571016382646956 CPuctFactor 1.8782503551645444
processing ...
score: 0.516
| 21 | 0.516 | 3.557 | 1.878 |
=================================================
Done!!
best param: {'params': {'CPuctFactor': 3.0, 'CPuct': 3.0930171934216246}, 'target': 0.594}

The target 0.594 is the best score rate found after 21 iterations. The count goes to 21 because the first iteration was a random init point, done so that the tuner has an idea of which param values to try next. The param bounds {'CPuctFactor': (0.5, 3.0), 'CPuct': (2.5, 4.5)} ensure that the optimizer only tries values within those ranges.

Then I ran a real 50-game match at TC 15s+0.1s between the default and the tuned values.

Default: CPuct = 3.0, CPuctFactor = 2.0
Tuned: CPuct = 3.09, CPuctFactor = 3.0
Net id: 32182

Result:
Score of Lc0 v0.19.1 32182 cpuct_def vs Lc0 v0.19.1 32182 cpuct_3.09_19652_3.0: 5 - 7 - 38 [0.480] 50
Elo difference: -13.90 +/- 47.32
Finished match

The tuned values lead by 2 points.

It is important to note that the tuning was done at TC 3s+0.05s while the match above was played at TC 15s+0.1s. Perhaps if the tuning were done closer to TC 15s+0.1s, with a higher number of iterations and more tuning games per iteration, it might find much better param values.

I used the script to run engine matches from this repo, https://github.com/snicolet/spsa, which is also based on cutechess-cli.
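For reference, the per-iteration history and the "best param" line at the end are simply what the bayes_opt optimizer exposes once maximize() has finished; a small sketch:

# After optimizer.maximize(...) has finished:
for i, res in enumerate(optimizer.res):
    # each entry holds the score rate and the params tried at that iteration
    print("iter {}: target {:.3f}, params {}".format(i + 1, res["target"], res["params"]))

# Best result found so far, e.g. {'params': {...}, 'target': 0.594}
print("best param:", optimizer.max)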
On Friday, December 21, 2018 at 8:15:27 PM UTC+8, Lyudmil Antonov wrote:
The tuned CPuct and CPuctFactor are indeed very dependent on the TC, since the final cpuct is also a function of nodes, and the node count of Lc0 varies a lot, usually higher at long thinking times.

final_cpuct = CPuct + CPuctFactor * natural_log((nodes + CPuctBase) / CPuctBase)

I just keep CPuctBase constant.

TC 15s+0.1s on a Tesla V100 is only around 8K average nodes per move, and the optimization was only done at TC 3s+0.05s, so basically I am only testing the surface. At deeper depths or higher node counts, anything can happen.

I will try this on A/B engines. Will share the script later; it is not user friendly at the moment. I will refactor it and create a settings file for user convenience.
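As a quick sanity check of that formula at the ~8K nodes mentioned above, assuming the v0.19 defaults CPuct = 3.0, CPuctFactor = 2.0 and CPuctBase = 19652:

import math

cpuct, cpuct_factor, cpuct_base = 3.0, 2.0, 19652
nodes = 8000

final_cpuct = cpuct + cpuct_factor * math.log((nodes + cpuct_base) / cpuct_base)
print(round(final_cpuct, 2))  # ~3.68, already well above the base CPuct at 8K nodes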
More games per iteration is indeed better. The score rate is more accurate, which helps the optimizer converge on values faster.

BTW, do you have an SF exe compile where the piece values of N, B, R and Q are exposed for tuning? Not necessarily the latest SF code. I plan to tune it with my script to see if the script works.
Oh no, this is a bmi compile; could you try sandybridge or popcnt, or just an ordinary 64-bit? Thanks and Merry Christmas!
Results with CLOP on Stockfish piece values.

This is the plot of how the win rate progresses after every game for 6000 games (around 6 hours of tuning) with training matches at TC 5s+0.1s. The piece values are also shown, slowly converging as the number of training games increases. And this is the image from the CLOP interface.
The piece values on the max tab, shown at right, are used against the default values in a verification match played at TC 30s+0.1s for 100 games.
KnightValueMg 749 etc.
The verification games I did for Bayesian Optimization (BO) were actually played at TC 30s+0.1s (and not TC 15s+0.1s), so the CLOP vs default verification should also be at TC 30s+0.1s.
Here is the result of the CLOP vs default verification match, TC 30s+0.1s.
Score of Stockfish 251218 CLOP vs Stockfish 251218 default: 18 - 23 - 59 [0.475]
Elo difference: -17.39 +/- 43.75, LOS: 21.74 %, DrawRatio: 59.0 %
100 of 100 games finished.
So compared with BO, CLOP is also good, but perhaps it needs more games to reach good optimized values. Note that in BO the best optimized values were already found at iteration 45, which means only 45 x 50 = 2250 games. We need more experiments on the BO though.
At some point a stopping condition will be implemented in BO: when it has already found a score rate of 60% or more, tuning can be stopped to save time. Perhaps the saved time can be used to increase the training matches from 50 to something like 100 games per iteration.
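One way to get that stopping condition with the same bayes_opt package is to replace maximize() with its suggest/register loop and break once the target reaches the threshold; a minimal sketch, reusing black_box_function() and pbounds from the earlier post:

from bayes_opt import BayesianOptimization, UtilityFunction

optimizer = BayesianOptimization(f=None, pbounds=pbounds, random_state=1)
utility = UtilityFunction(kind="ei", kappa=2.576, xi=0.05)

stop_score = 0.60   # stop once a trial scores 60% or better
max_iter = 100

for i in range(max_iter):
    next_point = optimizer.suggest(utility)      # params to try next
    target = black_box_function(**next_point)    # run the 50-game match
    optimizer.register(params=next_point, target=target)
    if target >= stop_score:
        break

print(optimizer.max)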
On Wednesday, December 26, 2018 at 3:47:34 AM UTC+8, Jupiter wrote:

Alright, here is my first attempt result for SF. I set the iteration to 100 with 50 games per iteration; TC 5s+0.1s is used during BO (Bayesian Optimization), trying to optimize the piece values of B, N, Q and R.

parameters.cfg file:

[parameters]
# Parameters to be optimized
# Format: <param> = <min>,<max>
KnightValueMg = 482,1564
BishopValueMg = 530,1660
RookValueMg = 889,2578
QueenValueMg = 2029,5058

[cutechess-cli]
path = C:/chess/CuteChess-CLI/cutechess-cli.exe
concurrency = 6
tc = 0/5+0.1
openings = file=2moves.pgn format=pgn order=random
resign = movecount=3 score=400
draw = movenumber=30 movecount=6 score=0
repeat = 2
rounds = 50

[base]
path = C:/chess/BO/stockfish.exe
name = base
Threads = 1
Hash = 128

[candidate]
# Engine to be optimized
path = C:/chess/BO/stockfish.exe
name = candidate
Threads = 1
Hash = 128

The engine defaults:

option name KnightValueMg type spin default 782 min 0 max 1564
option name BishopValueMg type spin default 830 min 0 max 1660
option name RookValueMg type spin default 1289 min 0 max 2578
option name QueenValueMg type spin default 2529 min 0 max 5058

In the parameters to be optimized I followed the maximum value from the engine defaults, but not the minimum value.

I also normalized each bound so that it lies within 0 to 1; see the normalized param bounds below. When a value is given to the engine, I restore it to its original scale.

Example for the Knight, min = 482 and max = 1564 (see the parameters.cfg file):

param_div_knight = max = 1564
param_norm_min_knight = 482/param_div_knight = 482/1564 = 0.30818
param_norm_max_knight = 1564/param_div_knight = 1564/1564 = 1.0

Similar calculations were done for the other params. So when BO suggests a Knight at 0.8065 (see iter 1), I restore the value and give the engine the restored value:

value = 0.8065 x param_div_knight = 0.8065 x 1564 = 1261
setoption name KnightValueMg value 1261

Note that BO sorts the params in alphabetical order; that is why the table header shows iter / target / Bish... / Knig... / ...

Starting BO
Max iteration: 100
Normalized param bounds: {'KnightValueMg': (0.30818414322250637, 1.0), 'BishopValueMg': (0.3192771084337349, 1.0), 'RookValueMg': (0.3448409619860357, 1.0), 'QueenValueMg': (0.4011466982997232, 1.0)}
Best of 50 games per iteration

| iter | target | Bishop... | Knight... | QueenV... | RookVa... |
-------------------------------------------------------------------------
try parameters = BishopValueMg 1001 KnightValueMg 1261 QueenValueMg 2029 RookValueMg 1399
playing game matches ...
score: 0.16
| 1 | 0.16 | 0.6032 | 0.8065 | 0.4012 | 0.5429 |
try parameters = BishopValueMg 1660 KnightValueMg 481 QueenValueMg 5058 RookValueMg 2578
playing game matches ...
score: 0.02
| 2 | 0.02 | 1.0 | 0.3082 | 1.0 | 1.0 |
try parameters = BishopValueMg 530 KnightValueMg 1564 QueenValueMg 5058 RookValueMg 889
playing game matches ...
score: 0.0
| 3 | 0.0 | 0.3193 | 1.0 | 1.0 | 0.3448 |
try parameters = BishopValueMg 1660 KnightValueMg 1564 QueenValueMg 2029 RookValueMg 2578
playing game matches ...
score: 0.01
| 4 | 0.01 | 1.0 | 1.0 | 0.4011 | 1.0 |
...
try parameters = BishopValueMg 1322 KnightValueMg 1564 QueenValueMg 2029 RookValueMg 1507
playing game matches ...
score: 0.06
| 99 | 0.06 | 0.7967 | 1.0 | 0.4011 | 0.5846 |
try parameters = BishopValueMg 1660 KnightValueMg 481 QueenValueMg 3726 RookValueMg 2299
playing game matches ...
score: 0.0
| 100 | 0.0 | 1.0 | 0.3082 | 0.7368 | 0.8919 |
try parameters = BishopValueMg 530 KnightValueMg 481 QueenValueMg 4122 RookValueMg 1287
playing game matches ...
score: 0.21
| 101 | 0.21 | 0.3193 | 0.3082 | 0.8151 | 0.4995 |
=========================================================================
Best Parameters:
target: 0.61
BishopValueMg: 820
KnightValueMg: 734
QueenValueMg: 2726
RookValueMg: 1173
Elapsed: 275.3 minutes

So this run took 275.3 minutes, or about 4.6 hours. It found the best params, with a score rate (target) of 0.61 or 61%, at iteration 45:

try parameters = BishopValueMg 820 KnightValueMg 734 QueenValueMg 2726 RookValueMg 1173
playing game matches ...
score: 0.61
| 45 | 0.61 | 0.4943 | 0.4694 | 0.5391 | 0.4551 |

In comparison, the defaults are:

option name BishopValueMg type spin default 830 min 0 max 1660
option name KnightValueMg type spin default 782 min 0 max 1564
option name QueenValueMg type spin default 2529 min 0 max 5058
option name RookValueMg type spin default 1289 min 0 max 2578

To verify the BO result through actual game matches, I ran a match at TC 15s+0.1s for 100 games using 50 positions from start_opening.pgn with sides reversed. The engine Stockfish 251218 BO used the best parameters found by BO, while Stockfish 251218 default used its default values.

Result:
Score of Stockfish 251218 BO vs Stockfish 251218 default: 21 - 27 - 52 [0.470]
Elo difference: -20.87 +/- 47.40, LOS: 19.32 %, DrawRatio: 52.0 %
100 of 100 games finished.
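The score rate fed back to the optimizer can be pulled straight out of a summary line like the one above; a minimal sketch (the helper name and regex are just illustrative):

import re

def parse_perf(line):
    # "Score of A vs B: 21 - 27 - 52 [0.470]" -> wins, losses, draws of engine A
    m = re.search(r"Score of .+ vs .+: (\d+) - (\d+) - (\d+)", line)
    wins, losses, draws = map(int, m.groups())
    return (wins + draws / 2.0) / (wins + losses + draws)

line = "Score of Stockfish 251218 BO vs Stockfish 251218 default: 21 - 27 - 52 [0.470]"
print(parse_perf(line))  # 0.47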
So the BO version loses by 6 points (Elo difference: -20.87 +/- 47.40, LOS: 19.32 %, DrawRatio: 52.0 %).
Total games played in BO at TC 5s+0.1s: 101 x 50 = 5050 games.
Although the BO result loses (still within the margin of error), I think it is generally still able to optimize given the range of values it has to explore.
[parameters]
# Parameters to be optimized
# Format: <param> = <min>,<max>
KnightValueMg = 482,1564
BishopValueMg = 530,1660
RookValueMg = 889,2578
QueenValueMg = 2029,5058

range_n = 1564 - 482 = 1082
range_b = 1660 - 530 = 1130
range_r = 2578 - 889 = 1689
range_q = 5058 - 2029 = 3029

param_values_possibilities = 1082 x 1130 x 1689 x 3029 = 6,255,105,329,460, or around 6 trillion.

It took only 5050 games in around 6 hours of tuning and is still able to play decently against the highly optimized SF default values. The BO was done at TC 5s+0.1s and the verification game match was done at TC 15s+0.1s, so there is an issue of scaling, and we know that the SF defaults are optimized to look good at different TCs.

I will run this in CLOP and compare its results with the BO, under the same conditions: TC 5s+0.1s, single opponent SF default, and also around 6 hours of optimization session.
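For reference, here is a small sketch of the normalize/restore step described in the earlier post (helper names are just illustrative): each bound is divided by its max so the optimizer works in [min/max, 1.0], and a suggested value is scaled back before being sent to the engine with setoption.

# <min>, <max> as in the [parameters] section above
params = {
    'KnightValueMg': (482, 1564),
    'BishopValueMg': (530, 1660),
    'RookValueMg':   (889, 2578),
    'QueenValueMg':  (2029, 5058),
}

# Normalized bounds handed to the optimizer: (min/max, 1.0)
norm_bounds = {name: (lo / hi, 1.0) for name, (lo, hi) in params.items()}
print(norm_bounds['KnightValueMg'])   # (0.30818..., 1.0)

def restore(name, norm_value):
    """Scale a suggested value back to the engine's integer range."""
    return int(round(norm_value * params[name][1]))

print(restore('KnightValueMg', 0.8065))   # 1261 -> setoption name KnightValueMg value 1261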
I am running another BO session on SF piece values, but this time using TC 30s+0.05s as the tuning TC at 50 games per iteration. It is still in progress. I use a setting so that BO prefers exploration (exploration factor is 1.0), and the piece value bounds start at default +/- 300. That way it at least gets a hint of where the optimal param space is located, but I also use more exploration instead of exploitation so that it still visits the extreme (min/max) values; see BO.cfg below. I am close to releasing this optimizer.

After 9 iterations it still could not find a setting that performs at 50% or more. At around 9 minutes per iteration, that is already 1.35 hours.

Initial point: 1
Max iteration: 100
Games per iteration: 50
Acquisition function: ei
Exploration factor: 1.00
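For what it's worth, here is a sketch of what the default +/- 300 bounds and an exploration-heavy EI setting could look like with the bayes_opt package; mapping the "exploration factor" onto the xi parameter of Expected Improvement is my assumption, not necessarily what the script does:

from bayes_opt import BayesianOptimization, UtilityFunction

defaults = {'KnightValueMg': 782, 'BishopValueMg': 830,
            'RookValueMg': 1289, 'QueenValueMg': 2529}

# Piece value bounds start at default +/- 300
pbounds = {name: (value - 300, value + 300) for name, value in defaults.items()}

optimizer = BayesianOptimization(f=None, pbounds=pbounds, random_state=1)

# Expected Improvement; a larger xi biases the search toward exploration
utility = UtilityFunction(kind="ei", kappa=2.576, xi=1.0)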