This is a draft of the paper I will submit to ACG13.
Title: CLOP: Confident Local Optimization for Noisy Black-Box Parameter Tuning
Abstract: Artificial intelligence in games often leads to the problem of parameter tuning. Some heuristics may have coefficients, and they should be tuned to maximize the win rate of the program. A possible approach consists in building local quadratic models of the win rate as a function of program parameters. Many local regression algorithms have already been proposed for this task, but they are usually not robust enough to deal automatically and efficiently with very noisy outputs and non-negative Hessians. The CLOP principle, which stands
for Confident Local OPtimization, is a new approach to local regression that overcomes all these problems in a simple and efficient way. It consists in discarding samples whose estimated value is confidently inferior to the mean of all samples. Experiments demonstrate that, when the function to be optimized is smooth, this method outperforms all other tested algorithms.
pdf and source code:
http://remi.coulom.free.fr/CLOP/
Comments, questions, and suggestions for improvement are welcome.
Rémi
_______________________________________________
Computer-go mailing list
Compu...@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
This uses Gomill as the 'twogtp' back-end, and combines the CLOP
settings and the engine configuration into a single configuration file
(rather than putting the latter in the connection script).
If anyone else wants to use it, it's included as an example script in
Gomill 0.7.2, which can be downloaded from
http://mjw.woodcraft.me.uk/gomill/ .
-M-
When the weights are recalculated in Algorithm 1, the expression for wk is
exp((qk(x) - mk) / H * sk).
Should the formula have a square? That is, exp((qk(x) - mk) * (qk(x) - mk) /
H * sk)?
Thanks,
Brian
> I am going through the paper, and there is a point where I do not
> understand.
>
> When the weights are recalculated in Algorithm 1, the expression for wk is
> exp((qk(x) - mk) / H * sk).
>
> Should the formula have a square? That is, exp((qk(x) - mk) * (qk(x) - mk) /
> H * sk)?
>
> Thanks,
> Brian
No. The idea is that the weight of a sample should be low when it is far below the mean, not when it is far from the mean. That is to say, samples whose value is very low according to the regression get a low weight. But samples whose strength is estimated to be above average keep a full weight of 1 (because of the "min", the weight can never get above 1).
Note BTW that since my previous message I updated the web site of CLOP with some data, screenshots, and a link to the computer-chess forum with more discussions about the algorithm:
http://remi.coulom.free.fr/CLOP/
-----Original Message-----
From: computer-...@dvandva.org
[mailto:computer-...@dvandva.org] On Behalf Of Rémi Coulom
Sent: Saturday, September 10, 2011 11:36 AM
To: compu...@dvandva.org
Subject: Re: [Computer-go] CLOP: Confident Local Optimization
forNoisyBlack-Box Parameter Tuning
Normally you need a lot of data to make a decent regression function. For
example, if you have N arguments in your function, then CLOP
(Correlated-All) needs 1 + N * (N+3) / 2 parameters. So if you want 10
observations per parameter, then you need 10 + 5N(N+3) samples.
But even getting *one* sample can be tricky, because the 'logit' for a
sample is +INF if the sample wins all of its games, and -INF if the sample
loses all of its games. So you need a sample that has some wins and some
losses. If the true value of the function is near 0.5, then the average
number of trials required to obtain a sample is around 3, which is fine.
But some of the test functions in your paper are very different. For
example, the Correlated2 function is nearly 0 for most of the domain
[-1,1]^4. When I sample randomly, it takes ~5K samples (that is, ~20K
trials) to turn up enough samples to fit a regression line.
I tried initializing my win/loss counters to epsilon instead of zero. But
that technique was not robust, because any reasonable epsilon is actually
larger than Correlated2 for most of the domain. Consequently, the "reduce
the weights" step does not reduce enough weights, and the logistic
regression ends up fitting epsilon, rather than Correlated2.
So I cannot get a valid measurement with less than 20K trials before the
first regression step. But your paper shows regret curves that start out at
10 trials.
What am I missing?
Thanks,
Brian
On 4 oct. 2011, at 18:54, Brian Sheppard wrote:
> Hi, Remi. I have a question about the "burn-in" process for CLOP.
>
> Normally you need a lot of data to make a decent regression function. For
> example, if you have N arguments in your function, then CLOP
> (Correlated-All) needs 1 + N * (N+3) / 2 parameters. So if you want 10
> observations per parameter, then you need 10 + 5N(N+3) samples.
>
> But even getting *one* sample can be tricky, because the 'logit' for a
> sample is +INF if the sample wins all of its games, and -INF if the sample
> loses all of its games. So you need a sample that has some wins and some
> losses. If the true value of the function is near 0.5, then the average
> number of trials required to obtain a sample is around 3, which is fine.
I deal with +INF/-INF with a prior: the Gaussian prior regularizes the regression, so its tends to remain flat and close to 0.5 when very few samples have been collected.
>
> But some of the test functions in your paper are very different. For
> example, the Correlated2 function is nearly 0 for most of the domain
> [-1,1]^4. When I sample randomly, it takes ~5K samples (that is, ~20K
> trials) to turn up enough samples to fit a regression line.
I am not sure I understand what you mean. If you use regularization, you can perform regression even with zero samples. Of course, it is very inaccurate. But if you are careful to take confidence intervals into consideration, you can still do statistics with very few samples, and determine with some significance that an area is bad.
>
> I tried initializing my win/loss counters to epsilon instead of zero. But
> that technique was not robust, because any reasonable epsilon is actually
> larger than Correlated2 for most of the domain. Consequently, the "reduce
> the weights" step does not reduce enough weights, and the logistic
> regression ends up fitting epsilon, rather than Correlated2.
>
> So I cannot get a valid measurement with less than 20K trials before the
> first regression step. But your paper shows regret curves that start out at
> 10 trials.
>
> What am I missing?
I am not sure what you are missing.
In the case of Correlated2: In the beginning CLOP will sample uniformly at random (if you run the algorithm in the paper with N=0, then w(x)=1 everywhere). As soon at it find its first win, it will start focusing around that first win. You should be able to easily run CLOP on Correlated2. Just edit DummyExperiment.clop and DummyScript.py. You can also take a look at Gian-Carlo's chess data: it is a bit similar, as most games are lost in the beginning.
One important aspect of CLOP is the use of the confidence interval. It does not matter if the regression is very inaccurate. Even with an inaccurate regression, it can get confident that some areas of the search space are below average, so they should not be sampled.
If you sample uniformly at random until you get an accurate regression, then, yes, it will take forever. Maybe what you are missing is that CLOP does not need an accurate regression at all to already focus its sampling on a promising region.
Rémi
It is especially important that having the prior will focus attention on the
region of success. In the case of Correlated2, where only a tiny fraction of
the space is non-zero, that will massively reduce the burn-in period.
-----Original Message-----
From: computer-...@dvandva.org
[mailto:computer-...@dvandva.org] On Behalf Of Rémi Coulom
Sent: Tuesday, October 04, 2011 3:18 PM
To: compu...@dvandva.org
Subject: Re: [Computer-go] CLOP: Confident Local Optimization
forNoisyBlack-Box Parameter Tuning
Consider the root node; at the beginning of the search it's desirable to sample all the children equally, to be sure each has a fair chance to be noted as winning or losing. However, as the simulations continue, if this egalitarian distribution continues, the simulations from losing nodes dilutes the results (as well as wasting time), so it's necessary to start concentrating on the winning nodes. The exact method of transitioning from broad to narrow focus can have dramatic effect on the results.
Doesn't the UCB formula basically encode this behavior? What I think I
learned about UCT from experimenting with dimwit is that, for nodes
other than the root, you need to reduce exploration so scores are not
too polluted by bad moves, but then the principal variation gets
ridiculously deep, which means that more exploration is needed. At the
root you can make the search explore more, since you don't need to
back out a score.
I don't know if go has an equivalent to queen sacrifices in chess, but
it would be very hard to make a UCT program that plays something like
that correctly: The queen sacrifice would look like a horrible move,
with really low score, and if you make the search explore enough to
figure out that it's a good move (by finding several correct
continuation moves) in a practical amount of time, the score will be
horribly polluted in the mean time.
The solution has to be disassociating how much time you spend
exploring a move and how much it contributes to the score of its
parent. I feel that UCT is great for making up a score out of repeated
simulations, but eventually we should end up thinking of it as an
evaluation function and using something much closer to minimax for the
parts of the tree close to the root. Unfortunately, I don't have any
successful experiments to back out this feeling.
Yes, but the formula is not cast in stone. There are
infinite variations that implement the basic concept.
I guess the lesson I wanted to convey is that this formula,
or perhaps an algorithm too complicated to be expressed as
a simple formula, is part of the space that needs to be
explored.
Yes, but the formula is not cast in stone. There are
infinite variations that implement the basic concept.
I guess the lesson I wanted to convey is that this formula,
or perhaps an algorithm too complicated to be expressed as
a simple formula, is part of the space that needs to be
explored.
_______________________________________________
Right.
Making MCTS (or UCT) a success in practice consists
of 3 % principle-understanding and 97 % fine-tuning.
There are myriads of ways to implement Monte Carlo
in a favorable way.
Ingo.
--
Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir
belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de
What you wrote sounds like you re-discovered the importance of progressive widening PW ;-)
(See: Coulom, Computing Elo Ratings of Move Patterns in the Game of Go, 4.2 Progressive
Widening of the Monte-Carlo Search Tree)
In 19x19 when I implemented (about 1 year ago) RAVE a progressive widening I had the first
wins against gnugo in 19x19 (I had some 9x9 wins before). But the reason PW is so good is
somewhat different when you combine it with RAVE:
When the node has few visits, you only explore (say) 3 moves and those moves are the 3 best
moves according to some a-priori heuristic, but when you widen the tree, you do NOT include
the 4th move according to the same criterion, but the best non-explored RAVE candidate. Only
3 nodes are considered for UCT (in the beginning, of course) but ALL nodes get RAVE updates.
And these RAVE updates are specific for the path in the tree leading to the node. So all non
explored nodes get high quality RAVE information and when you widen the tree the 4th
candidate is a good move for whole board position represented by the node.
The way I implemented PW for the first time is the formula by Hiroshi Yamashita (below)
Jacques.
(I copy/paste from my notes. It is somewhere in the list.)
Aya:
----
(1 - beta) * (win_rate + 0.31 * sqrt( ln(parent_visits) / child_visits)) + beta (rave_win_rate * 0.31 * sqrt(
ln(rave_parent_visits) / rave_child_visits))
beta = sqrt(100 / (3 * child_visits + 100));
Aya uses Progressive Widening. High order N moves are only considerd.
PW_sort_N = ln(parent_visits/ 40.0) / ln(1.4) +2;
Moves are sorted sometimes by rave value, Criticality, and MC owners.
I also would like to know how to count rave.
UCT searches B(E5),W(D3),B(C5),W(F7), and in this position, playout searches
B(E7),W(E8),B(D8),W(F8),B(D7).. Black win.
In W(D3) positions, Aya updates RAVE and UCT,
Updates C5(UCT)
Updates C5(RAVE)
Updates E7(RAVE)
Updates D8(RAVE)
Updates D7(RAVE)
I think "Updates C5(RAVE)" is strange, but I could not get good result without this.
Hiroshi Yamashita
Hi,I'm considering CLOP to be one of the compared optimizer in RobustOptimizer https://github.com/ChinChangYang/RobustOptimizer/issues/68. However, I have some questions to your experiment.The CLOP is for noisy black-box parameter tuning. However, your test functions (LOG, FLAT, POWER, ANGLE, and STEP) are noise-free functions as shown in Table 1. It is very difficult to prove that CLOP can work very well on noisy functions.I suggest that the problem definition f(x) = 1/(1+exp(-r(x))) should be perturbed with some random variables with a defined zero-mean distribution, such as Gaussian distribution, uniform distribution, or any others. Specifically, the problem definitions can be g(x) = 1/(1+exp(-r(x) + n(x))) where n(x) is an additional noise. The performance of the algorithms can be evaluated in terms of solution error measure, which is defined as f(x) - g(x*) where x* is the global optimum of the noise-free function f.
BBOB 2012 defines some noisy functions http://coco.gforge.inria.fr/doku.php?id=bbob-2012 which may also provide confident performance evaluation for noisy optimization.There may exist more appropriate performance evaluation methods than aforementioned ones for win/loss outcomes. Anyway, in this paper, the experiment uses noise-free functions as test functions. It cannot prove anything for noisy optimization.Best regards,Chin-Chang Yang, 2013/03/06--2011/9/1 Rémi Coulom <Remi....@free.fr>
Hi,
This is a draft of the paper I will submit to ACG13.
Title: CLOP: Confident Local Optimization for Noisy Black-Box Parameter Tuning
Abstract: Artificial intelligence in games often leads to the problem of parameter tuning. Some heuristics may have coefficients, and they should be tuned to maximize the win rate of the program. A possible approach consists in building local quadratic models of the win rate as a function of program parameters. Many local regression algorithms have already been proposed for this task, but they are usually not robust enough to deal automatically and efficiently with very noisy outputs and non-negative Hessians. The CLOP principle, which stands
for Confident Local OPtimization, is a new approach to local regression that overcomes all these problems in a simple and efficient way. It consists in discarding samples whose estimated value is confidently inferior to the mean of all samples. Experiments demonstrate that, when the function to be optimized is smooth, this method outperforms all other tested algorithms.
pdf and source code:
http://remi.coulom.free.fr/CLOP/
Comments, questions, and suggestions for improvement are welcome.
Rémi
_______________________________________________
Computer-go mailing list
Compu...@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
Chin-Chang Yang
The CLOP is for noisy black-box parameter tuning. However, your test functions (LOG, FLAT, POWER, ANGLE, and STEP) are noise-free functions as shown in Table 1. It is very difficult to prove that CLOP can work very well on noisy functions.
The CLOP is for noisy black-box parameter tuning. However, your test functions (LOG, FLAT, POWER, ANGLE, and STEP) are noise-free functions as shown in Table 1. It is very difficult to prove that CLOP can work very well on noisy functions.Waow :-) that would be a very strange noisy optimization paper if it was about testing on noise-free functions.The functions are certainly not noise-free; what you read (and which is noise-free...) is their _expected_ values.
_______________________________________________
Computer-go mailing list
Compu...@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
Since the functions are not noise-free, they should be defined in terms of some noise. I really need the definition of the noise for comparison between CLOP and other optimizers.
_______________________________________________
Computer-go mailing list
Compu...@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
Rémi
> Chin-Chang Yang _______________________________________________
I implemented the test functions in my project, and tested the performance of my optimizers, which reached the regret to 1e-3 with 1e7 samples in the 1-dimensional LOG function. In the CLOP paper, it reports that CLOP can reach the regret to 1e-5 with 1e7 samples. It is very interesting that how CLOP can perform so well. I am going to trace the CLOP codes by gdb, but I cannot find the script that can reproduce the experimental results in Fig.4 (a). I guess the script is located in programs/clop/script/artificial but I am not sure which script can reproduce the experiments.
I will be glad if anyone can give me some hints.
Best regards,
Chin-Chang Yang, 2013/3/21
-----Original Message-----
From: computer-...@dvandva.org [mailto:computer-...@dvandva.org] On Behalf Of Remi Coulom
Sent: Wednesday, March 6, 2013 5:37 PM
To: compu...@dvandva.org
If you wish to explore the behaviour of CLOP on a particular problem, I recommend the "test_swig.py" script in the programs/clop/script/artificial directory.
You have to edit this script to select the experiment you wish to run.
First, select your problem. You can do this by uncommenting one (only) of the "p = " lines at the beginning of the script. For the Log function, uncomment "p = CPLog1D()" and comment out the other ones.
Then, you can select the optimization method. The default is normal quadratic CLOP. You can also use cubic CLOP, CEM, or BAST if you wish.
Finally, in the "main program" section you can select what you wish to do with this problem and method:
- eb.threads will run many replications of the optimization experiment in parallel, and will produce some statistics
- eb.gnuplot will run the optimization once and plot the samples. You can select the seed.
- eb.multi_gnuplot will plot the optimization for many seeds in succession
- eb.tikz: probably produces something similar to eb.gnuplot in tikz format for incorporation into LaTeX documents
Rémi