| A branch to test the Monte Carlo algorithm in Stockfish | Stephane Nicolet | 12/6/17 1:21 PM | Hi all, It so happens that I have been working on a Monte Carlo implementation for Stockfish during the last weeks (before the AlphaZero paper :-), with a hybrid of MCTS (Monte Carlo tree search) and alpha-beta. It is not finished yet, but I thought that I could publish the state of my branch now, just to see if there is interest in the community to get it moving :-) The two main files introduced are montecarlo.h and montecarlo.cpp. My aim is to see if this hybrid algorithm can come close enough to our current pure alpha-beta search in single-thread mode, and provide an easy framework which would scale well with multiple threads. Enjoy, Stéphane |
| Re: A branch to test the Monte Carlo algorithm in Stockfish | michel.va...@uhasselt.be | 12/6/17 10:29 PM | On Wednesday, December 6, 2017 at 10:21:12 PM UTC+1, Stephane Nicolet wrote:Wow! AFAICT a UCT implementation in SF! It is "common wisdom" (so maybe not based on evidence) that MCTS does not work in chess since it is "not good in tactics". Let's see.... But when you say your implementation is a blend of MCTS and A/B what do you mean exactly? |
| Re: A branch to test the Monte Carlo algorithm in Stockfish | michel.va...@uhasselt.be | 12/6/17 10:33 PM | On Thursday, December 7, 2017 at 7:29:45 AM UTC+1, michel.va...@uhasselt.be wrote:Ah, I see, it's near the end. You allow for an optional small A/B search in the leaf nodes of the MC tree. |
| Re: A branch to test the Monte Carlo algorithm in Stockfish | Stephane Nicolet | 12/7/17 12:17 AM |
On Thursday, December 7, 2017 at 7:29:45 AM UTC+1, michel.va...@uhasselt.be wrote:> Wow! AFAICT a UCT implementation in SF! It is "common wisdom" (so maybe not based on evidence) that MCTS does not work in chess since it is "not good in tactics". Let's see.... Yes, in fact the AlphaZero search no longer uses any randomization at all, despite the name of the MCTS search. Instead, they use a (big) neural network to evaluate the leaves statically, and that neural network simultaneously outputs the so-called "priors", which are the probabilities of each move in the position being winning. The priors are used to guide the MCTS exploration, serving as Bayesian estimations of the future rewards of the nodes. In my implementation, to replace the neural network I use small local alpha-beta searches to calculate the priors, and the evaluation for each leaf is just the max of its priors. The hope is that (most of) the tactics can be resolved in the small alpha-beta searches (with all the hand-coded heuristics for extensions and quiescence that we have in our alpha-beta at the moment), while MCTS will be in charge of the top of the tree. |
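[Editor's sketch] The selection rule this hybrid implies can be sketched in Python. This is illustrative only (the function and field names are hypothetical, not the actual montecarlo.cpp code): an AlphaZero-style PUCT score where the prior would come from a shallow alpha-beta search instead of a neural network.

```python
import math

def puct_score(q_sum, visits, prior, parent_visits, c_puct=1.0):
    """AlphaZero-style PUCT: mean action value Q plus a prior-weighted
    exploration bonus U that decays as the edge accumulates visits."""
    q = q_sum / visits if visits > 0 else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return q + u

def select_edge(edges, parent_visits):
    """Pick the child edge maximising Q + U.  Each edge is a dict with
    'q_sum', 'visits' and 'prior' (the prior here being the shallow
    alpha-beta estimate of the move's win probability)."""
    return max(edges, key=lambda e: puct_score(
        e["q_sum"], e["visits"], e["prior"], parent_visits))
```

Note that an unvisited edge with a large prior outscores a well-visited mediocre one, which is exactly how the priors steer exploration. The `c_puct` constant and the dict layout are placeholders; the real branch keeps per-edge prior/visits/meanActionValue fields, as the compiler warnings later in this thread show.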
| Re: A branch to test the Monte Carlo algorithm in Stockfish | Gian-Carlo Pascutto | 12/7/17 2:02 AM | On 07-12-17 09:17, Stephane Nicolet wrote:You can replace the score with a mix of the alpha-beta search results and the value calculated by the neural network. This gives very good results in Go (2 independent evaluations forming an ensemble). Careful reading of the first AGZ paper suggests this approach may even be better than Zero (with the same size network, AlphaGo Master beats Zero). -- GCP |
| Re: A branch to test the Monte Carlo algorithm in Stockfish | Ipman Chess | 12/7/17 4:52 AM | Hi Stephane, I have made 32- and 64-bit compiles, 6 in total, stripped them, and put them online, but people get errors, so it doesn't work. Some reactions here: http://immortal223.pozitiffchess.net/forum/showthread.php?t=32274&page=72 During compiling I see a few warnings:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\jpqy>cd stockfish master\src
C:\Users\jpqy\Stockfish Master\src>make clean
C:\Users\jpqy\Stockfish Master\src>make -f MakeFile build ARCH=general-32 COMP=mingw

Config:
debug: 'yes'  sanitize: 'no'  optimize: 'yes'  arch: 'any'  bits: '32'
kernel: 'MINGW32_NT-6.1'  os: 'Windows_NT'
prefetch: 'no'  popcnt: 'no'  sse: 'no'  pext: 'no'

Flags:
CXX: g++
CXXFLAGS: -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH
LDFLAGS: -static

Testing config sanity. If this fails, try 'make help' ...

make ARCH=general-32 COMP=mingw all
make[1]: Entering directory `/c/Users/jpqy/Stockfish Master/src'
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o benchmark.o benchmark.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o bitbase.o bitbase.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o bitboard.o bitboard.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o endgame.o endgame.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o evaluate.o evaluate.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o main.o main.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o material.o material.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o misc.o misc.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o montecarlo.o montecarlo.cpp
In file included from montecarlo.cpp:30:0:
montecarlo.h:131:98: warning: 'CompareMeanAction' defined but not used [-Wunused-variable]
 struct { bool operator()(Edge a, Edge b) const { return a.meanActionValue > b.meanActionValue;}} CompareMeanAction;
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o movegen.o movegen.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o movepick.o movepick.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o pawns.o pawns.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o position.o position.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o psqt.o psqt.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o search.o search.cpp
In file included from search.cpp:30:0:
montecarlo.h:131:98: warning: 'CompareMeanAction' defined but not used [-Wunused-variable]
 struct { bool operator()(Edge a, Edge b) const { return a.meanActionValue > b.meanActionValue;}} CompareMeanAction;
montecarlo.h:130:81: warning: 'CompareVisits' defined but not used [-Wunused-variable]
 struct { bool operator()(Edge a, Edge b) const { return a.visits > b.visits; }} CompareVisits;
montecarlo.h:129:79: warning: 'ComparePrior' defined but not used [-Wunused-variable]
 struct { bool operator()(Edge a, Edge b) const { return a.prior > b.prior; }} ComparePrior;
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o thread.o thread.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o timeman.o timeman.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o tt.o tt.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o uci.o uci.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o ucioption.o ucioption.cpp
g++ -Wall -Wcast-qual -std=c++11 -Wextra -Wshadow -g -O3 -DNO_PREFETCH -c -o syzygy/tbprobe.o syzygy/tbprobe.cpp
g++ -o stockfish benchmark.o bitbase.o bitboard.o endgame.o evaluate.o main.o material.o misc.o montecarlo.o movegen.o movepick.o pawns.o position.o psqt.o search.o thread.o timeman.o tt.o uci.o ucioption.o syzygy/tbprobe.o -static
make[1]: Leaving directory `/c/Users/jpqy/Stockfish Master/src'
C:\Users\jpqy\Stockfish Master\src>strip stockfish.exe
C:\Users\jpqy\Stockfish Master\src>

Hope you can fix this.. Ipman. On Wednesday, December 6, 2017 at 10:21:12 PM UTC+1, Stephane Nicolet wrote: |
| Re: A branch to test the Monte Carlo algorithm in Stockfish | Stephane Nicolet | 12/7/17 12:05 PM | I should have made it clearer that it is not finished yet. The only thing it does correctly at the moment is not crashing during bench :-) But it has no time management, will lose 100 percent of its games on the clock, has not played a complete game yet, has not been tuned, etc... I would have preferred to test it here in the framework at least once before you put any binaries on a Russian server :-)
|
| Re: A branch to test the Monte Carlo algorithm in Stockfish | Mindbreaker | 12/8/17 11:28 PM | Great to see you working on this. Very exciting. Particularly for endgame. |
| Re: A branch to test the Monte Carlo algorithm in Stockfish | stmi...@gmail.com | 12/9/17 5:05 PM | On Wednesday, December 6, 2017 at 10:21:12 PM UTC+1, Stephane Nicolet wrote: Nice effort Stephane. This was exactly the same idea I had after first reading the AlphaGo Zero Nature paper. A few comments and suggestions. First, MCTS in A0 is not pure UCT. It still contains a randomisation factor, a Dirichlet noise Dir(α) that is added to the prior probabilities in the root node. One suggestion would be to use both the value and the probability of moves in each node, instead of just the probability of the evaluated move you use now. When you reach a leaf node, instead of selecting moves in the order provided by generate_moves(), you perform an eval, cut all moves that are below a certain threshold (LMR based on eval) and then perform a multi-PV search with all candidate moves. Then transform the best move score into v and the ratio of move scores into P (btw, the logistic function you use at the moment doesn't seem to be very suitable: basically you transform 600cp into a 75% win probability, which is way too small; 100cp should be something around 75% and 600cp close to 95%). For the depth I suggest 8, since this is the value that the A0 NN stores in its weights. A multi-PV depth 8 search with 2-3 moves can be done in around 10ms on modern hardware, meaning you could do 100 MCT searches per second. With 800 cores we would be at the performance of Alpha0. |
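[Editor's sketch] The calibration point raised here can be made concrete in Python (illustrative names, not code from the branch): pick the logistic scale so a chosen anchor, e.g. 100cp -> 75% as suggested above, is matched exactly. Note that a single logistic cannot hit both 100cp -> 75% and 600cp -> 95% at once; with the 100cp anchor, 600cp maps to well above 95%, so the constant is necessarily a compromise.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def scale_for(cp_anchor, p_anchor):
    """Solve logistic(a * cp_anchor) == p_anchor for the scale a."""
    return math.log(p_anchor / (1.0 - p_anchor)) / cp_anchor

def win_prob(cp, a):
    """Map a centipawn score to a win probability via the logistic."""
    return logistic(a * cp)

a = scale_for(100.0, 0.75)   # anchor: 100cp -> 75% win probability
```

With this anchor, an even position (0cp) maps to exactly 50%, as it should for a white/black-symmetric score.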
| Re: A branch to test the Monte Carlo algorithm in Stockfish | SapphireBrand | 12/9/17 6:28 PM |
I have a different impression of the Dirichlet noise. My understanding is that noise is a way to force exploration in training games. This avoids the possibility that the initial state of the move probability function could cause some moves to *never* be searched. But if AZ is choosing a move in a game then the algorithm is straight UCT. It was also my impression that DeepMind dropped that policy in the Alpha Zero paper. In the Alpha Go Zero paper, the program randomly sampled moves based on visit counts for only the first 30 moves, and then relied on Dirichlet noise for the rest of the game. But A0 has the simpler policy of choosing moves in training games solely by sampling from the visit counts. So I think this consideration is moot because SF isn't learning an evaluation function? |
| Re: A branch to test the Monte Carlo algorithm in Stockfish | stmi...@gmail.com | 12/9/17 10:22 PM | Yea, that seems to be correct; I reread the AGZ paper and Dirichlet noise seems to be used only in self-play. Furthermore, the temperature was always 0 in evaluation, meaning only the move with the highest visit count was considered.
In self-play the temperature was set to 1 only in the first 30 moves, meaning only in the first 30 moves were there other root moves with probability > 0 that could have been explored due to the added Dirichlet noise. |
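[Editor's sketch] The temperature mechanism discussed here is easy to sketch in Python (illustrative only): moves are sampled with probability proportional to N^(1/t) over root visit counts N, so t = 1 samples proportionally to the counts, while t -> 0 collapses onto the most-visited move.

```python
def play_probs(visits, t):
    """pi(a) ~ N(a)**(1/t): AGZ's move-selection distribution over
    root visit counts N, with temperature t."""
    if t == 0:  # greedy limit: a one-hot on the most-visited move
        best = max(range(len(visits)), key=lambda i: visits[i])
        return [1.0 if i == best else 0.0 for i in range(len(visits))]
    w = [n ** (1.0 / t) for n in visits]
    s = sum(w)
    return [x / s for x in w]
```

This is why t = 0 during evaluation produces deterministic play from identical search trees, while t = 1 in the opening gives genuine move diversity.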
| Re: A branch to test the Monte Carlo algorithm in Stockfish | Lester Antonio Tattersall Rodriguez | 12/10/17 3:20 AM | I think this is a very important and nice (and probably hard) project/branch. I hope you keep up the same interest, and good luck. |
| Re: A branch to test the Monte Carlo algorithm in Stockfish | stmi...@gmail.com | 12/10/17 2:54 PM | I had another look at this, and this doesn't actually hold.
If we really had t->0 during the whole game and no Dirichlet noise, then at least in the first few moves of each game A0 would always play the same moves, which is clearly not the case judging by the pgn of those 10 games against SF. So A0 would always start with the same move (not switch from d4 to Nf3 in a few cases) and always have the same reply to SF's moves. Therefore, t was not 0 and there was some Dirichlet noise, at least in the first few moves of the games with SF. Since there is no mention of the temperature used in the last A0 paper, we will not know before they publish a proper paper, but it is quite a logical assumption that in the first 5-10 moves there was some randomness, if nothing else to avoid having only one single opening all the time. |
| RE: A branch to test the Monte Carlo algorithm in Stockfish | SapphireBrand | 12/10/17 5:48 PM | So there is a mystery as to how the A0 vs SF comparison produced distinct games.
In the paper on p15 at the top it says "During evaluation, AlphaZero selects moves greedily with respect to the root visit count." (This is in contrast to the previous paragraphs, which said "during training.") Greedy selection precludes randomness and outside influences like libraries or preset variations. A0 varied between 1 d4 and 1 Nf3 when it had White in the 10 published games, so it looks like variability. However, those transpose after 1 move, so it isn't really different. Could it be possible that A0 has almost no variability? What causes SF to vary its behavior from game to game? Single-threaded behavior is very reproducible, but would 64 threads be enough to cause sufficient variability? Is it possible that the paper is depending solely on SF to choose different moves from time to time? Or was there a sampling of openings that the authors neglected to mention? The 10 games I played through were pretty similar to one another, so I am not sure. We know that there is a larger sample of 1200 games played from 12 standard openings. But I can't be sure why/how the 100-game measurement was selected. All the paper has to say is: "Settings were chosen to correspond with computer chess tournament conditions: each player was allowed 1 minute per move, resignation was enabled for all players (-900 centipawns for 10 consecutive moves for Stockfish and Elmo, 5% win rate for AlphaZero). Pondering was disabled for all players." I did find confirmation that Dirichlet noise is used by A0 training: "Dirichlet noise Dir(α) was added to the prior probabilities in the root node; this was scaled in inverse proportion to the approximate number of legal moves in a typical position, to a value of α = {0.3, 0.15, 0.03} for chess, shogi and Go respectively". So I stand corrected. |
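[Editor's sketch] The quoted noise recipe can be sketched in Python (illustrative; the mixing weight eps = 0.25 is the value given in the AlphaGo Zero paper, since the A0 quote above gives only alpha): the root priors are blended with a symmetric Dirichlet sample.

```python
import random

def dirichlet(alpha, n, rng=random):
    """Sample a symmetric Dirichlet(alpha) of dimension n via the
    standard Gamma-normalisation construction."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    s = sum(g)
    return [x / s for x in g]

def root_priors_with_noise(priors, alpha=0.3, eps=0.25, rng=random):
    """P'(a) = (1 - eps) * P(a) + eps * Dir(alpha), applied only at
    the root during self-play (alpha = 0.3 for chess, per the quote)."""
    noise = dirichlet(alpha, len(priors), rng)
    return [(1.0 - eps) * p + eps * x for p, x in zip(priors, noise)]
```

With a small alpha like 0.3 the Dirichlet samples are spiky, so the noise occasionally boosts one low-prior move enough to get it searched, which is the whole point for exploration during training.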
| Re: A branch to test the Monte Carlo algorithm in Stockfish | Hanamuke | 12/11/17 6:56 AM | It is almost guaranteed that AlphaZero is also multithreaded. It uses 4 TPUs, and I don't think all of those cores can work on the same simulated game. That would make it non-deterministic the same way that Stockfish is. It would make sense that AlphaZero would often play the same thing, though: thread interference matters much less in MCTS than in an alpha-beta that uses complex move-ordering and pruning heuristics. |
| Re: A branch to test the Monte Carlo algorithm in Stockfish | Ronald de Man | 12/11/17 3:41 PM | On Monday, December 11, 2017 at 2:48:57 AM UTC+1, SapphireBrand wrote: A few search threads already give quite a bit of variability. |
| Re: A branch to test the Monte Carlo algorithm in Stockfish | Fauzi | 12/12/17 6:19 AM | @snicolet, when do you think this will become operational? |
| Re: A branch to test the Monte Carlo algorithm in Stockfish | j...@baxters.biz | 12/18/17 1:23 PM | I just came upon this thread. I have implemented this idea independently. Specifically, I have written a java implementation of the MCTS algorithm described in the A0 paper. You can plug in any game engine provided it generates a list of actions and their scores for each game state. Java performance is no issue provided the computation overhead is dominated by the game engine, which is certainly the case for SF (and deep nets).
So far I have written a Stockfish plugin that communicates via UCI commands. It is up and running and debugged. On my MacBook Air I am setting SF movetime to 25ms, multipv to 20, and transforming raw centipawn scores into win probabilities by p(win) = logistic(a * cp), where the constant a is set such that 200 centipawns corresponds to a win probability of 0.8. I got this calibration from the plots here: https://chesscomputer.tumblr.com/ The win probability is transformed into an expected value (score) for the action (move) by v = 2 * p(win) - 1, which sets the expected value to 1 if p(win) = 1 and to -1 (a loss) if p(win) = 0. Note that this does not explicitly model the probability of a draw (but then neither does A0, AFAICT). The value for a game state (chess position) is just the score of the best move. The prior probability of each action (P[i] in the A0 paper) is computed as a softmax of the move scores: P[i] ~ exp(score(move i) / T), where T is a temperature parameter controlling how concentrated the prior probabilities are: a higher temperature yields a more uniform distribution, whereas T -> 0 greedily chooses the highest-scoring move. I think varying the temperature is a more natural way to inject more randomization at the root node than adding Dirichlet noise. For example, T could be set to a higher value at the root and then reduced for interior nodes, although so far I have just set T = 1.0. Since I only just got this working I don't have anything remarkable to report: it's quite slow, obviously, since single-threaded it only searches 40 positions per second at 25ms per position, so getting to the ~500,000 positions A0 evaluates in one minute is going to take some bigger hardware and a lot more time. However, I have run a couple of tests on some of the positions in the published A0 games, and it looks like the MCTS is able to put more weight on the moves SF finds hard to see, like Bg5!!
I have also tried one of the Bratko-Kopec positions that SF is unable to solve, and again the MCTS is able to raise the rank of the correct move above where SF has it, even after letting SF search for an hour. I'll probably upload the code to GitHub shortly - maybe someone would be interested in doing a port to the SF codebase in C++? :) |
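[Editor's sketch] The transforms described in the post above can be written out in Python (function names are illustrative, not the actual CuttleFish code; the constant is calibrated exactly as stated, 200cp -> p(win) = 0.8).

```python
import math

A = math.log(0.8 / 0.2) / 200.0   # chosen so logistic(A * 200) == 0.8

def value(cp):
    """v = 2 * p(win) - 1 in [-1, 1], with p(win) = logistic(A * cp)."""
    return 2.0 / (1.0 + math.exp(-A * cp)) - 1.0

def priors(values, t=1.0):
    """P[i] ~ exp(score_i / T): softmax over the multipv move values,
    shifted by the max for numerical stability."""
    m = max(values)
    e = [math.exp((v - m) / t) for v in values]
    s = sum(e)
    return [x / s for x in e]
```

The value transform is antisymmetric around 0cp, which is what the backup step of MCTS needs when negating values between plies; the softmax temperature then plays the role described in the post, with large T flattening the priors and small T concentrating them on the top engine move.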
| Re: A branch to test the Monte Carlo algorithm in Stockfish | j...@baxters.biz | 12/18/17 4:14 PM | https://github.com/sirjbaxter/CuttleFish
All hard-coded for now: you can change the path to the stockfish executable in UCIEngine.java, as well as the move time. The number of moves to consider is hard-coded in ChessBoard.java. The main of MCTS.java is an example of how to run. |
| Re: A branch to test the Monte Carlo algorithm in Stockfish | Lester Antonio Tattersall Rodriguez | 1/3/18 9:53 PM | I already said this here or somewhere else, but just in case: I would very much like some kind of Monte Carlo implementation. I think you had a very useful idea, and I hope you can implement such a thing, even if it is only a humble starting Monte Carlo, imho. It could also attract more patches gaining Elo/strength for the engine. |