MCTS wrapper for StockFish

701 views
Skip to first unread message

j...@baxters.biz

unread,
Dec 18, 2017, 7:27:03 PM12/18/17
to FishCooking
Stephane Nicolet posted the idea of using a quick alpha-beta search as the leaf node evaluator in an alpha-zero-style MCTS:

https://groups.google.com/forum/#!topic/fishcooking/AE4EgWQ20dY

I have also implemented this idea independently, so I thought I might share it here to see if others are interested. Specifically, I have written a java implementation of the MCTS algorithm described in the A0 paper. You can plug in any game engine provided it generates a list of actions and their scores for each game state. Java performance is no issue since the computation overhead is dominated by the game engine. Note that this is for playing only - there is no machine learning involved (yet).

https://github.com/sirjbaxter/CuttleFish

So far I have written a stockfish plugin that communicates via UCI commands.

All hard-coded for now: you can change the path to the stockfish executable in UCIEngine.java, as well as the move time. The number of moves to consider is hard-coded in ChessBoard.java. The main of MCTS.java is an example of how to run.

On my macbook air I am setting SF movetime to 25ms, multipv to 20, and transforming raw centipawn scores into win probabilities by p(win) = logistic(a * cp) where the constant a is set such that 200 centipawns corresponds to a win probability of 0.8. I got this calibration from the plots here: https://chesscomputer.tumblr.com/

The win probability is transformed into an expected value (score) for the action (move) by v = 2 * p(win) -1, which sets the expected value to 1 if p(win) = 1 and to -1 (a loss) if p(win) = 0. Note that this does not explicitly model the probability of a draw (but then neither does A0 AFAICT).

The value (i.e the backed-up value in MCTS) for a game state (chess position) is just the score of the best move. The prior probability of each action (P[i] in the A0 paper) is computed as a softmax of the move scores: P[i] ~ exp(score(move i) / T) where T is a temperature parameter controlling how concentrated the prior probabilities are - higher temperature yields a more uniform distribution whereas T -> 0 greedily chooses the highest-scoring move. I think varying temperature is more natural way to inject more randomization at the root node that adding Dirichlet noise. For example, T could be set to a higher value at the root and then reduced for interior nodes, although so far I have just set T=1.0.

Since I only just got this working I don't have anything remarkable to report: it's quite slow obviously since single-threaded is only search 40 positions per second at 25ms per position, so to get to the ~500,000 positions A0 evaluates in one minute is going to take some bigger hardware and a lot more time. However, I have run a couple of tests on some of the positions in the published A0 games and it looks like the MCTS is able to put more weight on the moves SF finds it hard to see like Bg5!! I have also tried one of the Bratko-Kopec positions that SF is unable to solve and again the MCTS is able to increase the rank of the correct move above where SF has it even after letting it search for an hour.

All comments criticisms etc welcome. Enjoy!

j...@baxters.biz

unread,
Dec 18, 2017, 8:30:50 PM12/18/17
to FishCooking

Make that ~5,000,000 positions A0 evaluates per minute.

jjoshua

unread,
Dec 20, 2017, 4:31:53 PM12/20/17
to FishCooking
Interesting. I checked out the blog you got the win rate function and it won't be accurate. It only used stockfish older version up to depth 16 and this was used to predict win rates of humans vs human patzers. Not super human chess entities searching deep. You would get better data from running fishtest and saving the ltc games. Or maybe using tcec and other data.
So each stockfish instance should be set to 1 thread but your app should be able to have up to 32 threads for a hypertheading 1950x to analyze 32 positions at once. You also shouldn't set multipv in stockfish but do that work in your app. Multi pv is much weaker search.
If you make some more progress I would do some testing on a 1950x.

j...@baxters.biz

unread,
Dec 20, 2017, 4:44:07 PM12/20/17
to FishCooking

Thanks for the feedback. Yes - I plan to add multithreading when I get a moment. It's straightforward with MCTS using virtual loss per the alpha-zero paper. I have a Ryzen 1700 to test on, and some 8/16 core/thread Xeon D's., but your 1950X should smoke those.

I agree the win prob is off, but I don't think the MCTS search is all that sensitive to miscalibration, provided the win prob is monotonic in the raw score. The effect is the same as adjusting the temperature in the move distribution.

Can you elaborate on the multipv? The MCTS needs prior probabilities for all the moves, and I figured the easiest way to get those was using multipv (in reality I only pick the top 20 since it is very rare for a good move to be lower-ranked than that).

Mark Jordan

unread,
Dec 20, 2017, 5:11:20 PM12/20/17
to j...@baxters.biz, FishCooking
There was a recent suggestions in the ideas thread about Sometimes doing multipv of 2 in recapture to save time and do the recapture sooner if the second best move appears to be a lot worse. But multipv is so much slower that none of those tests passed.
They ran a test that showed even running just multipv of 2 5% of the time results in an elo loss of about 7. Even though sometimes the extra focus on the 2nd best move would discover it was better.

If you are doing all searching at multipv 20 you are starting at such an elo handicap I don't think you can recover.

This might be why you need a neural network to make good enough pruning decisions to make up for speed loss. But don't let my pessimism discourage you I would have told the Google researchers it wouldn't work too.

I think the best would be to use a neural network with sf playouts of depth 8 plus qs or so.

j...@baxters.biz

unread,
Dec 20, 2017, 5:26:22 PM12/20/17
to FishCooking

Yeah, I don't doubt that multipv for a-b search is a loser. The goal of this project is to try and understand how much of A0's performance gain is due to the neural network and how much is due to MCTS - both are radical departures from traditional chess engines using linear evaluation functions and a-b search.

Note that it's not really a fair fight between A0 and SF: per unit time A0 is using vastly more floating point operations that Stockfish does (I know SF doesn't use floating point but the point is the same if you compare to Stockfish's use of integer ops or any other metric that makes sense). Part of the problem is that Stockfish simply can't make use of that much processing power.

So I am particularly interested in what happens if you search as many (leaf) positions using MCTS as does A0, but with a traditional a-b searcher evaluating move probabilities and scoring the leaf nodes, instead of a deep network. This should scale better with thread count and my hunch (I could be very wrong) is that it will do better than just the a-b searcher. But more generally, it is interesting to see if the MCTS helps a "dumb" a-b searcher find moves that it can't otherwise see, even if it is slower.

Reply all
Reply to author
Forward
0 new messages