I have also implemented this idea independently, so I thought I might share it here to see if others are interested. Specifically, I have written a java implementation of the MCTS algorithm described in the A0 paper. You can plug in any game engine provided it generates a list of actions and their scores for each game state. Java performance is no issue since the computation overhead is dominated by the game engine. Note that this is for playing only - there is no machine learning involved (yet).
https://github.com/sirjbaxter/CuttleFish
So far I have written a stockfish plugin that communicates via UCI commands.
All hard-coded for now: you can change the path to the stockfish executable in UCIEngine.java, as well as the move time. The number of moves to consider is hard-coded in ChessBoard.java. The main of MCTS.java is an example of how to run.
On my macbook air I am setting SF movetime to 25ms, multipv to 20, and transforming raw centipawn scores into win probabilities by p(win) = logistic(a * cp) where the constant a is set such that 200 centipawns corresponds to a win probability of 0.8. I got this calibration from the plots here: https://chesscomputer.tumblr.com/
The win probability is transformed into an expected value (score) for the action (move) by v = 2 * p(win) -1, which sets the expected value to 1 if p(win) = 1 and to -1 (a loss) if p(win) = 0. Note that this does not explicitly model the probability of a draw (but then neither does A0 AFAICT).
The value (i.e the backed-up value in MCTS) for a game state (chess position) is just the score of the best move. The prior probability of each action (P[i] in the A0 paper) is computed as a softmax of the move scores: P[i] ~ exp(score(move i) / T) where T is a temperature parameter controlling how concentrated the prior probabilities are - higher temperature yields a more uniform distribution whereas T -> 0 greedily chooses the highest-scoring move. I think varying temperature is more natural way to inject more randomization at the root node that adding Dirichlet noise. For example, T could be set to a higher value at the root and then reduced for interior nodes, although so far I have just set T=1.0.
Since I only just got this working I don't have anything remarkable to report: it's quite slow obviously since single-threaded is only search 40 positions per second at 25ms per position, so to get to the ~500,000 positions A0 evaluates in one minute is going to take some bigger hardware and a lot more time. However, I have run a couple of tests on some of the positions in the published A0 games and it looks like the MCTS is able to put more weight on the moves SF finds it hard to see like Bg5!! I have also tried one of the Bratko-Kopec positions that SF is unable to solve and again the MCTS is able to increase the rank of the correct move above where SF has it even after letting it search for an hour.
All comments criticisms etc welcome. Enjoy!
Make that ~5,000,000 positions A0 evaluates per minute.
Thanks for the feedback. Yes - I plan to add multithreading when I get a moment. It's straightforward with MCTS using virtual loss per the alpha-zero paper. I have a Ryzen 1700 to test on, and some 8/16 core/thread Xeon D's., but your 1950X should smoke those.
I agree the win prob is off, but I don't think the MCTS search is all that sensitive to miscalibration, provided the win prob is monotonic in the raw score. The effect is the same as adjusting the temperature in the move distribution.
Can you elaborate on the multipv? The MCTS needs prior probabilities for all the moves, and I figured the easiest way to get those was using multipv (in reality I only pick the top 20 since it is very rare for a good move to be lower-ranked than that).
Yeah, I don't doubt that multipv for a-b search is a loser. The goal of this project is to try and understand how much of A0's performance gain is due to the neural network and how much is due to MCTS - both are radical departures from traditional chess engines using linear evaluation functions and a-b search.
Note that it's not really a fair fight between A0 and SF: per unit time A0 is using vastly more floating point operations that Stockfish does (I know SF doesn't use floating point but the point is the same if you compare to Stockfish's use of integer ops or any other metric that makes sense). Part of the problem is that Stockfish simply can't make use of that much processing power.
So I am particularly interested in what happens if you search as many (leaf) positions using MCTS as does A0, but with a traditional a-b searcher evaluating move probabilities and scoring the leaf nodes, instead of a deep network. This should scale better with thread count and my hunch (I could be very wrong) is that it will do better than just the a-b searcher. But more generally, it is interesting to see if the MCTS helps a "dumb" a-b searcher find moves that it can't otherwise see, even if it is slower.