Leela Rising (my post from Chess.com)


Simon Fox

Aug 3, 2018, 3:53:57 AM
to LCZero
https://www.chess.com/forum/view/general/leela-rising
Just wanted to share an article I wrote for chess.com; hope you enjoy it:


----------

Hello everyone, this is my first post. As someone who has built a chess engine from scratch and slavishly chased down every possible optimization and tweak, I have some thoughts about the new self-taught "zero" chess engines, namely Leela and AlphaZero.

When the AlphaZero research paper came out, I was in the middle of crafting what I hoped would be a beautiful evaluation function for my engine. I watched the analysis videos of AlphaZero playing its human, yet deadly accurate, form of chess, and it was clear that the very idea of a hand-crafted evaluation function was finished. But we'll challenge that notion here, for I want to make the point that the Alpha-Beta era of Stockfish, Komodo and Houdini may not go so quietly.

The DeepMind paper was an amazing piece of research, though some criticism can be levelled at them for abandoning us chess tragics. Keep in mind, however, that a good research paper allows the work to be reproduced, which is not as common as you'd hope in computer science. The open-source project Leela reminds us of the service DeepMind did for us all. As we sleep each night, Leela dutifully grinds through thousands more games against herself, steadily climbing the engine rating ladder.

AlphaZero was trained within 24 hours using the supercomputing power afforded by Google. Leela, by contrast, gives us the drama of a child prodigy climbing to the top, with many hidings along the way from superior opponents. Please take a look at the wonderful YouTube channel of KingCrusher (https://www.youtube.com/channel/UCDUDDmslypVXYoUsZafHSUQ) for instructive breakdowns of Leela beating up older versions of strong chess engines. There are occasional notable wins against Stockfish 8, but for the moment she is still a few hundred ELO points behind. Projections suggest that Leela will overtake Stockfish 9 sometime around November 2018, though if one reads the blog posts and research details, there are ongoing hurdles to getting the neural network to train itself effectively each night. Problems such as over-fitting the training data are ever-present; over-fitting is analogous to a player simply memorizing training games rather than learning general principles from them.
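
(To see what over-fitting looks like in miniature, away from chess entirely, here is a toy Python example; it has nothing to do with Leela's actual training code. Fit noisy points with ever-bigger polynomials and the error on held-out points eventually climbs even as the training error keeps falling.)

    import numpy as np

    # Toy over-fitting demo (illustrative only, not Leela's training code):
    # bigger polynomials always fit the training points better, but past
    # some size they start "remembering the games" instead of learning the
    # underlying curve, and the error on held-out points typically rises.
    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, 40)
    y = np.sin(3 * x) + rng.normal(0, 0.2, 40)
    x_train, y_train = x[:20], y[:20]
    x_val, y_val = x[20:], y[20:]

    for degree in (1, 3, 9):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
        print(f"degree {degree}: train {train_err:.3f}, held-out {val_err:.3f}")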

There is one key point to understand about this engine development: some of the time, we still have no way to know the status of a given move. Is it the best move, a winning move, a drawing move, a losing move? Unless we can prove it by force, we often don't know. When a Stockfish rated 3400 ELO gets spanked by AlphaZero, where do we turn for an analysis? What do we really know about the game now? Who's to say that AlphaZero was even correct?

Consider Leela's winning games against lesser opponents such as Stockfish 7 and Komodo 9, prior versions of the best engines. Watching KingCrusher's YouTube channel, one is led to believe that aggressive king attacks, combined with stifling everything else on the board, are a winning approach. But what happens when a 3400-rated neural network engine meets a 3500-rated one? Will the board erupt in fireworks, or will it be locked up and fizzle? The point is that we have yet to see a strong neural network engine defend brilliantly; all the brilliancies so far are about the wins. Will we see the same deep strategy and disregard for material applied to defence? I suspect that, with two equal engines, the opening choice may decide.

The other point I want to make is that the Alpha-Beta paradigm may not be so easily written off. The top engines continue to add ELO points every year. Hardware will continue to improve, but more importantly, the authors of the best engines will soon have a better engine to learn from. Consider what it's like being a Stockfish developer right now. Where can you look to improve the engine? It's not as if it loses very often, certainly not often enough to draw general lessons from. If and when Leela overtakes it and starts dealing out beatings like we saw from AlphaZero, there will be a new test suite of games to study and improve from. Endless games, and we may see great things.
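
(For readers who haven't written one: the Alpha-Beta paradigm is, at its core, a depth-limited minimax search that skips branches which provably cannot change the result. A toy Python sketch on an abstract game tree, a generic illustration rather than anything from Stockfish:)

    # Toy alpha-beta search. The "game tree" is nested lists and the leaves
    # are static evaluations from the root player's point of view.
    def alphabeta(node, alpha, beta, maximizing):
        if isinstance(node, (int, float)):      # leaf: evaluation function
            return node
        if maximizing:
            best = float("-inf")
            for child in node:
                best = max(best, alphabeta(child, alpha, beta, False))
                alpha = max(alpha, best)
                if alpha >= beta:               # opponent has better elsewhere: prune
                    break
            return best
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best

    # Two root moves; the second is cut off after its first reply refutes it.
    tree = [[3, 5], [1, 9]]
    print(alphabeta(tree, float("-inf"), float("inf"), True))   # prints 3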

Compare this to how Leela can improve. Certainly the Monte Carlo tree search algorithm is fertile ground for research. But the neural network itself is a morass of interconnected numbers and strange functions. You can increase the size of the network and throw more training games at it, but perhaps at some point the ELO gain flattens out and the paradigm is exhausted. (I'm not sure of the current state of the research on this point.)
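
(For the curious: the heart of the Monte Carlo search in AlphaZero-style engines is the "PUCT" selection rule from the paper, which balances the value found so far against the network's policy prior. A stripped-down Python sketch with made-up statistics, not Leela's actual code:)

    import math

    # One node's children: prior P from the policy head, visit count N,
    # and total value W accumulated from evaluations deeper in the tree.
    # All numbers here are invented for illustration.
    children = [
        {"move": "e4",  "P": 0.50, "N": 30, "W": 16.0},
        {"move": "d4",  "P": 0.30, "N": 10, "W":  5.5},
        {"move": "Nf3", "P": 0.20, "N":  2, "W":  0.8},
    ]

    def puct_score(child, parent_visits, c_puct=1.5):
        q = child["W"] / child["N"] if child["N"] else 0.0   # average value so far
        u = c_puct * child["P"] * math.sqrt(parent_visits) / (1 + child["N"])
        return q + u    # exploit good moves, but explore barely-visited priors

    parent_visits = sum(c["N"] for c in children)
    best = max(children, key=lambda c: puct_score(c, parent_visits))
    print(best["move"])     # here the under-explored Nf3 gets the next visit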

Maybe for the next 10 years we will see fun, exciting play from Leela and other newcomers, crushing any old-school engine that comes near them. We'll re-write the opening books and have a brilliant time. And then the neural network era will fizzle out, Komodo 24 will continue its relentless plod forward, and we'll learn again that 3700-strength chess is about iron-clad, rock-solid attrition. Don't count this scenario out; it may yet come to pass. And maybe in another 10 years a new paradigm will come along. Quantum computing, perhaps?

Finally, a note about Leela and evaluation functions. Buried deep inside the neural network are learnings encoded from millions of self-play games. When trained successfully, the network does not remember the positions but rather learns generalizations from them; this is true of neural networks in general. Leela and AlphaZero re-ignite our interest in the game because they play in a way that is more familiar and exciting to us humans.

The moment you create one single numerical constant in a traditional Alpha-Beta evaluation function, you as a human are imposing imperfect knowledge on the engine. It will be expressed over and over, and deeply affect the result. Leela does not suffer from this, and it's not possible in any meaningful way to ask her "what is the value of a knight?", or "this passed pawn is on the 6th rank, how do I adjust its value in light of its queening chances?". The board position goes into a black box, a pile of learned weights and non-linear functions is crunched, and out pops a set of probabilities. It's like neither a human nor an Alpha-Beta engine: completely inscrutable and unknowable. We are going to have so much fun with this in the years to come. On a personal note, my own blitz rating has jumped 150 points recently after watching KingCrusher's videos of Leela's wins. It's awesome stuff.
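
(To make the point about numerical constants concrete, here is the kind of thing I mean, in toy Python form. Every number below is a human guess I have just invented, which is exactly the problem:)

    # Toy hand-crafted evaluation: every constant is a human's imperfect guess.
    PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900}
    PASSED_PAWN_BONUS = [0, 10, 20, 35, 60, 100, 180]   # indexed by rank - 1

    def evaluate(white_pieces, black_pieces, white_passer_ranks):
        # white_pieces / black_pieces: piece letters, e.g. ["Q", "N", "P"]
        # white_passer_ranks: ranks (2-7) of White's passed pawns
        score = sum(PIECE_VALUES[p] for p in white_pieces)
        score -= sum(PIECE_VALUES[p] for p in black_pieces)
        for rank in white_passer_ranks:
            score += PASSED_PAWN_BONUS[rank - 1]    # "what is a 6th-rank passer worth?"
        return score    # centipawns, from White's point of view

    # A knight is "worth" 320 here only because I typed 320. Leela holds no
    # such number anywhere that you could point to.
    print(evaluate(["Q", "N", "P"], ["Q", "B"], [6]))   # prints 190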

But I'm sticking my neck out here and saying that in the year 2028, an Alpha-Beta chess engine may well be the more formidable opponent, with deadly tactical accuracy. After all, at the end of the day, strategy is merely an approximation to tactics: if we can see to the end of the game, there is only tactics. Stockfish will never see to the end of the game, but it may yet see further than a neural network can.
Let's see! Thanks for reading.


Lito

Aug 3, 2018, 8:06:36 AM
to LCZero
In 2028, AB engines will have become obsolete.

Matt Blakely

Aug 3, 2018, 9:32:52 AM
to LCZero
Well before 2028 at this rate

I don't see us ever going back to AB engines; from here it will be better nets, etc. I imagine specialized endgame nets even replacing tablebases, since they could handle a greater number of pieces.

Perhaps the next level would be a quantum computer that can literally solve chess; that's about the only thing I can think of that might replace machine learning from here.

Deep Blender

Aug 3, 2018, 10:06:43 AM
to LCZero
What makes you believe that Alpha-Beta pruning is superior to Monte-Carlo tree search for chess?

Graham Jones

Aug 3, 2018, 10:46:19 AM
to LCZero
If people are going to make predictions, I would like to see a lot more clarity about exactly what they're making predictions about. I think Deep Blender's question may be getting at the same point. I suspect that none of Simon Fox, Lito, or Matt Blakely is really concerned with the tree-search algorithm at all.

Hand-crafted vs machine learning for evaluation? That's quite tricky; there are so many things in between. Stockfish's evaluation function is already 'designed' by tweaking it and then testing it over lots of self-play games (sound familiar?). Although the SF evaluation function is comprehensible up to a point, no one can tell you why it is the way it is. If you ask a question like "why do bishops do X-ray attacks, but not rooks?", you will basically get one of two answers: "It hasn't been tried yet, so we don't know if it improves ELO" or "We tried it and found it didn't improve ELO". Arguably, SF has been using machine learning for years without anyone calling it that. And then, at the other extreme, you could have a machine-learning algorithm with carefully hand-crafted features, perhaps features based on SF's evaluation. I wouldn't be depressed if I were "in the middle of crafting what I hoped would be a beautiful evaluation function".
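
To spell out what "tweaking then testing" means in practice, here is a cartoon of that loop in Python, with a simulated match standing in for real self-play games (all numbers invented):

    import random

    # Cartoon of the tuning loop: change one constant, play a big match,
    # keep the change only if it scores better. The "engine" is faked as an
    # expected score per game; a real tweak worth 0.2% is buried in noise
    # at 10,000 games, which is why real testing needs statistics (SPRT)
    # and enormous numbers of self-play games.
    def play_match(expected_score, games=10_000):
        wins = sum(random.random() < expected_score for _ in range(games))
        return wins / games

    old_score = play_match(0.500)   # current master
    new_score = play_match(0.502)   # master + candidate tweak
    print("accept tweak" if new_score > old_score else "reject tweak")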

The authors of 'traditional' engines do need to get their heads around GPUs...

David Larson

Aug 3, 2018, 11:08:06 AM
to LCZero
Nice article. Good job.

Chris Whittington

Aug 3, 2018, 11:13:04 AM
to LCZero
Good article! Nice to see a mix of chess, programming and philosophy come together.

I share your semi-prediction that AB might still win this particular race. My view (as a programming, chess-playing philosopher, haha) is that it is by no means as clear as many people think that the AZ approach will dominate.

What counts here is that we're playing chess. Chess runs in an 8x8 world, and the rules tend to take games towards simplification: if we don't do something clever to get an advantage somehow, relatively early on, or create some special form of complexity, then games tend to peter out into draws. It's not Shogi, or one of the other games that get more complex with time. So, however we want to discuss and compare Alpha-Beta with NN-MCTS, we need to keep in mind that we're talking about a game with a tendency to run out of steam as time advances. Our search paradigm, whichever way we go, has a brick wall of simplification and a tendency to draw coming at it from the opposite direction. We're not at the brick wall yet in 2018, but it approaches.

So, firstly, I see the AB versus NN-MCTS race as one in which AB, with its head start, just might make it to the finish line without being overtaken, so to speak. That's a tough one to call, but we can try...

AB remains a highly materialistic paradigm; it still works on "material" + "positional", and yes, it has improved dramatically over the years, but many, many decisions in AB search are still based on material. So *if* the AB programs are collectively searching only a relatively materialistic section of the possible sensible games, AND there is a sufficient set of games with wilder pathways/tree-space regions, AND NN-MCTS can find these pathways/regions and steer towards them, *then* NN-MCTS begins to look like it might be in with a chance, on evaluation concepts alone.

*If*, however, AB programs can handle the wilder non-materialistic pathways by picking a way through to quiet-land again, using massive depth scrunching, then they are okay.

We don't know, is the answer to this one. We simply don't know how much AB-unexplored but sensibly playable tree there is. LC0 getting very good, and scrunching Stockfish with pyrotechnics, positional binds, or some kind of huge positional advantage despite a lack of material, will tell us, if it happens. It seemed to happen, sometimes, in those AZ-SF8 games, so maybe.

But, big but, as you also point out, Stockfish is not standing still; its human developers will have games on which to work out how to improve. The current rate of improvement of LC0 (well, discounting all the relearning of the last couple of months and the apparent stall) is faster than the rate of improvement of Stockfish. But LC0 starts from a much lower level, so take your pick. The rate of improvement is not linear; it may be horizontally asymptotic, and there are signs of that with LC0, although the developers have ways around it.

I don't want to predict the ELO graphs of these two into the future; it's just not clear. But I'll hazard a guess: the moving target that is Stockfish is going to be a very tough nut to crack, and LC0 may not be able to do it.

gravity_well

Aug 3, 2018, 12:03:32 PM
to LCZero
Very nice.
My two cents...
Just to expand on what you said about your personal rating: the correlation between our own chess ELO and the strength of engines can't be ignored. And I would hope that at the end of the day the theory-crafting in NN technology doesn't become a "my engine is better than yours" battle. That said, my personal interest in this project (and I think others' as well) is the inspiring feeling you get when you hear "this NN knew nothing of chess before training". Simply amazing, IMO.

FWCC1

Aug 3, 2018, 1:20:18 PM
to LCZero
Nice writing, but I don't think NNs have to see to the end of the game the way an alien Stockfish might. NNs play from experience and, if you will, almost from intuition. This "intuition" is unlike anything we have seen. I think experience and intuition will win, even though the intuition is artificially produced.

Simon Fox

Aug 3, 2018, 1:24:43 PM
to LCZero
The predictions are all about ELO strength, though they're not really predictions, just ideas to think about.

Regarding MCTS, I made this comment: "Compare this to how Leela can improve. Certainly the Monte Carlo tree search algorithm is fertile ground for research", which references the point that the MCTS algorithm has not gone through the detailed refinement that AB has. On the other hand, maybe the policy component of the neural network already delivers as much of that improvement as can be had. I'm not sure, merely speculating.

Deep Blender

Aug 3, 2018, 4:38:45 PM
to LCZero
Besides the potential in the Monte-Carlo tree search, there is also a lot to explore in the actual neural network being used. The current architecture is very simple, and there is very likely a lot of potential for improvement in both speed and accuracy. There is also a lot of research going into improving reinforcement learning; at the moment it is a very computationally expensive and wasteful process, but there has been a lot of progress in the past years, and nothing indicates that this is going to stop in the near future. Those improvements will make it possible to run more experiments, which will help to find further gains a lot faster.
In my opinion, you are underestimating the potential.
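
To give an idea of what I mean by "the current architecture": AlphaZero-style networks are essentially a tower of residual blocks feeding a policy head and a value head. Here is a toy numpy sketch of just the residual idea; the real network uses 3x3 convolutions, batch normalization and far more filters, so treat this as a cartoon, not Leela's code:

    import numpy as np

    # Toy residual "tower". Each fake convolution is 1x1, i.e. a matrix
    # multiply applied independently to each of the 64 squares.
    FILTERS = 8
    rng = np.random.default_rng(0)
    w1 = rng.normal(0, 0.1, (FILTERS, FILTERS))
    w2 = rng.normal(0, 0.1, (FILTERS, FILTERS))

    def relu(x):
        return np.maximum(x, 0.0)

    def residual_block(x):          # x: (64, FILTERS), one row per square
        y = relu(x @ w1)
        y = y @ w2
        return relu(x + y)          # the skip connection: add the input back

    board_planes = rng.normal(0, 1, (64, FILTERS))  # stand-in for input planes
    out = board_planes
    for _ in range(4):              # small stack of residual blocks
        out = residual_block(out)
    print(out.shape)                # (64, 8); would feed the policy/value heads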

Simon Fox

Aug 3, 2018, 6:52:02 PM
to LCZero
I actually agree that the research on reinforcement learning is open-ended, though I did mention in my article that I'm not up to date on the latest developments. I also hope that this zero approach to chess keeps going strong, as it's exciting for both chess and machine learning. But I think there is a counter-narrative to explore, for as long as we see Alpha-Beta engines continue to improve.

There's something that just worries me slightly about neural networks: we don't really know what coverage they have of the game. In the AlphaGo documentary, the DeepMind guys mentioned the pockets of delusion that can occur when the neural network strays into territory it has not sufficiently studied. More likely in Go, of course, with its vast game tree.

We know these exist currently, because we have situations where Leela will destroy Stockfish 7 from a given position, yet get beaten from the same position by Stockfish 8. So, in principle at least, we can witness first-hand that certain positions are returning inaccurate results from the neural network.

There may come a time when Stockfish just falls behind and can't ever catch up. I'm just not so sure about that yet.

Simon Fox

Aug 3, 2018, 6:53:57 PM
to LCZero
A question for folks to consider: if you had to bet your life savings on Leela versus Stockfish, where both engines are rated 3500 on the given hardware of that match, which would you choose?

Deep Blender

Aug 4, 2018, 5:03:20 AM
to LCZero
The lack of coverage is certainly one of the main current issues when it comes to Monte-Carlo tree search in combination with reinforcement learning. It is typical of the approach, and it was exposed in the match against Lee Sedol, where AlphaGo misjudged a position. The solution was to train it longer. That's a huge contrast with conventional engines: there, the solutions have to be hand-crafted, while with reinforcement learning the engine finds and fixes them on its own. That is a huge advantage.