Achieving a better chess engine with a neural net and tablebase hybrid.


mat1...@student.lu.se

Sep 7, 2018, 9:17:22 AM
to LCZero
Why not replace the neural net completely with the tablebases once a tablebase position is reached, given that chess is solved for positions with 7 or fewer pieces on the board?
It would make sense to do this if the aim is to create the best chess engine possible.

I suggest that we train a network to only aim for tablebase positions. It does not need to understand the complexity of the endgame.

All self-play training games can be ended when a tablebase position is reached, and the result replaced by the solved result for that position.
This would likely speed up the training process.
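
For illustration, ending and scoring a self-play game at the first tablebase position could look roughly like the sketch below. This is only my own sketch, not lc0's actual training code: it uses the python-chess syzygy bindings, and the tablebase path, the 6-piece limit and the `net_move` function are placeholders.

```python
# Rough sketch of tablebase adjudication for self-play games (not lc0 code).
# Assumes python-chess and a local set of syzygy tables.
import chess
import chess.syzygy

def tb_result(board: chess.Board, tb: chess.syzygy.Tablebase, max_pieces: int = 6):
    """Solved result from White's point of view (+1/0/-1), or None if the
    position is not in the tablebase yet."""
    if chess.popcount(board.occupied) > max_pieces:
        return None
    wdl = tb.probe_wdl(board)                # from the side to move's perspective
    score = (wdl > 0) - (wdl < 0)            # collapse WDL to -1/0/+1
    return score if board.turn == chess.WHITE else -score

def self_play_game(net_move, tb):
    """Play moves from the net until the game ends or a TB position is reached,
    in which case the solved result replaces the played-out result."""
    board = chess.Board()
    while not board.is_game_over():
        solved = tb_result(board, tb)
        if solved is not None:
            return solved                    # end the game here with the solved result
        board.push(net_move(board))
    return {"1-0": 1, "0-1": -1}.get(board.result(), 0)

# Usage (path is a placeholder):
# tb = chess.syzygy.open_tablebase("/path/to/syzygy")
```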

This could also allow the network to learn more complexity in the opening and middle-game, by not needing to fully understand the endgame.
What it learns by playing through endgames seems unlikely to improve opening and middlegame performance.

A neural network is always an approximate function, and the amount of "different" information it needs to store is going to limit its performance.
The endgame is in many respects different from the opening and middlegame. Certain tablebase positions are only won by hundreds of perfect moves;
it seems unreasonable to expect a network to ever learn this kind of play perfectly.

By removing the endgame we are constructing a simpler game for it to learn, and thus better overall performance could likely be achieved.

A Thule

Sep 7, 2018, 9:31:15 AM
to LCZero
As has been said repeatedly on this list, the aim was NOT 'to create the best chess engine possible' but to achieve similar success against AB search engines using a self-learning NN and MCTS, as AlphaZero did. If the best chess engine happens to result, all the better.

That said, TB positions have determined outcomes. It makes no sense to have lc0 train to discover determined outcomes.

As another thread noted, Alpha0 did the following:

"[...] In order to save computation, clearly lost games are resigned. The resignation threshold v_resign is selected automatically to keep the fraction of false positives (games that could have been won if AlphaGo had not resigned) below 5%. To measure false positives, we disable resignation in 10% of self-play games and play until termination."

If the zero philosophy accepts resigning games to save computation, it should also tolerate winning games using TB to save computation (or resigning lost positions).
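
For what it's worth, the quoted resignation scheme can be pictured with a toy calibration like the one below. This is my own illustrative sketch, not DeepMind's or lc0's code; the function name and data format are made up.

```python
# Toy sketch of calibrating a resignation threshold from games where
# resignation was disabled (illustrative only; names and data format are made up).
def calibrate_v_resign(no_resign_games, max_false_positives=0.05):
    """no_resign_games: list of (lowest_eval_seen_by_loser, loser_eventually_won)
    pairs. Returns the highest threshold whose false-positive rate stays <= 5%."""
    best = -1.0                                       # fallback: never resign
    for v in sorted({m for m, _ in no_resign_games}):
        would_resign = [(m, won) for m, won in no_resign_games if m <= v]
        fp_rate = sum(won for _, won in would_resign) / len(would_resign)
        if fp_rate <= max_false_positives:
            best = max(best, v)
    return best
```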

Tom Idermark

Sep 7, 2018, 10:36:39 AM
to A Thule, LCZero
> As has been said repeatedly on this list, the aim was NOT 'to create the best chess engine possible' but to achieve similar success against AB search engines using a self-learning NN and MCTS, as AlphaZero did. If the best chess engine happens to result, all the better.

Would you be kind enough to provide a reference for this quote and say which developer (one authorised to speak for the whole team) it came from? Of course creating the best chess engine possible cannot literally be the aim, since that implies playing perfect chess, which any computer scientist (and many chess players) knows is out of reach with current technology (while theoretically possible). And your statement also implies that, once LC0 beats SF8 (under conditions with many uncertainties), the whole team would just put their pens down and move on to other things … I think that would be very sad.

> That said, TB positions have determined outcomes. It makes no sense to have lc0 train to discover determined outcomes.

It makes perfect sense! Just as the OP pointed out, if Leela is trained to play towards positions that give a score of +1 when looked up in an EGTB, the network resources are diverted from extracting endgame features (learning endgame play) and used instead to perform better in the late middlegame, steering her moves towards winning endgame positions.

EGTB support is already provided (limited for now) in LC0, so presumably the OP is arguing for training with EGTB (not just inference). And I agree, but I don't know the status. If (after a reset) LC0 was trained with EGTB and then played without it, I guess the result would be catastrophic … :-)
 

Graham Jones

Sep 7, 2018, 10:43:28 AM
to LCZero

In short, the goal of the project is first to replicate AlphaZero, and then to make it as strong as possible.

The main reason for the "replicate AlphaZero" part is actually not to test reproducibility in the scientific sense; that's more like a nice side effect.
The main reason is that there were many attempts to create an NN engine before, and they failed, while AlphaZero succeeded.
There are many possible improvements which really look beneficial if you think about them, but the problem with implementing them is that if we get stuck, we can no longer just compare what we are doing differently and fix that. We do get stuck a lot, and in fact there are lots of surprising subtleties which matter (a recent example is the sampling rate; we used ~0.95, I guess, and it was too much).
So, as long as we can follow the guide, we do.

After we reach the AlphaZero state (or if we get stuck before that and aren't able to find any explanation), the goal will surely be to "create as strong a chess engine as possible".
I expect that there will be some zero vs non-zero debate, which may result in two different forks, but I expect non-zero to get much more attention in the end because it will be stronger.

Tom Idermark

Sep 7, 2018, 10:49:15 AM
to Graham Jones, LCZero
Thanks Graham!

M

Sep 7, 2018, 11:21:47 AM
to LCZero
One could argue that using endgame tablebases for training the neural net doesn't actually violate the zero approach. TBs are not "human knowledge" but, in some sense, "absolute knowledge", as they tell us the definite value of the positions, without any heuristics involved.

The question is more whether using TBs for training actually makes the neural net weaker. Playing to the very end lets the neural net discover and learn all those elementary mating patterns in their "natural" form. It is very well possible that a TB-trained net has problems with non-TB positions which involve such mating patterns (within a subset of the pieces). Just imagine that you teach a child how to play chess, but instead of playing to the end you announce "you lost", "you win", etc. as soon as there are only 6 pieces left.

That said, I really don't know what the outcome of a TB-trained network would be, and I think this definitely should be tried, if only to make sure that nobody else shows up at TCEC14 with a super strong chess engine – let's call it DeusXXL for the moment – which differs from Leela only in that TBs have been used for training.

A Thule

Sep 7, 2018, 7:47:25 PM
to LCZero
I, for one, don't believe including TBs violates the "zero" principle, since the outcome of a TB position is predetermined. That said, lc0's training is driven by success and failure, and reaching a favourable TB position should be rewarded the same as winning, by the principle of transitivity.

Lc0 should be able to play for a win, or play for a favourable TB position, without distinction, since both amount to the same thing.

Dave Whipp

Sep 8, 2018, 12:15:06 AM
to A Thule, LCZero
I'm willing to accept that TB doesn't violate the zero principle; but it does violate what we might call the Occam principle: it's definitely something extra, derived from the rules of chess, but not the simplest form. So to my aesthetic sense it seems somewhat inelegant.

If it turns out that TBs have a huge impact on training, then it may be a cost worth paying; but my intuition is that they'd make very little difference. Leela's strength seems to be positional evaluation in parts of the game where even a deep AB search doesn't reach a 6-piece position. It doesn't seem that the use of TB in training would change a significant part of the training data; so my guess is that it's a lot of work for small benefit. The only way to be sure is to try it; but trying it is costly.
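
As a sanity check on that intuition, one could measure how much of the training data would actually be touched. Here is a rough sketch (my own, not a project tool) using python-chess on a PGN of self-play games; the file path and the 6-piece cutoff are placeholders.

```python
# Rough estimate of how many training plies fall at or after the first
# tablebase (<= 6 piece) position in a set of self-play games.
import chess
import chess.pgn

def tb_ply_fraction(pgn_path, max_tb_pieces=6):
    total, in_tb_count = 0, 0
    with open(pgn_path) as f:
        while (game := chess.pgn.read_game(f)) is not None:
            board = game.board()
            reached_tb = False
            for move in game.mainline_moves():
                board.push(move)
                total += 1
                reached_tb = reached_tb or chess.popcount(board.occupied) <= max_tb_pieces
                in_tb_count += reached_tb
    return in_tb_count / total if total else 0.0
```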

On Fri, Sep 7, 2018 at 4:47 PM A Thule <thul...@gmail.com> wrote:
> I, for one, don't believe including TBs violates the "zero" principle, since the outcome of a TB position is predetermined. That said, lc0's training is driven by success and failure, and reaching a favourable TB position should be rewarded the same as winning, by the principle of transitivity.

> Lc0 should be able to play for a win, or play for a favourable TB position, without distinction, since both amount to the same thing.


Robert Pope

Sep 9, 2018, 11:34:09 PM
to LCZero
It's difficult/frustrating to try to talk about "Zero" because it means so many different things to different people.  When Deepmind came out with AlphaGo Zero and Alpha Zero, my impression of "Zero" from them was "Hey, we've found this neat framework for neural net learning that works really well.  It's flexible enough that it can be applied to a wide variety of applications with nothing more than the basic rules coded in, and reach/surpass the highest levels of performance."

From that perspective, things like tablebases go well beyond a Zero philosophy of "tell it how to play and let it learn".

Other people take "Zero" to mean "create a learning process that isn't limited/biased by the preconceptions of other players, or of its programmers (which preconceptions might hamstring its ultimate abilities)." And from that perspective, sure, tablebases aren't going to add any distortions or incorrect biases to its learning.

But the fact that both of these groups rally around the "Zero Principle" and mean different things makes it difficult to debate.  Personally, now that the concept has been validated in general, I'm all for giving Leela-type programs all the hooks we can to help them reach their potential, whether that is new input planes, tablebases, or dual-brain hybrids.

A Thule

Sep 9, 2018, 11:48:36 PM
to LCZero
... and there are many people who reference the zero principle who clearly don’t understand it at all. That it means different things to different people means only that not everyone who brandy’s the principle a boy is correct.

Tom Idermark

Sep 10, 2018, 2:05:51 PM
to A Thule, LCZero
Yes, in order to have an informed and constructive discussion on any subject, one needs a common and shared nomenclature. Based on the A0 paper terminology, I think there are a couple of terms to agree on with respect to LC0:
- Tabula rasa: self-learning starts from a "clean slate" (randomised weights and no pre-labelled data for training)
- First principle: only the minimal information required (rules, constraints, etc.) is provided
- Zero algorithm: a "general" algorithm that, constrained by the above two points, learns a game (or some other task)
The term "Zero Principle" is neither defined nor discussed in the A0 paper, so it is quite natural that different people have different views on what it means. But I have seen two "distinctions" emerge from the forum discussions, which I share below (basically the same as above):
- How learning is performed: unsupervised from scratch (tabula rasa), as in A0 and LC0, or supervised, as in DeusX
- What "knowledge" is provided to the NN: only the basic rules (first principle), +EGTB, +chess-domain-specific tree search tuning, +additional input planes, etc. The extreme would be tainting the NN with human heuristics such as piece values, as in "classic" engines
My personal interpretation of the LC0 "zero principle" discussions is that tabula rasa is not questioned (no supervised learning wanted), but the first principle constraint is up for debate as long as it does not impose any human bias / tainting. And I agree with that (if my interpretation is correct, of course). As long as Leela is not influenced by human preconceptions, let her learn as much as possible, as easily as possible. And later we can all enjoy her victories (and have KC analyse them for us :-)

Cheers

PS I didn’t get the brandy and the boy reference. But I may be tired ...

On 10 Sep 2018, at 05:48, A Thule <thul...@gmail.com> wrote:

> ... and there are many people who reference the zero principle who clearly don't understand it at all.  That it means different things to different people means only that not everyone who brandy's the principle a boy is correct.


Trevor G

Sep 10, 2018, 2:30:41 PM
to Tom Idermark, A Thule, LCZero
In another thread, somebody had mentioned a kind of application of Occam’s Razor to the zero idea/ideal.

I do think that along those lines, part of the guiding philosophy ought to be that training follows as simple and consistent an algorithm as possible. Because of this, I think that tablebases are bad (in training) because they're based on a rather artificial and arbitrary boundary. Similarly, we might dream up other hacks like a fixed-depth tactical search using Stockfish during training, and again, I'd be against that type of complexity.

That being said, I would not be opposed to things like a secondary network that is trained on adjudicated game scores based on the primary network's training data and tablebases. However, I think the primary network ("primary" simply meaning where most of the community's compute resources are going) ought to be kept as pure and as free of this type of complexity as possible.

I'm also not at all opposed to any additional complexities added to the Leela engine as a chess engine competitor, so long as those complexities aren't present in the training of the network... It is the network weights that should be based on a "zero" philosophy, but what happens at match time is a different story.



Robert Clark

Sep 10, 2018, 2:47:14 PM
to LCZero
This makes a lot of sense to me.  Define terms = good.

mat1...@student.lu.se

Sep 11, 2018, 5:37:16 AM
to LCZero
I don't really mind breaking the zero principle in any way it is defined; this is not a painting to me, and it does not have to be elegant. Substituting tablebase results for certain positions in the training does make it supervised, and if that violates the zero principle then so be it.

Just to clarify: I'm not suggesting that we train our network on the tablebases, but simply end the training games there. It seems quite likely that the network itself would end up with poor endgame play, having only experienced the opening and middlegame, but that doesn't bother me, as it is not meant to be used in the endgame. Outside of training, when a tablebase position is reached, the network should become inactive and play should proceed from the tablebase alone.
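
A minimal sketch of that split (my own illustration, not lc0 code): the network chooses moves until the position is in the tablebase, after which moves come from the tablebase alone. The `net_move` function, the tablebase path and the piece limit are placeholders, and the move-ranking heuristic is deliberately simple.

```python
# Hybrid move selection: network above the tablebase limit, tablebase below it.
import chess
import chess.syzygy

def tablebase_move(board: chess.Board, tb: chess.syzygy.Tablebase) -> chess.Move:
    """Greedy choice: maximise WDL after our move; among wins, prefer lower DTZ."""
    def key(move):
        board.push(move)
        try:
            wdl = -tb.probe_wdl(board)       # probe is from the opponent's view, so negate
            dtz = -tb.probe_dtz(board)
        finally:
            board.pop()
        return (wdl, -dtz)                   # higher WDL first, then faster conversion
    return max(list(board.legal_moves), key=key)

def hybrid_move(board, net_move, tb, max_tb_pieces=6):
    if chess.popcount(board.occupied) <= max_tb_pieces:
        return tablebase_move(board, tb)     # the net "becomes inactive" here
    return net_move(board)                   # otherwise ask the network
```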

The engine I'm describing could be seen as stemming from a mutually symbiotic relationship between the tablebases and the network. Each of these is able to play chess on its own, but they have separate domains of play and would benefit from cooperation, much like many different creatures in nature.

Maybe in the future, if, say, quantum computing actually takes off, we might be able to create a 16-piece tablebase, and the use of networks for chess would become pointless.
But for now this is not possible, and our network is to serve as an approximation of this missing part of the tablebases. The larger our tablebase becomes, the better our network (using this method) should approximate the missing part. Imagine having a 15-piece tablebase: with it, the network would only need to learn how to make a specific capture such that it ends up in a winning 15-piece tablebase position, and that does not seem like a particularly hard task for it to perfect.

pw31

Sep 12, 2018, 6:29:35 AM
to LCZero
There is one huge drawback: every contributor to the training would need to download the 7-man tablebases, and that's 18 TByte!
See https://syzygy-tables.info/. Not everybody has such resources. It seems to me that this is entirely
impractical; it would repel most people.
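
One possible mitigation (my own sketch, not an existing lc0 feature): contributors open whatever syzygy tables they actually have, e.g. only the small 3-5 man set, and positions whose table is missing simply aren't adjudicated.

```python
# Graceful fallback when only part of the tablebases is available:
# probe what we have, and skip adjudication when the table is missing.
import chess
import chess.syzygy

def try_probe_wdl(board: chess.Board, tb: chess.syzygy.Tablebase):
    """WDL for the side to move, or None if the relevant table isn't on disk."""
    try:
        return tb.probe_wdl(board)
    except KeyError:                          # python-chess raises this when a table is absent
        return None

# Usage (path is a placeholder): tb = chess.syzygy.open_tablebase("/path/to/syzygy")
```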

Greg Mattson

Sep 12, 2018, 6:58:24 PM
to LCZero
pw31 -

If I were doing this, I'd basically integrate a tablebase generator, and then have individual testers/users of Leela generate a full 5-man tablebase before using Leela. That way, no extra requirements.

That being said, if a tablebase were used, I'd argue to go whole hog with it - namely, train with it so that Leela is primed to convert middlegame positions to:

1. best case, forced wins
2. second best case, forced draws
3. third best case, losses

where a forced win with a lower DTM is favored over one with a higher DTM, a draw with lots of paths to victory is favored over one with just a few, and a forced loss is better with a higher DTM than with a lower one.

In other words, turn the tablebase into a trainable metric, and then train with it.
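
One way to picture that "trainable metric" is the sketch below. This is my own illustration: note that syzygy tables give WDL/DTZ rather than DTM, so DTZ stands in for "distance" here, and the scaling constant is arbitrary.

```python
# Map a tablebase probe to a graded value target in [-1, 1]:
# fast wins score near +1, long wins lower, slow losses less badly than fast ones.
import math
import chess
import chess.syzygy

def graded_value_target(board: chess.Board, tb: chess.syzygy.Tablebase) -> float:
    """Value target from the side to move's perspective."""
    wdl = tb.probe_wdl(board)                 # 2 win ... 0 draw ... -2 loss
    if wdl == 0:
        return 0.0
    closeness = math.exp(-abs(tb.probe_dtz(board)) / 50.0)   # 1 = immediate, -> 0 for very long
    magnitude = 0.5 + 0.5 * closeness         # keep wins above draws, but grade them by distance
    return magnitude if wdl > 0 else -magnitude
```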

And when the tablebase is actually reached during training, continue to train, using the tablebase to give direct feedback to Leela on errors (i.e., allow Leela to train against the logically best moves and use those as feedback).

I hate to say it, but she is getting (relatively) slaughtered in CCCC because of how she is dealing with TBs. She'd have won against Komodo and a couple of other engines.

brendan2868

Sep 13, 2018, 4:41:50 AM
to LCZero
I don't think this would work at all, since first of all you would need two different neural networks: one that plays to the tablebase (7 pieces) and another that plays from that point onwards to checkmate, draw, resignation, etc. There is also a problem with the first network: it is only playing towards any reachable tablebase position (not towards a win, draw or loss). It doesn't know whether a given tablebase position is optimal for winning or drawing, since it is only trained to reach the tablebase and nothing else. So my opinion on this is that it seems useless.

J Weston

Sep 13, 2018, 7:50:34 AM
to LCZero
I'd prefer to revisit these issues after another year of self-training. As one who stayed up many nights with the Lee Sedol live matches, I think the wider community support for this training, stimulated by LC0's participation in the TCEC and CCCC competitions, is a notable good. It is also a major distraction without clear long-term objectives and perhaps a researcher's healthy disinterest in the "World".

"I can't wait to see the kind of chess LC0 plays in a year" as a sign of my enthusiasm for this project, but rather, I should write "I can wait ..." instead.