Would LC0 learn faster (and gain "deeper chess comprehension") using Chess960?


Kevin Kirkpatrick

Apr 6, 2018, 2:40:19 PM
to LCZero
Chess960 can be thought of as a "more generic" version of chess: a game that can begin from the standard starting position, as well as from 959 other starting positions (for a deeper explanation of Chess960, see https://en.wikipedia.org/wiki/Chess960).

My basic idea is to first train LC0 to become proficient at Chess960 (thus developing a "generic" sense of piece and positional value)... and *then* take that proficient-Chess960 variant and train it extensively on the standard chess opening (or, alternatively, train LC0 using Chess960, but with the standard chess position used for 50% of the games).


I don't mean this as a pure thought experiment, either.  A test of the idea would be relatively easy to perform: 
1) Create a Chess960 variant, 
2) Train it for just 100,000 games or so (50k using Chess960, then 50k where only the standard-chess position is "selected"),
3) Pit the Chess960-trained variant, playing from the standard chess position, against the 100,000-game network of the standard-chess-trained variant.

If the Chess960 variant wins decisively, I think that would be evidence (more testing would be needed, of course, before taking drastic measures!) that the current approach should be scrapped and replaced with Chess960-training.
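To make "wins decisively" concrete, the result of step 3 can be converted into an approximate Elo difference. A minimal sketch (the function name is illustrative, not part of any LC0 tooling; it assumes a score strictly between 0 and 1, since a perfect score makes the formula blow up):

```python
import math

def elo_diff(wins, losses, draws):
    """Approximate Elo difference implied by a head-to-head match score.

    Uses the standard logistic Elo model: diff = -400 * log10(1/score - 1).
    Assumes 0 < score < 1.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400 * math.log10(1 / score - 1)
```

For example, winning a 100-game match 60-40 corresponds to roughly +70 Elo; whether that counts as "decisive" also depends on the error bars a 100-game sample allows.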

Dave Whipp

Apr 6, 2018, 2:58:47 PM
to Kevin Kirkpatrick, LCZero
I think it's a mistake to think that "training on openings" is necessarily useful ... this would tend to fall into the trap of overfitting, where a "strong" player could be flummoxed by a weaker player's "bad" move. 960 is an attempt to fix that for humans; but LZ already started out using random play (it never attempted to memorize small numbers of lines of play), and is forced to make blunders in self-play so that it learns to play against such random moves. It's plausible that something like 960 could induce additional robustness; but I don't think there's evidence of such blind spots in current versions.


Kevin Kirkpatrick

Apr 6, 2018, 3:46:32 PM
to LCZero

Over-fitting during the "formative learning" period is exactly my concern.  How many (legitimate) conceptions of piece value and positional evaluation are rejected early in training *simply* because they lead to cheap traps in standard-position chess?  

Hypothetically, suppose that in the standard chess opening there are, purely as an artifact of the standard starting arrangement, many queen-captures-free-pawn scenarios that lead to a "cheap" trapped position.  It seems plausible that LZ, trained solely against this opening arrangement, might develop a deep (and ultimately unfounded) reluctance to capture free pawns with the queen.  With sufficient training, yes, LZ would eventually move from "Using the queen to grab a free pawn is bad" to "Using the queen to grab a free pawn is bad, except when X, Y, Z, etc.".  But it might be too big a conceptual leap, irrespective of training levels, to ever get from that to the more accurate conception, "Using the queen to grab a free pawn is good, except when A, B, C, etc.".

In terms of fitness landscape: what if training solely on the standard opening is the metaphorical equivalent of, "based on initial training, the western states of the US are mostly flat; best stick to the eastern states"?  Yes, the algorithm will do a fantastic job of finding very high Appalachian mountain peaks... but it may wind up never even exploring the Rockies.


jkiliani

Apr 6, 2018, 3:55:37 PM
to LCZero
Dirichlet noise is there to continually explore variations that the network currently doesn't know because they didn't work in the past. In your particular example, once the network is strong enough to effectively avoid getting its queen trapped, it will explore capturing the pawn thanks to Dirichlet noise and try it at least occasionally, due to proportional move selection. If it ends up working, such moves will be played more often in the future.
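For reference, here is a minimal sketch of how AlphaZero-style Dirichlet noise gets mixed into the root node's move priors (the function name is illustrative; epsilon = 0.25 and alpha = 0.3 are the values published by AlphaZero for chess, which LC0 follows closely):

```python
import numpy as np

def noisy_root_priors(priors, alpha=0.3, epsilon=0.25):
    """Mix Dirichlet noise into the root node's move priors.

    Because every component of a Dirichlet sample is positive, even a move
    the network currently assigns ~zero probability gets a nonzero chance
    of being explored at the root.
    """
    priors = np.asarray(priors, dtype=float)
    noise = np.random.dirichlet([alpha] * len(priors))
    return (1 - epsilon) * priors + epsilon * noise
```

This noise is applied only at the root during self-play, so it perturbs which lines get searched without corrupting the evaluations deeper in the tree.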

Kevin Kirkpatrick

Apr 6, 2018, 4:18:03 PM
to LCZero
Per my landscape analogy, what if we initially wind up in the Appalachians, and the range of random "Dirichlet noise" jumps is 500 miles?  What if we need 4 consecutive "flatland" jumps to get from the Appalachians to the foothills of the Rockies?

Yes - my analogy is 2-dimensional (best "score" based on latitude & longitude), and LZ's network is N-dimensional (best score based on node1 weight, node2 weight, ..., nodeN weight).  As such, the risk of getting stuck on local optima is smaller than my analogy would suggest.  But I still - at a gut level - feel that restricted early training runs a legitimate risk of "missing the highest peak of the Rockies while getting stuck in the Appalachians".

Dave Whipp

Apr 6, 2018, 4:57:21 PM
to Kevin Kirkpatrick, LCZero
I think the risk of getting stuck in an early opening is somewhat mitigated by the fact that, even when it finds such a narrow path, it will escape by failing to see a mate-in-two (in self-play, "it" is both the player and the opponent). So early sequences will not become too strongly associated with winning and losing.

asterix aster

Apr 7, 2018, 5:00:07 PM
to LCZero
This concept is certainly worth exploring as a side project, and it might be possible to dedicate maybe 5-10% (or some other reasonable number) of computing resources to it for a short time. But someone has to sweat it out and write the code for this. If that is possible, then it would be interesting to see whether this has any impact on the playing style of Leela Chess Zero.

M MUSTERMANN

Apr 7, 2018, 6:33:01 PM
to LCZero
Kevin Kirkpatrick:
Using Chess960 is a great idea.
Using the standard chess opening for 50% of games would be useless in the long run, for several reasons.
In fact, you don't need to do more than train Zero with Chess960.
The point is not only learning how to play the openings; much more important is learning how to play the whole game.
Zero would also handle many different kinds of extreme positions after the opening phase much better, since it gets to learn them far more often in Chess960.

Kevin Kirkpatrick

Apr 8, 2018, 11:17:07 PM
to LCZero
Completely agree - everything I've learned about evolving neural networks has buttressed the idea that "more varied landscape = faster, deeper, and more flexible learning".  

By [crude] analogy: if you were going to train an autonomous truck, with the sole requirement that it successfully transport goods between two fixed points over the same route, would it be better to train it exclusively on that route, or to train it on a wide variety of roads?  With the "restricted training" approach, the truck is going to struggle mightily to learn simple fundamentals.  For instance, if there's only one stoplight on the route, imagine how challenging it will be for the truck to pick up on the meaning of red/yellow/green. There are also going to be false-positive artifacts: perhaps on the route in question, 9 of the 10 bridges it goes under are followed by a curve in the road... and the truck gains a deeply ingrained association which leads to erratic slowdowns whenever it goes under the 10th bridge.  Even if tested on the main route, I have no doubt the varied-routes-trained truck would greatly outperform the main-route-trained truck in virtually every way.

I suspect that the same kinds of things are hindering LC0's learning by restricting its training to standard chess openings.  There are probably simple fundamentals that it struggles to pick up on, simply because the occasions to learn them are less common (and the contexts in which they occur more homogeneous) in standard chess.  Perhaps the current weakness of LC0's tactical play stems from such limitations.  There are probably also false/misleading patterns that LC0 picked up early and must now work extensively to unlearn (and may never be "completely/cleanly unlearned", from a NN perspective).  It seems to me that the only way LC0 could improve in this landscape is by gaining a deep, non-deviating sense of the one goal: trap and kill the opponent's king.  Any strategy that works by "luck" (e.g. trap-sacrifices that give an advantage merely as an artifact of a particular starting position) would be utterly useless in the chess960 landscape.  

I kind of wonder if it might be worth taking this idea a little further...  If it's plausible that the Chess960 shuffle could improve training, might it help to also throw in other random variations?  Such as:
* Continue to use the 6 "standard pieces", but add a handful of other "make believe" pieces.  Standard knights move in a 1x2 pattern: let's add a couple other knight-like pieces which move in 2x2 and 1x3 patterns.  Perhaps add a weakened rook-like piece restricted to only move 2 squares at a time, or a super-powerful queen-like piece (that can move like normal queens but also move like knights).
* Different board dimensions... in addition to the 8x8 grid, randomly start games with randomly-selected alternatives: 7x8, 7x9, 9x9, and 10x6 grids.
* Other minor deviations in rules?

I would expect these variations to really help LC0 determine a piece's value not only in terms of what it can do, but also in terms of its surrounding context.  

Don't get me wrong: there's obviously a line to be drawn between "more randomness" and utter chaos.  If the random variations deviate too far from the standard game, I suspect some deeper strategies of standard chess may prove too elusive for the generalized training to learn.  For instance, if the king can start off-center, will the random-variation LC0 be able to fully appreciate the "deep pattern" value of controlling the center early on in standard chess?  So I definitely feel it'd be best to retain certain fundamentals: things like "king starts in the center of the back row", castling rules, major/minor pieces on the back row behind a row of pawns, and so on.

Anyway - if this line of thinking should gain traction, I think it'd make more sense to research this "more random" approach than just testing chess960 (following up the name "chess960", perhaps it could be called "Chess1million").  

Kevin Kirkpatrick

Apr 8, 2018, 11:26:46 PM
to LCZero
FWIW - back in my college days, I'd have drooled at the idea of taking something like this on for my CompSci "Senior Project" credits...  alas, "family and career" now take far too much sweat out of me to have enough for such an endeavor (though if it did take off - I'd feel strongly compelled to pitch in as much as I could).

Joules Kin

Apr 8, 2018, 11:37:41 PM
to LCZero
While I agree that Chess960 might help, I don't see any point in training Leela on non-standard pieces, as Leela isn't going to use them in actual matches. It's kinda like teaching quaternions and octonions to help a student take his complex-number exam in high school. There's more risk than reward, and it's usually not worth it.

Kevin Kirkpatrick

Apr 9, 2018, 12:53:31 AM
to LCZero
In hindsight, the idea sounded much more compelling as I first thought through it.  It might make for an interesting academic exercise.  But pragmatically... yeah, it's probably a dud.  Beyond risk and reward, there's also complexity.  I suspect the hurdle to implementing "chess1million" is substantially higher than for Chess960.  Chess1million would probably require a rewrite of major portions of the game-control code; Chess960 just inserts a quick "scramble" of the starting position immediately before each match.
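That "scramble" step really is small. A sketch of a Chess960 back-rank generator (illustrative, not LC0's actual code): place the two bishops on opposite-colored squares, drop the queen and knights on random free squares, and fill the last three squares in order with rook, king, rook, which automatically leaves the king between the rooks:

```python
import random

def chess960_back_rank(rng=random):
    """Generate a random Chess960 back rank as an 8-letter string.

    Constraints: the bishops land on opposite-colored squares, and the
    king ends up between the two rooks.
    """
    rank = [None] * 8
    # One bishop on a light square (even file), one on a dark square (odd file).
    rank[rng.choice(range(0, 8, 2))] = "B"
    rank[rng.choice(range(1, 8, 2))] = "B"
    # Queen and both knights go on random remaining squares.
    for piece in ("Q", "N", "N"):
        rank[rng.choice([i for i, p in enumerate(rank) if p is None])] = piece
    # The three squares left, in ascending order, get R, K, R:
    # the king is therefore always between the rooks.
    for i, piece in zip((i for i, p in enumerate(rank) if p is None), "RKR"):
        rank[i] = piece
    return "".join(rank)
```

The pawns and the mirrored black back rank are unchanged, so the rest of the game-control code never needs to know the position was shuffled.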