Glossary for understanding issues, discussions, and decisions in lc0


DBg

Feb 24, 2020, 11:45:26 AM
to LCZero
value head, policy head.
head = ?
Googling "head" together with "value" or "policy" just brings back lc0 posts, while Wikipedia's disambiguation page for "head" only has a definition about position in code.

This may be basic, but since it is used so often, I thought it would be the first item on my task list of things to explain in my own words: an lc0 glossary.

Also, what other terminology has no easily found definitions, for newbies, or for not-so-newbies who never dared to ask?
I want to focus on short-hands first, so that we all get fluent: terms that can't be googled right away and that seem to exist for ease of communication, without getting weighed down by details (though those are necessary).

Please post keywords here, so that I can make a list and find the simplest explanations, or pointers if that is not possible.
Curious to see how knowledgeable everyone is.

Dietrich Kappe

Feb 24, 2020, 1:35:34 PM
to LCZero
Value is the predicted game outcome (most recently a WDL output) and policy is the likelihood of each legal move being picked. In neural-net architectural terms you have your input layer, then a common shared mass of the network (a ResNet), and two additional independent sets of layers that attach to the outputs of the ResNet. These independent sets of layers are the "heads." DeepMind started off with two independent nets, one for value, the other for policy. They decided to combine the two and discovered that not only was it more efficient than two nets, it performed much better.
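
For the curious, a minimal sketch of that shape (PyTorch, with made-up layer counts and sizes; lc0's real tower and heads differ):

import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    def __init__(self, channels=64, board=8, num_moves=1858):
        super().__init__()
        # Shared body: a stand-in for the ResNet tower.
        self.body = nn.Sequential(
            nn.Conv2d(12, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Policy head: one independent set of layers on the body's output.
        self.policy_head = nn.Sequential(
            nn.Flatten(), nn.Linear(channels * board * board, num_moves)
        )
        # Value head: another independent set of layers on the same output.
        self.value_head = nn.Sequential(
            nn.Flatten(), nn.Linear(channels * board * board, 3)  # W/D/L logits
        )

    def forward(self, x):
        shared = self.body(x)  # the shared mass is computed once...
        return self.policy_head(shared), self.value_head(shared)  # ...and feeds both heads

net = TwoHeadNet()
planes = torch.zeros(1, 12, 8, 8)  # toy input planes (lc0's input has more)
policy_logits, wdl_logits = net(planes)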

There’s now an experimental third “moves left“ head which shows some promise. https://github.com/LeelaChessZero/lc0/pull/961

Dariouch Babaï

Feb 25, 2020, 8:47:45 AM
to lcz...@googlegroups.com
D. Kappe,

Could you tell me if the following is plausible and compatible with the A0 story about using 2 nets for value and policy, assuming both were of CNN type (ResNet)?

My hypothesis: a single shared transformation in one ResNet is better suited than the two possibly divergent state-space representations that two separate ResNets would imply (due to initial conditions or other numerical aspects). Unless policy (action-preference) estimation requires a fundamentally different representation than value estimation, some of the learning effort may have been spent reconciling or fixing that divergence. Even without divergence, two nets double the parameter count without increasing expressive power. I may be neglecting something about training the second net, though.

I learned more than just terminology from your answer. Thanks.

It also resolved my difficulty about how policy data is stored: everything is stored in the neural net's weights, right, with 2 heads?



On 24/02/2020 at 13:35, Dietrich Kappe wrote:

Dariouch Babaï

Feb 25, 2020, 8:47:45 AM
to lcz...@googlegroups.com
There’s now an experimental third “moves left“ head which shows some promise. https://github.com/LeelaChessZero/lc0/pull/961


This is very interesting.  I wonder how the third head is glued to the other heads (in decision making...).  I bet I will understand better after reading the well-developed summary comment linked below.  If not, maybe I should ask in another thread (or wait for my brain to catch up).

https://github.com/LeelaChessZero/lc0/pull/961#issuecomment-587112109

Any other lc0 terminology questions? They don't have to be central; any sand in the cogs can prevent clear communication.


On 24/02/2020 at 13:35, Dietrich Kappe wrote:

Dariouch Babaï

Feb 25, 2020, 8:47:46 AM
to lcz...@googlegroups.com
Thank you for this quick and well-formed explanation.  I may steal it.

I want to make sure I don't miss anything, by paraphrasing some of it:
"Heads" is the term used for sets of layers, not whole networks, as a reminder that they do the same work as the previous two nets, one estimating parameters for the value side, the other for policy (action preference): two sets of parameters, a hallmark of reinforcement-learning algorithms such as the one used in lc0 (and A0).

The two heads are two sets of layers, each independently fully connected to the last layer of the particular convolutional-neural-network architecture called a ResNet, which handles the representation transformation from the initial state-space definition at the input layer.


On 24/02/2020 at 13:35, Dietrich Kappe wrote:

Álvaro Begué

Feb 25, 2020, 9:57:26 AM
to LCZero
On Tuesday, February 25, 2020 at 8:47:45 AM UTC-5, DBg wrote:
There’s now an experimental third “moves left“ head which shows some promise. https://github.com/LeelaChessZero/lc0/pull/961


This is very interesting.  I wonder how the third head is glued to the other heads (in decision making...).  I bet I will understand better after reading the well-developed summary comment linked below.  If not, maybe I should ask in another thread (or wait for my brain to catch up).

I don't know what the lc0 developers have in mind, but an estimate of moves left can be used to scale the value. So if the value is between -1 and 1, you multiply it by something like (1 + 1 / estimated_moves_left) and this will encourage shorter wins.
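
In toy form (my illustration of that idea, not lc0's actual formula or code):

def scaled_value(value, estimated_moves_left):
    # value in [-1, 1]; for winning values the bonus shrinks as more moves
    # remain, so among equal wins the shorter one scores higher (and among
    # losses, dragging the game out scores less badly).
    return value * (1.0 + 1.0 / estimated_moves_left)

print(scaled_value(0.9, 10))   # win in ~10 moves -> 0.99
print(scaled_value(0.9, 50))   # win in ~50 moves -> 0.918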


DBg

Feb 26, 2020, 6:02:53 PM
to LCZero
Idea: focusing on this thread's title:
A glossary toward understanding discussions, as exemplified by this PR about "moves left" (now plies left):

First getting the following out of the way:

I still think there could be a digest/expand thread here growing out of that successful PR (successful as in being seriously worked on by the code makers, the devs) about such additional "heads" (as a keyword): how modular sub-networks can be developed in relation to chess-behavior concerns and then hooked or glued into the existing training/testing framework (not coding, but one step up, before coding: algorithm, optimization and ML).  In my view this should be touched on even by the wider lc0 community, including members like me who are more comfortable with, or suited to, slower-paced discussions, with the capacity to build threads around topics (and maybe emerging hierarchies of topics), which the proven forum structure provides.  As an echo: not replacing GitHub's practical and to-the-point discussions, and not replacing the chat exchanges in Discord as the decision-making and coordinating place.

Getting there:
I thought this would be the place for a topic split, so that I can keep focusing on the glossary and this thread does not diverge into the content discussion proposed above (complementing the PR).  So, how to do both?  A new thread, but maybe one unburdened from the glossary-tuning work, more about concepts and about including non-devs in a "higher"-level conversation; "higher" meaning attempted jargon minimization aiming at a common language, not a value judgment (I don't do that, at least I try not to).

However, this nice link suggested by D. Kappe provides an opportunity to dive in and learn backward toward the objective in the title of this thread.  This is my immature learn-by-diving style (and annoying?): getting lost, grabbing some floating thread of a fruitful question, lifting up from that to the next such question, and finally arriving at a compatible working story that I could call understanding.  Does that not make sense?  OK, let's try to make it clear.

Idea:
Why don't I take the time to carefully read the PR from the link above, starting from... the end (the PR presentation post I mentioned), then the ongoing discussion above and below it, and the devs' follow-up toward implementing that PR (the final objective).

The standing objective: making an inventory of the terms that get in the way of the higher-level picture, or of understanding what they are talking about.  Actual devs are welcome to glance over it and make sure I don't go astray (good faith expected, though, as I bring mine).  I have nothing to show or lose; I just want to learn and understand, and I hope my individual viewpoint can help others.  So please, non-devs, don't be afraid to show strength through exposed ignorance.

Please try, along with me, to read that PR and make a list of the terms that get in the way, and share them here while I start adding my own.  And devs, dev-likes, chess-firsts, ML-firsts and others could exercise their pedagogical skills at completing the emerging small-scale keyword glossary I am aiming at.  (Now I am exhausted; next I go to the PR glossary task.  Maybe tomorrow I will have something less meta to write or share.)

Dariouch Babaï

Feb 28, 2020, 7:40:54 AM
to lcz...@googlegroups.com
(1 + 1 / estimated_moves_left)

or 1 + some monotonic function(estimated_moves_left), to be discussed, or already discussed, in that pull request (I have not read it, just browsed it; noted for inclusion in the documentation project).

I think this is entering hyper-parameter optimization (to plug in another central machine-learning term, still valid in its reinforcement-learning flavors) or meta-learning, a more recent term trying to codify such optimization and the gluing together of "elementary" learned machines.  I refrain from using "feature", as it seems to have been abused, or to be used differently by different writers (to my understanding).

Maybe we could start an echo thread here for that PR, digesting it in more general language, common to all walks of life within the community, this current thread being of that flavor for the terminology needed to understand the questions discussed on GitHub or chatted about in Discord.  (It is a nice exercise, too, for making sure one understands something: having to find a simpler, but not too simple, explanation of it.)

Recap, with respect to the glossary, not your answer (which would be a thread starter in my opinion):
hyper-parameters
meta-learning
"feature"???? I don't care much for it myself; I would leave it to vision ML problems or projects, not chess.  But we might be stuck with it, as it may come for free with convolutional neural nets.  Then please, somebody, provide a precise down-to-the-neuron-in-the-layer definition of it, and somebody else provide a different definition or different scope for it, or NOT, and then I would be fine with it in my future conversations in lc0, having that common definition understood.  (I may be the only person not happy with it, BTW; consider this a survey.)

Tony Mars Rover

Feb 28, 2020, 11:58:33 AM
to LCZero
Monsieur, Bonnie jour!

DBg

Feb 28, 2020, 12:34:37 PM
to LCZero
protobuf: I have seen this often.  It seems to be lc0 code-base short-hand.
I think GitHub issues are better suited (better editing possibilities), so I'll make a repository just for this thread, where issues can be glossary keywords to develop: a workshop of sorts, prior to entering the documentation workflow, bridging from here.  This is not set in stone.  This glossary is also a pretext to think about the concepts that stem from the keywords (hence not an exhaustive glossary, but a highly subjective one).

On Friday, February 28, 2020 at 11:58:33 AM UTC-5, Tony Mars Rover wrote:
Monsieur, Bonnie jour!

Álvaro Begué

Feb 28, 2020, 1:53:38 PM
to LCZero
I didn't know about protobuf before I saw it in this project, but I know how to use Google: https://en.wikipedia.org/wiki/Protocol_Buffers

Dietrich Kappe

Feb 28, 2020, 7:52:41 PM
to LCZero
In this case it refers to Google's implementation. https://developers.google.com/protocol-buffers

Dietrich Kappe

Feb 29, 2020, 3:10:32 AM
to LCZero
I’d add that protobuf is useful for compressing the size of the network weights file, but it’s not really core to the function of the engine. Previously the format was gzip’d text files.
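
A toy illustration of the size difference (my own example; nothing to do with lc0's actual weights layout):

import gzip, struct, random

weights = [random.uniform(-1, 1) for _ in range(100_000)]

# The old way, roughly: weights as decimal text, gzip'd.
text_blob = gzip.compress("\n".join(f"{w:.8f}" for w in weights).encode())

# What a packed binary encoding (protobuf-style) roughly buys you.
binary_blob = struct.pack(f"{len(weights)}f", *weights)

print(len(text_blob), "bytes as gzip'd text vs", len(binary_blob), "bytes packed")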

Dariouch Babaï

Mar 1, 2020, 6:32:37 PM
to lcz...@googlegroups.com
That is also good to know when deciphering the text of an issue discussion: frequent but not core.  It is also the "pb" in many data filenames, right?


On 29/02/2020 at 03:10, Dietrich Kappe wrote:

DBg

Mar 10, 2020, 5:33:46 PM
to LCZero
Going back to "feature".  Yes, features do exist.  Many types:
software feature (e.g. the existence of a UCI engine parameter for whether to include the "plies-left" head in the gearbox)
chess feature (conceivable by a chess player or theorist; e.g. from this position, one prefers the child line with the shorter expected PGN)
sub-network feature (a modular, pluggable, smallish sub-network representing one feature expected to be estimable from the data set, like "plies left")
feature representation in deep neural networks, including convolutional neural networks
feature selection (in which of the surrounding senses?)
feature detector (a neuron, a layer, a sub-network?)
visual-cortex layer-wise feature detectors (edges, angles, I forget the others)
feature map(s)
and, for the sub-network module: feature training?

While the "plies left" module interestingly covers a few of those types, not all chess state-space "features" that could be captured by the early convolution layers (or by another deep NN, if one were considered) can be pinpointed, and not all interesting representation transformations (interesting as in improving the expressive power of the whole net) can be pinpointed as "features" of the above types; so they may be whole-network properties (a black box, as far as we can tell).

DBg

Mar 10, 2020, 5:36:07 PM
to LCZero
Continuing:
What I meant was: features may be distributed over the whole network's parameters (a black box), and thus not editable.

Tony Mars Rover

Mar 12, 2020, 9:09:15 AM
to LCZero
Lots of features! Lots and lots of features! Never knew there were features in them there features!

Tony Mars Rover

Mar 12, 2020, 9:35:43 AM
to LCZero
Dariouch😂

Hey can you do a small favor and please see if you can help Warren Smith (look in the next post for him). He is requesting a mathematical model with math defined dimensions of a Distilled Tiny Network that Dietrich recommended. Why o why? The Devs and the Test Team are too busy today for sure! This is the kind of task we can take on as a deliverable! Thank you!

Tony Mars Rover

Mar 13, 2020, 2:54:11 AM
to LCZero

DBg

Mar 16, 2020, 4:02:04 PM
to LCZero
"Self-Play" :  throwing another concept in this arena.  restricted to all the leelas training algorithms designs (also possible departures from alphazero's papers).

At my current point of learning, struggling with game-theory assumptions about two players, rationality, strategy and equilibrium definitions (not asking about those here right now),
I find myself wondering what "self-play" means, and about the choreography of weight updating for the mirrored trainee.  (Here I am assuming the engine plays both sides with mirrored weights, but how often are they mirrored?  At each move within each self-play PGN?  Some batches of self-play PGNs with fixed weights on both sides, then an update of both sides?  Or variations on that story, if it is not completely off the map.)  Thanks for considering; there is no hurry to answer, but I thought I should put it out there.  Maybe I'll get something by next week, at my pace.

DBg

Mar 18, 2020, 4:42:50 PM
to LCZero
Another one, a central one, from this other thread, where the term "node" is being used:

https://groups.google.com/d/msg/lczero/wqnsZ4Owga4/Bgr_lva1AgAJ

It is central to any tree description, and to any tree-search description as well.
If we were to make categories of engines, such as AB engines and self-learning engines, e.g. Stockfish versus the Leelas (CNN- and RL-based), one thing that stands out from numerous comments all over is that they do not have the same strategies for exploring the tree of chess possibilities from a given position.  These differences are not minute, and they are reflected in the hardware constraints: on the same hardware, with the same time constraint per move (or average per move over a game, or per whole game; correct me), AB-pruning engines explore many "nodes" while the Leelas explore orders of magnitude fewer.

Is the above paragraph using the term "node" correctly so far?  Yes, but there is more to it?  No?  Please share your knowledge, or your working hypothesis of an understanding (that is also good, as knowledge can emerge from discussing such hypotheses from different angles, or until somebody gets tired of watching people wander in concept land and gives up by writing some good post).

Austin Emmons

Mar 18, 2020, 9:09:15 PM
to LCZero
While I believe you've been using 'node' correctly, I think it is important to note that the distinction between the AB and MCTS methods of tree search is separate from the distinction between handcrafted and self-learning, and separate from the distinction between hardcoded evaluation functions and NN evals.  See the paper below: an AB engine with piece values and piece-square tables tuned by self-learning, referenced in the original DeepMind A0 paper.


While I'm sure someone else could explain MCTS in lay terms better than I could, I'll also quote from the DeepMind paper on how it works (attached image). What made A0 such a ground-breaker was that it combined MCTS, convolutional neural network evaluation, and self-learning -- all of which had been tried separately, but couldn't compete with traditional techniques. This combination is also what makes it play so 'slowly' compared to traditional engines, as a CNN/MCTS requires much more computation per node. In fact, we likely have the game of Go to thank for providing the impetus for this technique, as Go is much harder to play through 'brute force' (AB) methods than through statistical (MCTS) ones.
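
To make that concrete, here is a compact sketch of the AlphaZero-style selection step at the heart of this kind of MCTS (all names, constants and priors are my illustrative assumptions, not lc0's code):

import math

class MctsNode:
    def __init__(self, prior):
        self.prior = prior      # P(s, a): policy-head probability for the move
        self.visits = 0         # N(s, a): how often this child was visited
        self.value_sum = 0.0    # W(s, a); Q(s, a) = W / N is the running average

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select(children, c_puct=1.5):
    # Pick the move maximizing Q + U: exploitation plus an exploration
    # bonus favoring high-prior, rarely visited children.
    total_visits = sum(ch.visits for ch in children.values())
    def score(item):
        move, ch = item
        u = c_puct * ch.prior * math.sqrt(total_visits + 1) / (1 + ch.visits)
        return ch.q() + u
    return max(children.items(), key=score)[0]

children = {"e4": MctsNode(0.5), "d4": MctsNode(0.3), "Nf3": MctsNode(0.2)}
print(select(children))  # with no visits yet, the highest-prior move is chosen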

To answer your question though, to my knowledge a 'node' is just any position that can be reached through legal moves from a given original position, or 'root node'.
As above, please correct me if I'm wrong.
MCTS.png

Álvaro Begué

Mar 18, 2020, 9:40:17 PM
to LCZero
The term "node" comes from graph theory. A graph is an abstract structure that consists of a set of "vertices" and a set of "edges". An edge is an ordered pair of vertices, which you can think of as an arrow connecting one vertex to another vertex. Sometimes the pairs are not ordered, which means that the edge that connects A to B necessarily connects B to A. The word "node" is a common synonym of "vertex", and sometimes "link" is used instead of "edge". I think mathematicians tend to use "vertex" and computer scientists are more likely to use "node", but this is based on my personal experience only.

From a given game state, the game can evolve in many different ways. These can be modeled by a graph, where the nodes are the game states, and an edge represents a legal move, going from the game state before the move to the game state after the move. Instead of "game state" you can think "position", although technically the game state includes additional information besides the position (whose turn it is, castling rights, the list of previous positions for purposes of detecting repetitions, etc.). This graph, which starts at the current game state and branches out with all possible future moves, is often called the "game tree". A tree is a connected undirected graph that doesn't contain any loops (I didn't define "connected", but it's what you think). The game tree is technically not a tree, because different paths can end up in the exact same state (note that after an irreversible move the history of previous positions doesn't matter, so said history is arguably not part of the state). The graph of potential future states of a game is a "directed acyclic graph". DAGs are ubiquitous in computer science.

The two dominant search paradigms are minimax (sometimes called alpha-beta search, because alpha-beta pruning is an extremely common improvement) and MCTS (Monte Carlo Tree Search). Both types of algorithms will visit some small subgraph of the game tree. When we talk about the number of nodes in a search, we are talking about the size of that subgraph (although sometimes the same node will be visited multiple times and counted multiple times).
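
To tie the terminology to something concrete, a minimal sketch of nodes and edges for chess (assuming the python-chess package, i.e. pip install chess; not lc0's internals):

import chess

class Node:
    def __init__(self, board):
        self.board = board  # the game state (position, side to move, castling rights, ...)
        self.children = {}  # move -> Node: one outgoing edge per legal move

def expand(node):
    # Materialize the node's children; a search visits (and counts) nodes like these.
    for move in node.board.legal_moves:
        child_board = node.board.copy()
        child_board.push(move)
        node.children[move] = Node(child_board)

root = Node(chess.Board())  # the root node: the current game state
expand(root)
print(len(root.children))   # 20 legal first moves -> 20 child nodes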

Perhaps this description is too detailed, but I'm hoping DBg might find it useful.

On Wednesday, March 18, 2020 at 4:42:50 PM UTC-4, DBg wrote:

Dariouch Babaï

Mar 18, 2020, 9:57:40 PM
to Álvaro Begué, LCZero
Perhaps this description is too detailed, but I'm hoping DBg might find it useful.
Yes, both for my understanding and for the documentation workshop.  I may end up going more layman, but it is good to have the more precise level in the background (when cutting some corners).  Thanks.  There are already choices made in what to highlight, from your viewpoint; one layman's point of view might hide another level.  I may use a story that looks layman to some and like a precise explanation to others; who knows, in advance?  Taking the time to write out your explanation is what is appreciated.  I hope to take all the posts into consideration adequately.
In any case, they are already here, in the right place according to me.  This applies to the previous post as well.  I will need some time to read both carefully.

DBg

Apr 23, 2020, 9:23:26 PM
to LCZero
collision events

On Monday, February 24, 2020 at 11:45:26 AM UTC-5, DBg wrote:
value head, policy head.
head = ?
Googling "head" together with "value" or "policy" just brings back lc0 posts, while Wikipedia's disambiguation page for "head" only has a definition about position in code.

This may be basic, but since it is used so often, I thought it would be the first item on my task list of things to explain in my own words: an lc0 glossary.

Also, what other terminology has no easily found definitions, for newbies, or for not-so-newbies who never dared to ask?
I want to focus on short-hands first, so that we all get fluent: terms that can't be googled right away and that seem to exist for ease of communication, without getting weighed down by details (though those are necessary).

Please post keywords here, so that I can make a list and find the simplest explanations, or pointers if that is not possible.
Curious to see how knowledgeable everyone is.

Álvaro Begué

Apr 23, 2020, 9:59:58 PM
to DBg, LCZero
In what context did you see that term? I know about collisions in hash tables, but perhaps this is something else.


Dariouch Babaï

Apr 24, 2020, 1:19:20 AM
to Álvaro Begué, LCZero
It was in an lc0 issue.
It may not be necessary terminology of the training or design-of-experiment type, then.  Sorry; I thought it was intriguing, if ever it was.

[LeelaChessZero/lc0] Pre-release v0.25.0-rc2 - v0.25.0-rc2
https://github.com/LeelaChessZero/lc0/releases/tag/v0.25.0-rc2
Increased upper limit for maximum collision events.

glbchess64

Apr 24, 2020, 8:13:01 AM
to LCZero

In fact, it is not really MCTS that Leela and AlphaGo Zero use: https://github.com/LeelaChessZero/lc0/wiki/Technical-Explanation-of-Leela-Chess-Zero
