Leela has a PROBLEM: it is weak in endgames.
One idea to overcome that was "ender",
a new neural net trained separately, purely on endgame positions
(apparently extracted from human games, with leela then playing out the endgame from there with
assistance from tablebases?)
which would be used with <=15 men on the board -- an arbitrary human-chosen threshold based
on a human-chosen feature (# men) -- with the main net used otherwise.
That would be a bit annoying since it disobeys the "zero" philosophy in several ways.
But a different idea, which would not need any human games as input, is
simply to "boost" the training of the main net for endgame positions.
The underlying argument (which plausibly is even true)
is that endgames matter MORE than middlegames and openings in the sense that if leela is
no good at endgames, then it cannot learn middlegames
and openings well either, since all knowledge during learning flows backward from game-ends.
That is ok if leela only needs to learn how to checkmate in the middlegame and never reaches
an endgame, but against strong opponents that often is not possible, and then you instead
need to play the middlegame aiming to reach a winning endgame, and you need to
know which endgames win and how to win them.
So what do I mean by "boost"?
Actually I am not sure whether "boosting" is the correct technical term for my proposal.
But anyhow here is the idea.
Either (a) increase the percentage of endgame positions that arise during training, or
(b) increase the learning rate in a manner that is game-stage dependent (faster learning rate in endgames).
Here (b) again seems not "zero", since a human is setting up the notion of what "game stage" means
(e.g. # men remaining on board).
But (a) could still happily obey the "zero" philosophy.
A way to accomplish (a) is to "fork" training games into two games with some small chance P,
e.g. P=0.03, each move you play (a daughter branch can then fork again, and so on...). That is,
in a normal game, you at your turn play one move, the one you think best. But if we decide to fork, you
play 2 moves, e.g. the 2 you think best, and continue on from there playing two games from then on.
In this way, games that last longer yield a greater number of training positions, and those positions
are naturally enriched in "endgame" positions. A game with forking really is a tree rather than a linear progression.
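To put a rough number on that enrichment, here is a back-of-envelope calculation. The notation N_t (number of branches of the tree alive at ply t) is mine, and it assumes that no branch has ended before ply t and that there is no limit on how many times a branch may fork. Each live branch becomes two with chance P and stays one otherwise, so the expected count grows by a factor (1+P) per ply:

```latex
\mathbb{E}[N_{t+1} \mid N_t] = (1+P)\,N_t
\quad\Longrightarrow\quad
\mathbb{E}[N_t] = (1+P)^t .
% Example with P = 0.03:
(1.03)^{40} \approx 3.3, \qquad (1.03)^{100} \approx 19.2 .
```

So under these simplifying assumptions a position at ply 100 is, on average, represented about 19 times per starting game, versus about 3 times at ply 40 and exactly once at ply 0. Real games end at different lengths on different branches, so this is only an indication of the trend.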
(By the way, any training position that happens to lie inside a tablebase should use tablebase data to aid the training,
which in no way violates the "zero" philosophy.)
You have to decide which positions will actually be used to train the net, e.g. if it were 1 per game, it
would need to become 1 per fork-branch... perhaps demand it occur at least 5 ply after the forking?
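As a minimal sketch of the "1 per fork-branch, at least 5 ply after the fork" rule, assuming each branch is described simply by the ply at which it was created and the ply at which its game ended. The function name `pick_training_ply` and the parameter `min_gap` are hypothetical, not anything from leela's actual training code.

```python
import random


def pick_training_ply(branch_start_ply, branch_end_ply, min_gap=5, rng=None):
    """Pick one ply to use for training from a single branch.

    branch_start_ply: ply at which this branch was created by a fork
                      (use 0 and min_gap=0 for the original, unforked game).
    branch_end_ply:   ply at which the game on this branch ended (exclusive).
    min_gap:          required distance, in ply, from the fork point.
    Returns None if the branch is too short to offer an eligible position.
    """
    if rng is None:
        rng = random.Random()
    first_allowed = branch_start_ply + min_gap
    if first_allowed >= branch_end_ply:
        return None  # branch ended before any eligible position arose
    # Uniform choice among the eligible plies of this branch.
    return rng.randrange(first_allowed, branch_end_ply)
```

For example, a branch forked at ply 10 that ends at ply 40 would yield a ply somewhere in 15..39, while a branch forked at ply 10 that ends at ply 12 would yield nothing.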
There is pretty much only 1 hyperparameter here, namely P. You want to choose P
to yield the best rate of Elo increase over months of training time. To prevent occasional exponential explosions
causing damage you might want to insist that, e.g., forking is turned off in any game branch if 7 forks already occurred.
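The forking scheme with its cap can be sketched as a toy simulation. This is only an illustration under strong assumptions: every branch lasts exactly `game_length` ply, no chess is actually played, and the names `simulate_forked_game`, `fork_prob` and `max_forks` are made up here. It just counts how many positions the tree contains at each ply.

```python
import random
from collections import Counter


def simulate_forked_game(game_length, fork_prob, max_forks=7, rng=None):
    """Count positions per ply in one forked training game (toy model).

    Assumes every branch lasts exactly game_length ply. At each ply a
    branch forks with probability fork_prob, unless max_forks forks have
    already occurred along that branch's history.
    """
    if rng is None:
        rng = random.Random()
    positions_per_ply = Counter()
    # Each entry is (ply, forks_so_far) for a branch still to be played out.
    pending = [(0, 0)]
    while pending:
        ply, forks = pending.pop()
        while ply < game_length:
            positions_per_ply[ply] += 1
            if forks < max_forks and rng.random() < fork_prob:
                forks += 1
                # Daughter branch (e.g. the 2nd-best move) starts next ply
                # and inherits the fork count of its parent.
                pending.append((ply + 1, forks))
            ply += 1
    return positions_per_ply


if __name__ == "__main__":
    counts = simulate_forked_game(120, 0.03, rng=random.Random(0))
    early = sum(counts[p] for p in range(0, 40))
    late = sum(counts[p] for p in range(80, 120))
    print("positions in ply 0-39:", early, " in ply 80-119:", late)
```

With `fork_prob=0` this reduces to the present situation, one position per ply. The cap bounds any single tree at 2^max_forks simultaneous branches, which is what keeps the occasional explosion harmless.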
Leela's present training method amounts to P=0, a choice which I claim was totally arbitrary.
I merely am pointing out that it is plausible that the best value of P might actually not be 0,
but rather something positive.
This would be a very easy change to make, and you might find with the optimal value of P that you get
considerably better learning than with P=0.