And actually KataGo has improved since that quote repeated by Thomas
Spark, so it now requires considerably LESS than 1.4 GPU years -- i.e.
TS understated the truth.
Here are some ideas for leela-chess that would be somewhat analogous
-- the "Lc1 project" (?):
1. KataGo does NOT use the bare board position as input to its neural
net; it uses the board position enhanced with a bunch of very cheap
information that humans believe relevant to go play. So for
leela-chess: add various cheap features such as material counts to the
board representation as additional net inputs. ("Nonzero" -- but still
"zero" in the weaker sense that KataGo's learning starts from a
totally clueless neural net.)
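A minimal sketch of what this could look like, assuming the usual
12-plane one-hot board encoding; the piece values and the
material_planes/augment helper names are illustrative, not anything
leela-chess actually uses:

```python
import numpy as np

# Illustrative piece values for a cheap "material count" input feature.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_planes(white_counts, black_counts):
    """Two constant 8x8 planes holding each side's material total."""
    w = sum(PIECE_VALUES[p] * n for p, n in white_counts.items())
    b = sum(PIECE_VALUES[p] * n for p, n in black_counts.items())
    return np.stack([np.full((8, 8), w, dtype=np.float32),
                     np.full((8, 8), b, dtype=np.float32)])

def augment(board_planes, white_counts, black_counts):
    # board_planes: (12, 8, 8) one-hot piece planes; append the cheap
    # material-count planes as extra net-input channels.
    extra = material_planes(white_counts, black_counts)
    return np.concatenate([board_planes, extra], axis=0)  # (14, 8, 8)
```

The net still starts clueless; it just sees a couple of extra
human-suggested channels alongside the raw position.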
2. Learning to predict more than just the game outcome:
KataGo does not predict only the game outcome (1 bit); it also
predicts ownership at game-end of every location (361 bits to learn
from each training game). It seems obvious this must speed up learning
tremendously. Leela-chess moved in this direction when it invented the
"moves left" predictor, but more could be done. E.g. you could predict
which pawns will promote. You could predict which pieces will still be
on the board at game-end (or, say, 20 ply ahead). You could predict on
which square checkmate will occur. You could predict the TYPE of
game-end (checkmate, stalemate, perpetual, 50-move, 3-fold repetition;
and for the first three, who did it to whom). All of these would be
predicting a lot more than 1 bit per game. If we predict N bits, the
learning speed presumably becomes about N times greater.
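The training side of this is just a weighted sum of per-head losses. A
sketch, where the head names ("survival", "promotion", "mate_square")
and the aux_weight value are invented for illustration:

```python
import numpy as np

def binary_ce(pred, target, eps=1e-7):
    """Mean binary cross-entropy over an array of predicted probs."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred)
                    + (1 - target) * np.log(1 - pred))

def total_loss(preds, targets, aux_weight=0.15):
    # The usual 1-"bit" outcome head...
    loss = binary_ce(preds["outcome"], targets["outcome"])
    # ...plus auxiliary heads: which pieces survive to game-end (32
    # bits), which pawns promote (16 bits), mate square (64 bits).
    for head in ("survival", "promotion", "mate_square"):
        loss += aux_weight * binary_ce(preds[head], targets[head])
    return loss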
3. KataGo does not merely predict "probability of win." The game-end
value is not just a win/lose sign-bit; it is something like
sign(GameOutcome) + 0.1*arctan(FinalScore). That was not the exact
formula Wu used, but anyhow it regards a win by 73 points as worth
more than a win by 5, etc. This completely disagrees with AlphaGo,
whose authors had insisted on just 1 bit and insisted people like Wu
were dead wrong. So anyhow for chess: do not regard the final score as
merely {-1, 0, +1}; add something saying "winning with more material
is worth more than merely winning." E.g. if I mate your bare king with
my 2 queens, that is worth more than if you have lots of material and
I have only a rook (but still mate you). The reason this is a good
idea (if it is) is that it helps prevent "slack play," aka "trolling."
Another version of this idea would be to regard mating you on move 20
as worth more than a mate on move 197.
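A chess transcription of the KataGo-style target could use the
game-end material difference in place of go's final score. The 0.1
scale is carried over from the text's (already approximate) formula,
and "material_diff" is an assumed stand-in for whatever margin measure
one actually chooses:

```python
import math

def value_target(outcome_sign, material_diff, c=0.1):
    """Margin-aware value target, a la sign(outcome)+c*arctan(score).

    outcome_sign is in {-1, 0, +1}; material_diff is the material
    margin at game-end from the first player's perspective. arctan
    keeps the margin bonus bounded, so it can never flip a win into
    being worth less than a draw.
    """
    return outcome_sign + c * math.atan(material_diff)
```

So a mate delivered with a +12 material surplus scores a bit above a
scrappy +1 win, which in turn scores well above a draw.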
4. KataGo trains a lot from highly randomized and handicap gamestarts.
The chess version would be to play a lot of chess960 (Fischerrandom)
and chess(960^2) (doublerandom) gamestarts, plus handicap starts where
some pieces are randomly removed, as well as the usual start +
openings. The point of this is more learning with less worry about
overfitting, and more understanding of how to play more kinds of
positions.
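The handicap-start part is the easiest to sketch: delete some random
non-king pieces from the normal setup. The flat 64-character board
string here is just a toy representation for illustration:

```python
import random

# Standard setup as a flat 64-char string, black on top ("." = empty).
START = list("rnbqkbnr" + "p" * 8 + "." * 32 + "P" * 8 + "RNBQKBNR")

def handicap_start(n_removed, rng=random):
    """Random handicap gamestart: remove n_removed non-king pieces."""
    board = START[:]
    removable = [i for i, c in enumerate(board) if c not in ".kK"]
    for i in rng.sample(removable, n_removed):
        board[i] = "."
    return "".join(board)
```

Chess960 and doublerandom starts would similarly be generated up
front and fed into the self-play scheduler alongside normal openings.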
5. KataGo trains a lot using altered-komi games.
The closest thing we have in chess to "komi" is time-odds games. Add
to the net input information about the remaining clock time for you
and your opponent, and play a lot of games at time odds. You can also
try, during training, artificially giving one side a time-odds
advantage for the rest of the game whenever the other side gets ahead.
The time-odds advantage should not be too large (i.e. the ahead side
still needs the greater chances), but it should be enough to even up
the chances so that the training game still provides information.
I.e. if I have a winning position against you, prob(win) = 0.99999,
then the rest of the game would normally provide zero information
teaching the players how to play from then onward. That is an idiotic
waste of time. But if I have a 0.99999 position against you and you
have 5 times more time on your clock, that is a different matter. You
now have chances to trick me, and will learn how to do that, while I
will learn how to stop you.
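One hypothetical way to schedule such a rebalance during self-play:
once the leading side's estimated win probability crosses a threshold,
scale up the trailing side's remaining clock, capped so the leader
keeps the better chances. The threshold and cap values below are
invented for illustration:

```python
def rebalanced_clocks(p_win_ahead, clock_ahead, clock_behind,
                      threshold=0.99, max_factor=5.0):
    """Give the trailing side extra time once the game gets lopsided.

    p_win_ahead: leading side's estimated win probability.
    Returns the (possibly adjusted) remaining clocks.
    """
    if p_win_ahead < threshold:
        return clock_ahead, clock_behind  # still a real game, no odds
    # Scale the boost with how lopsided the eval is, up to max_factor.
    frac = (p_win_ahead - threshold) / (1.0 - threshold)
    factor = min(max_factor, 1.0 + (max_factor - 1.0) * frac)
    return clock_ahead, clock_behind * factor
```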
6. KataGo plays on many different board sizes, not just 19x19.
I doubt there is a worthwhile chess analogue, since 8x8 fits so well
with 64-bit computers that a variable-size board would lose a lot of
juice?
=======================
Stockfish now uses 2 or 3 evaluators. The slowest and smartest is
NNUE; there are also faster, dumber evals. It uses the fast ones when
one side is far enough ahead.
Obvious analogue: give leela several neural nets, large+smart and
small+fast, and use the fast one when it thinks one side is far
enough ahead. A related idea: nets that are only used in, e.g., the
endgame.
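A sketch of the dispatch logic, where the nets are stand-in callables
returning an eval in pawns and the 2.0-pawn margin is an invented
example value:

```python
def make_dispatcher(small_net, big_net, margin=2.0):
    """Route positions to a cheap or an expensive evaluator.

    small_net / big_net: callables position -> eval (in pawns).
    If the cheap net already sees one side ahead by >= margin, trust
    it; otherwise pay for the big net's opinion.
    """
    def evaluate(position):
        v = small_net(position)   # cheap screening eval
        if abs(v) >= margin:
            return v              # clearly decided: fast net suffices
        return big_net(position)  # close game: use the smart net
    return evaluate
```

The endgame-specialist idea would be the same shape, just dispatching
on piece count instead of eval margin.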
=======================
Finally, the following is not in KataGo or anything else, but I
personally believe it would be a very, very valuable thing to have:
make the neural net learn to predict its own ERROR.
That is, suppose the neural net evaluates a position as X, but 10 ply
later the evaluation is Y. The error was E = |X-Y|. Add a new output
to the NN that predicts E -- or several outputs predicting a
probability distribution for E, e.g. bits predicting that 0<E<1,
1<E<2, 2<E<4, 4<E<8, etc.
Not only would this be very useful to know for many reasons -- even
if this info were just thrown in the garbage, it would STILL be
useful to do this, because it would boost the learning rate (see
above about learning N bits instead of 1, which boosts the learning
rate by a factor of about N).
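Building the training target for the distribution version is simple:
bucket E into the exponentially spaced bins from the text and emit a
one-hot vector for the error-prediction head. A sketch:

```python
import numpy as np

# Bin edges 0,1,2,4,8,inf, matching the 0<E<1, 1<E<2, ... scheme.
EDGES = [0.0, 1.0, 2.0, 4.0, 8.0, float("inf")]

def error_bucket(x_now, y_later):
    """One-hot target over error bins for E = |X - Y|."""
    e = abs(x_now - y_later)
    onehot = np.zeros(len(EDGES) - 1, dtype=np.float32)
    for i in range(len(EDGES) - 1):
        if EDGES[i] <= e < EDGES[i + 1]:
            onehot[i] = 1.0
            break
    return onehot
```

Each training position then contributes these extra target bits on
top of the usual outcome bit, exactly the N-bits argument above.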