On 11/21/19, Weber Yan <zhub...@gmail.com> wrote:
> DeepMind published a paper about MuZero, a new approach to learning, which
> they evaluated on several board games and Atari video games:
>
> https://arxiv.org/pdf/1911.08265.pdf
>
> From what I understand from a quick browse of the paper, the innovative
> part compared to the AlphaZero type of approach is that MuZero doesn't
> "know" the rules in advance, and is therefore a more general learning
> algorithm, which can be used in more open-ended domains.
>
> They tested it against AlphaZero at go, and MuZero won; this is an exact
> quotation:
>
> "In Go, MuZero slightly exceeded the performance of AlphaZero, despite
> using less computation per node in the search tree (16 residual blocks per
> evaluation in MuZero compared to 20 blocks in AlphaZero)"
--I presume that the part of the neural net which learned, in go, which
moves are legal essentially caused the self-creation of a sub-network
for detecting "simple ko," an important go concept. (The rules of go
are very simple.) From then on it proceeded to learn go in normal
alphazero fashion, so from that point it was essentially alphazero plus
an added ko-detector. It is then not surprising that it would
outperform alphazero without a ko detector.
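To make the "simple ko" concept concrete: a move violates simple ko if it would exactly recreate the whole-board position as it stood before the opponent's last move. A minimal sketch (board encoding and function names are mine, purely illustrative, not from the MuZero paper):

```python
# Sketch of a "simple ko" detector. A move is illegal under simple ko if
# playing it recreates the position that existed just before the
# opponent's last move. Boards here are tuples of tuples:
# 0 = empty, 1 = black, 2 = white.

def violates_simple_ko(candidate_board, board_before_last_move):
    """True if the candidate position repeats the position that
    existed immediately before the opponent's last move."""
    return candidate_board == board_before_last_move

# Contrived 2x2 example, just to show the check:
prev = ((1, 2), (0, 0))   # position before opponent's last move
cand = ((1, 2), (0, 0))   # candidate position after our move
print(violates_simple_ko(cand, prev))  # True: forbidden by simple ko
```

In practice an engine would compare hashed positions rather than full boards, but the net would have to learn something equivalent to this equality test.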
But more generally in go, it is illegal to repeat prior positions.
This can occur in more ways than "simple ko" (call these "complex ko")
but in practical games complex ko arises much less often than simple
ko. I doubt the muzero system was able to comprehend the full rules
of go including all possible complex ko situations. I.e. it probably
never fully learned the rules of the game. This presumably did not
appreciably hurt its strength because (1) complex ko arises rarely and
(2) they artificially prevented muzero from playing illegal moves.
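The full positional-superko rule ("complex ko") amounts to remembering every prior whole-board position and forbidding any move that recreates one of them; simple ko is just the special case where the repeated position is one move back. A self-contained sketch (names and board encoding are mine):

```python
# Sketch of positional-superko checking: record every prior whole-board
# position and flag any move that recreates one. Real engines use
# Zobrist hashing for this; a plain set of hashable snapshots shows
# the idea.

def make_superko_checker():
    seen = set()
    def check_and_record(board):
        """board: hashable snapshot (e.g. tuple of tuples).
        Returns True if this position repeats an earlier one."""
        if board in seen:
            return True
        seen.add(board)
        return False
    return check_and_record

check = make_superko_checker()
b1 = ((0, 1), (2, 0))
b2 = ((0, 1), (2, 2))
print(check(b1))  # False: first occurrence
print(check(b2))  # False
print(check(b1))  # True: repeats a prior position, illegal under superko
```

Learning this unbounded-history rule from data alone is a much harder proposition than learning the one-move-back case, which supports the guess that muzero never fully internalized it.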
This "mu > alpha" phenomenon was not seen in any game other than go
(in particular not in chess or shogi). In chess, it presumably would
need to learn a pinned-piece detector and check-detector, which might
be of some use (if they are playing full-legal-move chess rather than
pseudo-legal -- which they did not tell us); learning to detect
promotion, en passant and castling would not seem useful because plain
alphazero already had info about that encoded in its board
representation.
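For chess, the kind of detector meant here can be illustrated with a pin check. The sketch below is deliberately minimal and self-contained, handling only one case (a white piece standing on a file between its own king and an enemy rook); a real engine covers all sliding pieces and diagonals. The board encoding and function name are mine:

```python
# Minimal pin detector for one case: a white piece on a file between the
# white king and a black rook, with nothing else between them, is
# absolutely pinned. Board encoding is illustrative: a dict mapping
# (file, rank) squares to piece codes (upper = white, lower = black).

def is_pinned_on_file(board, square):
    """True if the white piece on `square` is pinned along its file
    between the white king ('K') and a black rook ('r')."""
    f, r = square
    # Occupied squares on this file, sorted by rank.
    column = sorted((rr, p) for (ff, rr), p in board.items() if ff == f)
    ranks = [rr for rr, _ in column]
    if r not in ranks:
        return False
    i = ranks.index(r)
    below = column[i - 1][1] if i > 0 else None
    above = column[i + 1][1] if i + 1 < len(column) else None
    # Pinned if the nearest occupied squares on either side are the own
    # king and an enemy rook (nothing in between by construction).
    return {below, above} == {'K', 'r'}

board = {(4, 0): 'K',   # white king on e1
         (4, 3): 'N',   # white knight on e4
         (4, 7): 'r'}   # black rook on e8
print(is_pinned_on_file(board, (4, 3)))  # True: the knight is pinned
```

A check detector is the analogous scan with the king itself as the target square.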
I think the more useful lesson to learn from this is: if you manually
added detectors of certain useful chess concepts and features to
leela's chessboard representation, and leela's neural net then used
that "enhanced chessboard" as its input rather than merely a "bare"
board, I would expect an increase in leela's strength.
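Mechanically, that proposal is easy to state: AlphaZero/Leela-style nets consume the board as a stack of 8x8 binary planes (piece locations, castling rights, etc.), so "enhancing" the board just means concatenating extra hand-crafted feature planes onto that stack. A sketch with dummy plane contents (the 12-plane count and the specific features are illustrative assumptions, not Leela's actual input layout):

```python
import numpy as np

# Sketch of feeding hand-crafted detectors to an AlphaZero-style net:
# concatenate extra 8x8 feature planes (e.g. "pinned piece here",
# "checking piece here") onto the standard input stack along the
# channel axis. Plane contents below are dummy data; only the wiring
# is the point.

def enhance_input(base_planes, extra_planes):
    """Concatenate hand-crafted feature planes onto the input stack."""
    return np.concatenate([base_planes, extra_planes], axis=0)

base = np.zeros((12, 8, 8), dtype=np.float32)    # piece-location planes
pin_plane = np.zeros((1, 8, 8), dtype=np.float32)
pin_plane[0, 3, 4] = 1.0                         # e.g. pinned knight on e4
check_plane = np.zeros((1, 8, 8), dtype=np.float32)

enhanced = enhance_input(base, np.concatenate([pin_plane, check_plane]))
print(enhanced.shape)  # (14, 8, 8): net now sees two extra feature planes
```

The first convolutional layer of the net would then need its input-channel count bumped to match; everything downstream is unchanged.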
--
Warren D. Smith
http://RangeVoting.org <-- add your endorsement (by clicking
"endorse" as 1st step)