Well, I looked (very quick, very sparse browsing), and it appears
that they actually use individual players' data with the Maia
models, plus a different general family of statistical error
distributions (I am not sure that is the gist of the improved match
to individual play "style" that they claim).
My naive impression, for both the average-rating-target Maia models
and the more personal one, is that they are fitting error models,
not chess play models.
Has anybody gathered how they define what an error or blunder is?
Does Maia have both an evolving best-play model and an error model,
or is there some outside reference deemed best?
Is it training an Lc0 with an error model on top? In sequence, or
together?
My a priori skepticism about the model assumptions (if I am not
mistaken, which is always possible, since I have not read everything
and I update my view mainly by making hypotheses like this one and
expecting informative replies in any direction) is that rating is
not really a chess skill characteristic to aim at. I would refer to
machine learning results with many suboptimal experts, each having
their own, probably complementary, "subspaces" of expertise, where
optimal learning comes from their complementarity, as in ensemble
imitation learning. That is not exactly the case here, but it raises
the question of whether average performance rating may be very
multidimensional, as much as the game itself. If so, learning an
individual signature as a blunt error model relative to some
putative fixed, known best-play reference seems likely to blend a
lot of information together into noise averaging.
OK, that is a stance proposition. No tomatoes please, but sound
counterpoints would be nice. Do I have the wrong impression, or a
wrong understanding of the modelling process?
However, compared to Maia, the claim in the previously linked post
is that they improve the "accuracy" of the error model. Research is
not total fog; news can come out of there.
"brian.p.r...@gmail.com"
<brian.p.r...@gmail.com>: Mar 16 07:27AM -0700
Matching moves against a set of players to pick the most likely
player to have made a move is not the same as playing with a
particular human player's style.
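
A minimal sketch of that distinction, under loud assumptions: the
per-player move tables below are hypothetical stand-ins for trained
nets, and `identify_player` and `play_like` are illustrative names,
not functions from Lc0 or Maia. Identification compares players by
the likelihood of moves already played; playing in a style has to
produce a move from one player's model alone.

```python
import math

# Hypothetical per-player move models: position id -> {move: probability}.
# Invented stand-ins for trained networks, for illustration only.
models = {
    "alice": {"pos1": {"e4": 0.6, "d4": 0.4}, "pos2": {"Nf3": 0.7, "c4": 0.3}},
    "bob": {"pos1": {"e4": 0.4, "d4": 0.6}, "pos2": {"Nf3": 0.3, "c4": 0.7}},
}


def identify_player(observed):
    """Pick the player whose model gives the observed moves the highest log-likelihood."""
    def loglik(name):
        return sum(math.log(models[name][pos][move]) for pos, move in observed)
    return max(models, key=loglik)


def play_like(name, pos):
    """Return the most probable move under a single player's model."""
    dist = models[name][pos]
    return max(dist, key=dist.get)


observed = [("pos1", "d4"), ("pos2", "c4")]
guess = identify_player(observed)   # a discrimination task across players
move = play_like("alice", "pos1")   # a generation task for one player

print(guess, move)
```

The first function only needs the models to differ enough to tell
players apart; the second needs one model to be good on its own,
which is the harder requirement.
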
Training nets to adopt the style of an individual human player has
been tried several times with Lc0.
The results are "fair" but not "good", IIRC. Moreover, the nets are
quite weak relative to the top-strength nets, but that does not
matter all that much in this case.
Accordingly, this area remains a topic of research.