Hi everyone,
I just joined the group and have been wondering about the philosophy behind Lc0 development. It is crystal clear that the "0" means the engine must learn from zero, but I was not sure about the algorithm itself. Can the algorithm be changed as new algorithms are published by researchers, or must the project stick to AZ's guidelines?
I am not saying that there are strictly better established algorithms than AZ, but I do have a particular paper in mind that came out earlier this year. It presents an interesting analysis of AZ-like algorithms (and in particular of MuZero).
The article is:
Monte-Carlo tree search as regularized policy optimization, Grill et al., 2020
Its main point is that the search objective pursued by AZ-like algorithms is very close to a particular regularized policy optimization objective, and it improves on MuZero by removing some unnecessary approximations (if I remember correctly). Another interesting point is that it eliminates some hyperparameters.
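To make that more concrete, here is a rough Python sketch, written from memory, of what I understand the paper's main result to be: the search implicitly tracks the solution pi_bar of a KL-regularized objective, which has a closed form computable by bisection. The exact scaling of the multiplier lambda_N and the bisection bounds below are my recollection and may well be off, so please treat this as an illustration rather than the paper's exact formulation.

import numpy as np

def regularized_policy(prior, q, n_total, c_puct=1.25):
    """Solve  max_y  q.y - lambda_N * KL(prior || y)  over the simplex.

    Closed form: pi_bar(a) = lambda_N * prior(a) / (alpha - q(a)), with alpha
    a normalising constant found by bisection. Constants are from memory.
    """
    prior = np.asarray(prior, dtype=float)
    q = np.asarray(q, dtype=float)

    # regularization weight shrinks as the simulation budget n_total grows
    lam = c_puct * np.sqrt(n_total) / (n_total + len(prior))

    # alpha lies between these bounds, both of which keep alpha > max(q)
    lo = np.max(q + lam * prior)
    hi = np.max(q) + lam
    for _ in range(64):                   # bisection on the normaliser alpha
        alpha = 0.5 * (lo + hi)
        if np.sum(lam * prior / (alpha - q)) > 1.0:
            lo = alpha                    # weights sum above 1 -> raise alpha
        else:
            hi = alpha
    return lam * prior / (alpha - q)

# toy example: 3 moves, 100 simulations
print(regularized_policy(prior=[0.5, 0.3, 0.2], q=[0.1, 0.4, 0.0], n_total=100))

If I got that right, pi_bar is what the empirical visit-count distribution is implicitly approximating, and the paper suggests using it directly instead.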
My question can then be rephrased: would it be in line with the Lc0 project's philosophy to try this kind of new algorithm as it comes out, or should that be handled in other projects?
With best regards,
Pierre-Louis