--
Honesty is a very expensive gift. So, don't expect it from cheap people - Warren Buffett
http://tayek.com/
_______________________________________________
Computer-go mailing list
Compu...@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
I couldn't improve Leela Zero's strength by implementing SEARCH and ACT.
https://github.com/zakki/leela-zero/commits/regularized_policy
On Fri, Jul 17, 2020 at 2:47, Rémi Coulom <remi....@gmail.com> wrote:
--
Kensuke Matsuzaki
> In this section, we establish our main claim namely that AlphaZero’s action selection criteria can be interpreted as approximating the solution to a regularized policy-optimization objective.
I think they are saying that UCT and PUCT are approximations of sampling
directly from π̄, but I haven't fully understood Section 3.
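
For concreteness, here is a minimal sketch of how π̄ could be computed at a single node, based on my reading of the paper and not on the Leela Zero branch above. It assumes the closed form π̄(a) = λ_N · π_θ(a) / (α − q(a)) with λ_N = c · sqrt(N) / (|A| + N), where α is the normalizing constant found by bisection. The function name `regularized_policy`, the default `c_puct`, and the iteration count are all hypothetical choices for illustration.

```python
import math


def regularized_policy(q, prior, total_visits, c_puct=1.25, iters=60):
    """Sketch of pi_bar(a) = lam * prior(a) / (alpha - q(a)).

    q            -- list of action-value estimates at the node
    prior        -- list of network prior probabilities (assumed to sum to 1)
    total_visits -- total visit count N at the node (must be positive)
    All names and defaults here are illustrative assumptions.
    """
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    num_actions = len(q)
    # Assumed regularization weight: shrinks as the node gets more visits.
    lam = c_puct * math.sqrt(total_visits) / (num_actions + total_visits)

    # alpha is bracketed so that the unnormalized sum is >= 1 at lo
    # and <= 1 at hi; the sum decreases monotonically in alpha.
    lo = max(qa + lam * pa for qa, pa in zip(q, prior))
    hi = max(q) + lam
    for _ in range(iters):
        alpha = 0.5 * (lo + hi)
        total = sum(lam * pa / (alpha - qa) for qa, pa in zip(q, prior))
        if total > 1.0:
            lo = alpha
        else:
            hi = alpha

    # Use the upper end of the bracket, then renormalize to absorb
    # the remaining bisection error.
    pi_bar = [lam * pa / (hi - qa) for qa, pa in zip(q, prior)]
    norm = sum(pi_bar)
    return [p / norm for p in pi_bar]


if __name__ == "__main__":
    q = [0.1, 0.5, 0.3]
    prior = [0.5, 0.2, 0.3]
    print(regularized_policy(q, prior, total_visits=50))
```

Compared with the prior, π̄ shifts probability toward actions with higher q, and the shift grows as N increases because λ_N shrinks. If this reading is right, ACT would sample the move to play from π̄ and SEARCH would pick the child to descend into from π̄ instead of taking the PUCT argmax.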
On Mon, Jul 20, 2020 at 2:51, Daniel <dsh...@gmail.com> wrote: