You need both, because with only the value head you would have a node evaluation but no idea which nodes to expand next. Without knowing which nodes to search (i.e. some degree of node pruning), you won't reach any reasonable search depth. That's why the dual head is important: the policy head directs the search while the value head gives the evaluations.
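To make "policy directs, value evaluates" concrete, here is a minimal sketch of an AlphaZero-style PUCT selection step. The function name, the argument layout, the `c_puct` default and the `+ 1` under the square root are assumptions chosen for illustration, not LC0's actual code or constants.

```python
import math

def select_child(priors, visit_counts, value_sums, c_puct=1.5):
    """Return the index of the child with the highest PUCT score.

    priors       -- policy head output for each child (sums to ~1)
    visit_counts -- how often each child has been visited so far
    value_sums   -- sum of value head results backed up through each child
    c_puct       -- exploration constant (hypothetical default)
    """
    parent_visits = sum(visit_counts)
    best_index, best_score = -1, -math.inf
    for i, prior in enumerate(priors):
        # Q: mean value from earlier visits (0 for an unvisited child).
        q = value_sums[i] / visit_counts[i] if visit_counts[i] else 0.0
        # U: exploration bonus, large for high-prior, rarely visited children.
        u = c_puct * prior * math.sqrt(parent_visits + 1) / (1 + visit_counts[i])
        if q + u > best_score:
            best_index, best_score = i, q + u
    return best_index
```

With no visits yet, the choice is driven entirely by the priors, so the search can pick a child to expand before any child has been evaluated. As visits accumulate, the averaged values take over.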
I still don't understand why we need both. The policy head tells us which nodes should be expanded. But aren't those the same as the next moves with relatively high winrate (value head)? Say we have a position with winrate 0.5 and 5 legal moves from it whose winrates are 0.51, 0.55, 0.65, 0.4 and 0.49. Isn't simply selecting the node with winrate 0.65 a good pruning policy (without needing a policy head)?
--
You received this message because you are subscribed to the Google Groups "LCZero" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lczero+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lczero/f2d16110-7e2e-498d-9d22-4b5836a0b9ed%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
While the value head is a single number, how exactly is the policy head represented in LC0? I assume it is an n-tuple of numbers, one evaluating each move. A very naive representation would be a 64x63-tuple (defining the start and end squares of a move; this covers all legal moves, but the vast majority of moves defined this way would be illegal). I expect you use something more sophisticated.
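For reference, the naive 64x63 layout described above could be sketched like this. It is only the from/to encoding from the question, not LC0's actual policy layout. The square numbering (0..63), the function names and the masking-by-softmax step are assumptions for illustration. Note that a plain from/to pair cannot tell different promotion pieces apart, which is one reason a real encoding needs more than this.

```python
import math

NUM_SQUARES = 64
POLICY_SIZE = NUM_SQUARES * (NUM_SQUARES - 1)  # 4032 (from, to) pairs

def move_index(from_sq, to_sq):
    """Map a (from, to) pair with from != to onto 0..POLICY_SIZE-1."""
    if from_sq == to_sq:
        raise ValueError("a move must change squares")
    # Each from-square owns 63 slots; skip the slot where to == from.
    offset = to_sq if to_sq < from_sq else to_sq - 1
    return from_sq * (NUM_SQUARES - 1) + offset

def legal_move_probs(logits, legal_moves):
    """Softmax over the logits of the legal moves only.

    logits      -- sequence of POLICY_SIZE raw network outputs
    legal_moves -- list of (from_sq, to_sq) pairs from a move generator
    """
    indices = [move_index(f, t) for f, t in legal_moves]
    top = max(logits[i] for i in indices)  # subtract max for stability
    exps = [math.exp(logits[i] - top) for i in indices]
    total = sum(exps)
    return {move: e / total for move, e in zip(legal_moves, exps)}
```

The illegal entries are simply ignored when the probabilities are normalized, so the wasted slots cost output size but do not affect the search.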
At an expansion node with Policy and Eval, you basically expend two computational time units, one for the Eval and the other for the Policy.
You could, I guess, get a Policy by asking the NN for an evaluation of each child, and then expand the child with the maximum eval. But since the move width in chess is around 30 to 40, this would require 30 to 40 time units to get a Policy, and your node rate would fall from, say, 80K nodes per second to about 2K nodes per second (using Google TPUs). Using a PC plus GPU, well, you get the idea...
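The arithmetic behind that estimate, taking the figures above at face value, can be sketched as follows. The function name is made up for illustration, and the model is deliberately crude: it just divides the node rate by the number of child evaluations needed per expansion.

```python
def effective_node_rate(base_nodes_per_second, evals_per_expansion):
    """Node rate when each expansion costs several NN evaluations."""
    return base_nodes_per_second / evals_per_expansion

# Figures quoted in the post: 80K nodes/s, 30 to 40 legal moves per position.
rate_at_40_moves = effective_node_rate(80_000, 40)
rate_at_30_moves = effective_node_rate(80_000, 30)
print(f"{rate_at_40_moves:.0f} to {rate_at_30_moves:.0f} nodes per second")
```

So the "about 2K" figure corresponds to the upper end of the branching-factor range.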