> I haven't developed this bot. So, I don't know the infinite details about how it plays but apparently 3-ply cubeful hint is not always the same as 3-ply cubeful roll out.
>
> I don't know if this works both ways either, since I don't try to catch positions where the bot's grandmaster decision is better than 3-ply roll out, which logically shouldn't be possible anyway.
*Assuming* that you are actually doing 3-ply (grandmaster) rollouts and
comparing them to a 3-ply evaluation, I have this to say.
The two are not the same. Generally speaking, a 3-ply rollout with
enough trials will give better results than a 3-ply evaluation.
Depending on the number of trials, a rollout can also take a very long
time to produce results.
In GNUBG, 0-ply is a 1-move lookahead and 1-ply is a 2-move lookahead,
so 3-ply looks 4 moves ahead.
A normal 3-ply evaluation gets its values from the neural network
evaluator and/or the bearoff database. In simplest terms (ignoring
filtering and pruning), a 3-ply evaluation looks at all the possible
outcomes for the next 4 rolls only and produces its choices based on
that.
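To make the lookahead idea concrete, here is a rough Python sketch. It is
not GNUBG's code. It uses a made-up toy race game (two checkers per side,
each die is given to one checker) and a crude `static_eval` standing in
for the neural network; `moves`, `evaluate` and `best_move` are names I
invented for illustration. The point is only the shape of the
calculation: at 0-ply the static evaluator is trusted directly, and each
extra ply averages over the 21 distinct rolls, taking the best reply for
each one.

```python
import math
from itertools import combinations_with_replacement, product

# The 21 distinct rolls with their probabilities (doubles 1/36, others 2/36).
ROLLS = [((a, b), (1 if a == b else 2) / 36)
         for a, b in combinations_with_replacement(range(1, 7), 2)]

def moves(mine, roll):
    """All positions reachable by giving each die to one of the two checkers.
    Toy rule (an assumption): doubles are played as two dice, not four."""
    out = set()
    for i, j in product(range(2), repeat=2):
        new = list(mine)
        new[i] = max(0, new[i] - roll[0])
        new[j] = max(0, new[j] - roll[1])
        out.add(tuple(sorted(new)))
    return out

def static_eval(mine, theirs):
    """Crude stand-in for the neural net: win chance for the side on roll,
    guessed from the pip difference alone."""
    return 1 / (1 + math.exp(-(sum(theirs) - sum(mine) + 4) / 4))

def evaluate(mine, theirs, ply):
    """Estimated win chance for the side on roll, looking `ply` rolls ahead."""
    if sum(theirs) == 0:          # opponent has already borne off: we lost
        return 0.0
    if ply == 0:
        return static_eval(mine, theirs)
    total = 0.0
    for roll, weight in ROLLS:
        # Best play for this roll; the opponent is on roll afterwards.
        best = max(1 - evaluate(theirs, m, ply - 1) for m in moves(mine, roll))
        total += weight * best
    return total

def best_move(mine, theirs, roll, ply):
    """Pick the move whose resulting position evaluates best at `ply`."""
    return max(moves(mine, roll), key=lambda m: 1 - evaluate(theirs, m, ply))

# Same position, evaluated statically and with two rolls of lookahead.
print(evaluate((5, 9), (6, 8), 0), evaluate((5, 9), (6, 8), 2))
```

Note that however deep the lookahead goes, the numbers at the bottom
still come from `static_eval`. If that estimate is off, the n-ply answer
inherits the error.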
A rollout is different. A rollout actually forces the bot to play
against itself at a certain evaluation level. If you are truly doing a
3-ply rollout (4 moves ahead), then the bot will play against itself for
a specific number of trials and/or until a certain statistical threshold
is met. Each one of the moves in a trial would (in your case) look 4
moves ahead. Each trial plays from the starting position to game
conclusion (double/drop, complete bearoff, resign, etc.). The rollout
keeps track of the number of wins, gammons, and backgammons and produces
output based on the results of each move rolled out.
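A rollout can be sketched the same way. Again, this is a toy under
stated assumptions and not GNUBG's implementation: the game is an
invented two-checker race, every move in a trial is chosen with a crude
0-ply `static_eval` (a real 3-ply rollout would use a 3-ply evaluation
for each move), only plain wins are counted (no gammons, backgammons or
cube), and the stopping rule, `max_trials` or a standard error below
`se_target`, is just one plausible reading of "a certain statistical
threshold".

```python
import math
import random

def moves(mine, roll):
    """Toy game: each die is given to one of two checkers (doubles = two dice)."""
    out = set()
    for i in range(2):
        for j in range(2):
            new = list(mine)
            new[i] = max(0, new[i] - roll[0])
            new[j] = max(0, new[j] - roll[1])
            out.add(tuple(sorted(new)))
    return sorted(out)            # sorted so tie-breaking is reproducible

def static_eval(mine, theirs):
    """Crude stand-in for the neural net: win chance for the side on roll."""
    return 1 / (1 + math.exp(-(sum(theirs) - sum(mine) + 4) / 4))

def choose(mine, theirs, roll):
    """0-ply move choice: leave the opponent the worst-looking position."""
    return max(moves(mine, roll), key=lambda m: 1 - static_eval(theirs, m))

def play_one(mine, theirs, rng):
    """Play one game to the end; return 1 if the starting side wins, else 0."""
    side = 0                      # 0 means the starting side is on roll
    while True:
        roll = (rng.randint(1, 6), rng.randint(1, 6))
        mine = choose(mine, theirs, roll)
        if sum(mine) == 0:        # side on roll has borne off everything
            return 1 if side == 0 else 0
        mine, theirs = theirs, mine   # the other side is now on roll
        side ^= 1

def rollout(mine, theirs, max_trials=5000, se_target=0.01, seed=1):
    """Average many complete games; stop at max_trials or when the
    standard error of the win rate drops to se_target (after 100 trials)."""
    rng = random.Random(seed)
    wins = 0
    for n in range(1, max_trials + 1):
        wins += play_one(mine, theirs, rng)
        p = wins / n
        se = math.sqrt(p * (1 - p) / n)
        if n >= 100 and se <= se_target:
            break
    return p, se, n

print(rollout((5, 9), (6, 8)))    # (win rate, standard error, trials played)
```

The difference from an evaluation is that every trial is played out to
the end with real dice, so the result is an average over finished games
rather than a guess made a few rolls ahead.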
Since a rollout plays out actual games and adds up the results, it can
lead to different results than a normal evaluation. When the difference
is significant, it is an indication that the neural network's
understanding of a position is lacking.
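One way to read "significant" is to compare the gap against the
rollout's own statistical noise. The helper below is a hypothetical
sketch: the name `disagreement`, the cutoff of two standard errors, and
the numbers in the example are my assumptions for illustration, not
anything GNUBG reports in this form.

```python
def disagreement(eval_value, rollout_mean, rollout_se, z=2.0):
    """Return (difference, is_significant) for a rollout vs. an evaluation.
    The gap counts as significant if it exceeds z standard errors."""
    diff = rollout_mean - eval_value
    return diff, abs(diff) > z * rollout_se

# Made-up numbers: evaluation says 50% wins, rollout says 55% +/- 1%.
print(disagreement(0.50, 0.55, 0.01))
# Made-up numbers: a 1% gap with the same noise is within the noise.
print(disagreement(0.50, 0.51, 0.01))
```

A gap inside the noise says nothing either way; more trials would be
needed before blaming the evaluation.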
If an n-ply rollout and an n-ply evaluation were actually the same
thing, there would be no reason for rollouts!
In summary: rollouts using n-ply evaluations and simple n-ply
evaluations are two different things. The fact that an n-ply evaluation
can differ from an n-ply rollout is not an indication of cheating; it is
an indication that the neural network may not understand the position.
I have oversimplified some of the concepts here.