The proposed algorithm will converge to the same limit, but can
potentially converge much faster. This algorithm assumes
the existence of an evaluation function that approximates
the equity of a given position. The important feature of
this algorithm is that the better the evaluation function is,
the faster the convergence will be, but no matter how inaccurate
the evaluation function is, the convergence limit is unaffected.
For any given position, the "disparity" of that position is computed
as follows. For each possible dice roll, the best move is determined
and the evaluation function is applied to the resulting position.
The disparity is then equal to the weighted average of the evaluations
of all these resulting positions minus the evaluation of the
original position. The better the evaluation function is,
the smaller the disparity will be.
The roll out proceeds normally, but the result of each trial is
equal to the equity value determined by applying the evaluation function
to the original position plus the sum of the disparities of all the
positions that occurred in the roll out (including the original
position).
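A minimal Python sketch of one such trial, assuming hypothetical helpers
evaluate(pos) (the static evaluation), play(pos, roll) (the position after
the best move for a roll), and game_over(pos) — none of these names are
from the original posts:

```python
import random

# 21 distinct backgammon rolls; non-doubles are twice as likely as doubles.
DICE = [(a, b) for a in range(1, 7) for b in range(a, 7)]
WEIGHTS = [1 if a == b else 2 for (a, b) in DICE]


def disparity(pos, evaluate, play):
    """Weighted average of the evaluations of the positions reached by the
    best move for each roll, minus the evaluation of the position itself."""
    total = sum(w * evaluate(play(pos, roll))
                for roll, w in zip(DICE, WEIGHTS))
    return total / sum(WEIGHTS) - evaluate(pos)


def one_trial(pos, evaluate, play, game_over):
    """One variance-reduced trial: the static evaluation of the original
    position plus the disparities of every non-terminal position visited."""
    result = evaluate(pos) + disparity(pos, evaluate, play)
    while not game_over(pos):
        roll = random.choices(DICE, weights=WEIGHTS)[0]
        pos = play(pos, roll)
        if not game_over(pos):
            result += disparity(pos, evaluate, play)
    return result
```

With a perfect evaluation function every disparity is zero up to the forced
play, so each trial returns the true equity exactly — the better the
evaluator, the smaller the per-trial variance, as claimed above.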
The proof of correctness is inductive. It is assumed that the
evaluation function correctly evaluates a "game over" position.
The expected value of the roll out result of a position is equal to
the weighted average of the expected values of the roll outs of
each of the next possible positions,
plus the evaluation of the current position, minus the weighted average
of the evaluations of the next positions, plus the disparity of the
current position. This in turn is equal to just the weighted average
of the expected values of the roll outs of the next positions.
These are correct by the induction hypothesis.
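The inductive step can be written out explicitly. Writing V(p) for the true
equity, f(p) for the static evaluation, d(p) for the disparity, and R(p) for
the trial result (this notation is not from the original post):

```latex
% Disparity: expected evaluation after one roll, minus the evaluation now.
d(p) = \mathbb{E}_{p'}\big[f(p')\big] - f(p)

% A trial satisfies R(p) = f(p) + d(p) + \big(R(p') - f(p')\big), since the
% remaining disparities are exactly those of the rollout continuing from p'.
% Taking expectations, the f and d terms cancel:
\mathbb{E}\big[R(p)\big]
  = f(p) + d(p) + \mathbb{E}_{p'}\Big[\mathbb{E}\big[R(p')\big] - f(p')\Big]
  = \mathbb{E}_{p'}\Big[\mathbb{E}\big[R(p')\big]\Big]
  = \mathbb{E}_{p'}\big[V(p')\big]
  = V(p)

% using the induction hypothesis E[R(p')] = V(p'), with base case
% R(p) = f(p) = V(p) at a game-over position.
```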
I believe this is exactly what Jellyfish does with Fredrik Dahl's
variance reduction method used in Level 6 and interactive rollouts.
Ron
No, since games are rolled out to completion.
JellyFish's "variance reduction" rollout is probably a variation
of this, except that JF has the additional "optimal sampling"
technique. I believe the "optimal sampling" method amounts to
selecting dice rolls whose rollout results deviate from
expectation (according to the static evaluator) by a large
amount. But please note: I have not looked up the techniques
JF uses in statistics books, so my concepts may be wrong.
The idea of Jim Williams's algorithm is to reduce the variance
in the rollout by making it equal to the variance of the
difference between true equity and a good evaluation function.
The gain can be enormous. For example, if we assume that the
deviation between a good neural evaluation and the true equity
of a position averages 0.1, then the variance is reduced by a
factor of 0.1 * 0.1 = 0.01, which is a factor of 100!
The downside of the algorithm is that you have to do a 1-ply search
of every position, which involves 21 times as much calculation
as a static evaluation. So the net gain is about a factor of 5
(gain a factor of 100, pay a factor of 21; 100/21 is roughly 5).
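The arithmetic above can be checked in a few lines; the 0.1 deviation and
the 21-roll lookahead cost are the assumptions already stated:

```python
# Back-of-envelope check of the claimed net gain.
sigma = 0.1                        # assumed average |evaluation - true equity|
variance_ratio = sigma ** 2        # variance scales as the square: 0.01
trials_saved = 1 / variance_ratio  # factor of ~100 fewer trials needed
lookahead_cost = 21                # a 1-ply search evaluates 21 dice rolls
net_gain = trials_saved / lookahead_cost
print(round(net_gain, 2))          # 4.76, i.e. roughly a factor of 5
```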
What I know about JF level 6 rollouts is consistent with the
statements above.
Brian
If I understand correctly, you propose that the rollout scan the whole game
tree, that is, every possible roll for every successive position. If this is
the case, the algorithm may be theoretically correct but impossible to
apply in a reasonable amount of time.
Let's compare the processing required for your scheme vs a traditional
10000 game rollout. Let "n" be the average number of rolls left in the
game and "m" the average number of legal moves for any given roll. If we
assume m=10, the number of positions to evaluate for a complete rollout
is:
                    Position evaluations      example n=4    example n=10
  your algorithm    (21)**n * m               1944810        1.6E14
  traditional       10000 * n * m             400000         1.0E6
While the traditional method requires about 1 million position
evaluations to complete, your method requires an astronomical number of
position evaluations. For n=10, if the
traditional method takes 1 minute, your method will take roughly 300 years
on the same computer.
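The table's counts follow directly from the stated assumptions (m = 10
average legal moves per roll, 21 distinct dice rolls, 10000 games for the
traditional rollout); a short script reproduces them:

```python
# Reproducing the evaluation counts in the table above.

def full_tree_evals(n, m=10):
    """Evaluations to scan every roll at every one of n remaining rolls."""
    return 21 ** n * m

def traditional_evals(n, m=10, games=10000):
    """Evaluations for a conventional rollout of `games` games."""
    return games * n * m

print(full_tree_evals(4))      # 1944810
print(full_tree_evals(10))     # 166798809782010
print(traditional_evals(4))    # 400000
print(traditional_evals(10))   # 1000000
```

The exact value for n=10 is about 1.7e14; the table's 1.6E14 is the same
quantity, rounded down.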
Eric Groleau
No. A truncated rollout plays a given number of rolls each time,
and then evaluates the position.
The algorithm Jim describes sounds like a JF level 6 rollout.
The level 6 rollout uses 1 ply lookahead, and the equity variance
is much smaller than for the level 5 rollout. I've never understood
why the variance should be less, but thanks to Jim I now do.
Stig Eide