Recently, most of the posts about comparing bots have concentrated on
money play, running long sessions to estimate one bot's expectation
against the other. But if you want a definite result, there is a
problem with this, due to very large cubes. As is fairly well known,
the expectation may not even be defined, and, more likely still, the
variance is probably infinite.
This may sound like a purely theoretical problem, but it could well be
a practical one. For example (and I don't know the details about
this), an earlier version of Jellyfish supposedly misunderstood some
backgame positions so badly that it would double from behind. In any
case, it's easy to imagine that there are very rare positions where
one of the bots doubles from behind, in a situation which is more or
less stable for several moves. If the other bot redoubles in this kind
of position, very rare large cubes can result. (I've known a human to
double when closed out, with his other 14 men on the 1 and 2 points, I
suppose because he was `ahead in the race'; if he'd had the courage
of his convictions, a very large cube could have resulted
while I was bringing my spares round the board.)
For good bots, such a position is obviously unusual, but how unusual,
and how do we know? If you have, say, 5000 games in total between the
bots, and this hasn't occurred, then you can say it probably doesn't
happen more often than about once in 1500 games, on average:
If in the very long run it occurred once every 1500 games, there would
be a 96.4% chance of it happening at least once in 5000 games
(.964 = 1 - exp(-5000/1500)).
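(Just to make that arithmetic reproducible, here's a tiny Python
sketch of the Poisson-style bound; the 1-in-1500 rate and the 5000
games are only the illustrative numbers above, not measurements.)

    import math

    def prob_at_least_one(rate_per_game, n_games):
        # Poisson approximation: chance of seeing the rare event at
        # least once in n_games, if it occurs at rate_per_game on
        # average per game.
        return 1 - math.exp(-rate_per_game * n_games)

    print(prob_at_least_one(1.0 / 1500, 5000))   # ~0.964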
When it does happen, how high does the cube get? It could easily be
high enough to dominate other equity differences between the bots, and
certainly to throw off any variance estimate by a significant amount.
OK, you might say: we'll just limit the cube, to 16 say, if both bots
have settings to play with limited cubes. Or you could say: we'll let
them play, but estimate the `expected result in games without very
large cubes', if you're happy with that instead. But this isn't very
satisfactory: a bot that mostly wins a bit, but very rarely lets you
take 1024 points off it, is not a good money player in the long run!
But, when it comes to variance reduction, there is still a problem:
you can't really tell what the variance is, because the rare large
cubes may bump it up, and you don't know how often they happen. E.g.,
a 16-point result every 1500 games adds more than .1 to the variance,
which (I'm guessing from a recent post of Zare's) is comparable with
the apparent variance after reduction.
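(For the .1 figure, my reading is that a +/-16 result once per 1500
games contributes roughly 16^2/1500, i.e. about 0.17, to the per-game
variance. A quick Python check, using those same illustrative
numbers:)

    # Extra variance from a rare large result: roughly
    # value^2 * frequency, ignoring the small change to the mean.
    cube_value = 16
    frequency = 1.0 / 1500
    print(cube_value**2 * frequency)   # ~0.17, comfortably more than .1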
Match play has a big advantage: the results are bounded. Calling a
win/loss +1/-1, we know the variance is at most 1. This makes it much
easier (indeed, possible at all) to formulate rigorous statistical
tests to show with a certain confidence that one bot is better than
the other.
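(As one illustration of the kind of test that bounded results allow,
here is a sketch using Hoeffding's inequality; it's only an example,
not a claim about how anyone actually runs their comparisons, and the
match counts are made up. It needs nothing beyond each result lying
in [-1, +1].)

    import math

    def hoeffding_p_value(n_matches, observed_mean):
        # One-sided Hoeffding bound for results in [-1, +1]: if the
        # true expectation were <= 0 (bot A no better than bot B), the
        # chance of a sample mean this large is at most
        # exp(-n * t^2 / 2).
        t = observed_mean
        return math.exp(-n_matches * t * t / 2)

    # E.g. bot A wins 540 of 1000 matches: mean = (540 - 460)/1000 = 0.08
    print(hoeffding_p_value(1000, 0.08))   # ~0.04, i.e. A better at ~96% confidence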
More details in separate posts.
Oliver Riordan