
GnuBG: Fractional-ply evaluators


Tom Keith

Oct 17, 2003, 8:28:55 PM
Let me describe an experiment I did comparing zero-ply and one-ply
evaluations in GnuBG, and follow it up with a request for the GnuBG
developers (as if they don't have enough on their plate already).

---

When GnuBG evaluates a position, you can tell it how far ahead you
want it to look. A zero-ply evaluation does no lookahead -- you just
get the output of the program's neural net. A one-ply evaluation
looks ahead one roll: it looks at all 21 possible rolls, makes what
it believes the best play for each, and takes a weighted average of
the resulting positions. Each additional ply of lookahead takes about
21 times as long as the previous level.
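In code, that lookahead looks roughly like this. This is a Python sketch only; `net_eval` and `plays_after_roll` are hypothetical stand-ins for the program's neural net and move generator, not GnuBG's actual functions:

```python
# Sketch of a one-ply evaluation. net_eval(pos) is a hypothetical static
# (zero-ply) evaluator returning the winning probability for the side on
# roll; plays_after_roll(pos, roll) yields the positions reachable with
# that roll. Neither name is GnuBG's real API.

ROLLS = [(d1, d2) for d1 in range(1, 7) for d2 in range(d1, 7)]  # 21 distinct rolls

def one_ply_eval(position, net_eval, plays_after_roll):
    total = 0.0
    for d1, d2 in ROLLS:
        weight = 1 / 36 if d1 == d2 else 2 / 36  # doubles vs. non-doubles
        # After the play the opponent is on roll, so the best play for us
        # is the one that minimises the opponent's static winning chances.
        best = min(net_eval(p) for p in plays_after_roll(position, (d1, d2)))
        total += weight * (1.0 - best)
    return total  # weighted average over all 21 rolls (weights sum to 1)
```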

When you do rollouts in GnuBG, one of the parameters you can set is
what level of evaluation to use for checker plays. Presumably one-ply
evaluation plays better than zero-ply, and two-ply plays better than
one-ply, etc. However, there has been some discussion over the years
about whether odd-ply evaluations are as reliable as even-ply. (See
http://www.bkgm.com/rgb/rgb.cgi?view+1061).

I thought I'd try an experiment comparing zero-ply and one-ply
evaluations. Here's what I did:

1. I collected a large number of backgammon games between good players,
some human-vs-human, some human-vs-computer. From these I took
a representative sample of positions. (However, duplicate positions
were deleted, so early-game positions are under-represented.)

2. I rolled out each position to the end of the game thirty-six times
using cubeless zero-ply evaluation. Variance reduction was applied.

3. I took the root-mean-square average of the differences between
GnuBG's zero-ply evaluation and the rollout results, and between
GnuBG's one-ply evaluation and the rollout results. I looked
only at game-winning chances; I didn't look at gammons or
backgammons.
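Step 3's error measure can be written out as a minimal sketch (the function and variable names here are mine, not from Tom's scripts):

```python
import math

def rms_error(evals, rollouts):
    """Root-mean-square difference between an evaluator's game-winning
    chances and the rollout results for the same positions."""
    diffs = [(e - r) ** 2 for e, r in zip(evals, rollouts)]
    return math.sqrt(sum(diffs) / len(diffs))
```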

These are the results:

Zero-ply evaluation: Average error = 0.0300
One-ply evaluation: Average error = 0.0284

So one-ply evaluation does do better on average. This is to be
expected; being able to look ahead one ply should be a help,
especially in volatile positions.

In certain games GnuBG's evaluation seems to oscillate back and forth
according to which side's turn it is to play. When this happens, a
one-ply evaluation (which essentially looks at the game from the other
player's side) can give quite different numbers than a zero-ply
evaluation. You might expect when zero-ply and one-ply evaluations
differ by a lot that the true value of the position is probably
somewhere in between. I thought it would be interesting to see what
would happen if you had an evaluator that used the average of zero-ply
and one-ply. I called this a "0.5-ply evaluation."

0.5-ply evaluation: Average error = 0.0245

So 0.5-ply does do better! In fact, it does enough better to make you
wonder if it does even better than two-ply. (I didn't look into this.)

Can we do even better? Something I noticed is that you can often
predict whether zero-ply or one-ply is better for a particular
position by looking at the relative pipcount. (The relative pipcount
is your own pipcount minus your opponent's pipcount.) When the
relative pipcount is between -160 and -40, one-ply usually does
better; when the relative pipcount is between 40 and 150, zero-ply
usually does better. Let's call an evaluator based on this idea a
"hybrid evaluator." How well does the hybrid evaluator perform?

Hybrid evaluator: Average error = 0.0225
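As a sketch, the hybrid rule looks like this. Falling back to the 0.5-ply average outside the two pipcount ranges is an assumption here (Tom confirms this behaviour later in the thread), and the names are illustrative:

```python
def hybrid_eval(eval0, eval1, rpc):
    """Pick an evaluation from the relative pipcount rpc
    (your own pipcount minus your opponent's)."""
    if -160 < rpc < -40:
        return eval1                  # well behind: one-ply usually better
    if 40 < rpc < 150:
        return eval0                  # well ahead: zero-ply usually better
    return 0.5 * (eval0 + eval1)      # otherwise, the 0.5-ply average
```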

It should be noted that these tests show how well GnuBG performs at
computing the ABSOLUTE equity of a position. They may or may not
indicate an improvement in GnuBG's ability to *play* a position, since
playing depends on having accurate RELATIVE equities. Nevertheless,
I'm guessing that the 0.5-ply and hybrid evaluators play better than
the integer-ply evaluators too.

---

Now my request for the GnuBG people:

- Would it be possible to build a fractional-ply evaluator into
GnuBG that could evaluate positions at 0.5-ply, 1.5-ply, etc.?

- Would it be possible to build a hybrid evaluator into GnuBG like the
one described above that would use a combination of zero-ply,
one-ply, and 0.5-ply evaluation depending on the relative pipcount?

At least for cube decisions, they should be an improvement over
zero-ply or one-ply evaluations. And maybe checker-play will be
better too, making them better for rollouts.

Thanks for reading this,

Tom Keith

Tom Keith

Oct 19, 2003, 1:01:29 PM
To follow up on my own post ...

It has been pointed out by some (most notably Robert-Jan Veldhuizen)
that GnuBG seems to handle cube decisions better at zero-ply than at
one-ply.

To test this, I selected positions from actual games in which the
player-on-roll's game-winning chances + gammons/2 was between 70% and
80%. In other words, I was trying to select positions in which a player
might be thinking about doubling, or his opponent might have to think
about whether to take or drop.

Comparing 0-ply evaluations with untruncated rollouts gave a standard
error of 0.0235.
Comparing 1-ply evaluations with untruncated rollouts gave a standard
error of 0.0288.

So 0-ply does do significantly better in cube-likely positions. This
despite the fact that 1-ply does better at estimating absolute equity
in general.

Comparing the hybrid evaluator described in my previous post with the
untruncated rollouts gave a standard error of 0.0207. So the hybrid
evaluator still does better than zero-ply, even on positions that
zero-ply seems to be particularly good at.

Tom Keith
Backgammon Galore!
http://www.bkgm.com



Peter Schneider

Oct 19, 2003, 2:52:15 PM
Hi,

[...]


> When you do rollouts in GnuBG, one of the parameters you can set is
> what level of evaluation to use for checker plays. Presumably one-ply
> evaluation plays better than zero-ply, and two-ply plays better than
> one-ply, etc. However, there has been some discussion over the years
> about whether odd-ply evaluations are as reliable as even-ply. (See
> http://www.bkgm.com/rgb/rgb.cgi?view+1061).

I always wondered a lot about these discussions. This behaviour
(comparably big differences between n-ply and n+1-ply equity) simply
shouldn't happen (and I think that Snowie 3 does not expose it). It's
a sign that something fundamental is broken, either the neural net (or
its weights) or something else, possibly even something trivial like
confusing the sign of a static evaluation under certain conditions
when propagating it back to the "calling" ply.

If a programming error can be excluded with sufficient probability,
then the weights and the nets are to be checked; if one ply uses a
different net than the other one, and they differ a lot, it's an
indication that at least one of them is flawed for that position. It's
easy to figure out which one needs to be fixed.

> You might expect when zero-ply and one-ply evaluations
> differ by a lot that the true value of the position
> is probably somewhere in between.

I don't like this approach too much. Instead of making a workaround
for a flawed 1-ply evaluation, I think one should address the cause.
Imo it doesn't make sense to waste the time of e.g. a 1-ply evaluation
for an actually worse result than 0-ply yields, even if you can
mitigate the bad effect by using the average with the 0-ply ;-).

Am I too harsh? Perhaps the developers can shed some light on the
reasons for the abovementioned odd/even-ply differences.

Regards,
Peter aka the juggler


Peter Schneider

Oct 19, 2003, 3:09:53 PM
Hi again,

second thought: Ok, maybe a position looks very different for the
two nets involved from either point of view, and the truth lies
somewhere in the middle. Still, I think the net/weight/code for
the n+1-ply must be extraordinarily bad if it cannot make any
significant use of the huge advantage of looking ahead a complete ply,
and this inferior quality should be addressed.

Jim Segrave

Oct 20, 2003, 2:43:36 AM
In article <bmuno5$qtk2c$1...@news.hansenet.net>,

There is no 'one or odd ply' net. There are different nets for certain
types of positions - race, contact, bear-off, and one side
crashed. A one-ply evaluation could, in some circumstances, evaluate
the current (0-ply) position with one net and then find the followup
move uses a different net - for example, a contact position where you
are looking at a move which breaks contact and converts the game to a
race.

While it's possible that there's an undiscovered bug that leads to
mishandling of the one-ply evaluations, I think that at this point
it's probably unlikely. A lot of people have spent a long time looking
at this without finding any such thing.

I would expect Snowie or any other bot to have some positions where
the odd and even ply evaluations oscillate - positions which require
looking ahead one or more rolls to see upcoming problems are liable
to give static evaluators trouble like this.

It is possible that different training of the neural nets can reduce
the sensitivity to such situations, but that can in turn lead to
weakening other evaluations - a neural net, by its very nature, is
going to be a compromise solution which attempts to map a huge
collection of possible positions into a function which yields the
probabilities of win/win g/win bg/lose g/lose bg. Gnubg's training
has given a net which is very strong on 0- and even-ply evaluations
but is apparently more prone to evaluation errors in some one-ply
situations.

--
Jim Segrave j...@jes-2.demon.nl

Peter Schneider

Oct 20, 2003, 8:03:22 AM
Hi,


"Jim Segrave" <j...@allium-ursunum.demon.nl> wrote
> Peter Schneider <schneiderp...@gmx.net> wrote:

> >second thought: Ok, maybe that a position looks very different
> >for the two nets involved from either point of view, and that the
> >truth lies somewhere in the middle. Still I think that the
> >net/weight/code for the n+1-ply must be extraordinarily bad if it
> >can not make any significant use of the huge advantage of looking
> >ahead a complete ply, and this inferior quality should be addressed.
>
> There is no 'one or odd ply' net. There are different nets for certain
> types of positions - race, contact, bear-off, and one side
> crashed. A one-ply evaluation could, in some circumstances, evaluate
> the current (0-ply) position with one net and then find the followup
> move uses a different net - for example a contact position where you
> are looking at a move which breaks contact and converts the game to
> a race.

Yes, that was more or less what I meant. Sorry for not being clear.
Actually, I wonder whether the same or strategically similar positions
(and that would often apply to most of the positions resulting from
the 1-ply position generator) could be evaluated by different nets,
depending on whose turn it is? (In the case of asymmetrical positions,
e.g. backgames.) If so, I would consider this condition the most
suspect one.

[...]


> Gnubg's training
> has given a net which is very strong on 0 and even ply evaluations
> but is apparently more prone to make evaluation errors in some
> one-ply situations.

Well, exactly this sentence does not make any sense to me. n+1-ply
should on average be much stronger than n-ply for any n. If there is
a need to advise people "better not make 1-ply evaluations, they are
too unreliable; stick with 0-ply or 2-ply", something is wrong.

Nis Jorgensen

Oct 20, 2003, 7:54:20 PM
On 17 Oct 2003 17:28:55 -0700
t...@bkgm.com (Tom Keith) wrote in rec.games.backgammon :

> Let me describe an experiment I did comparing zero-ply and one-ply
> evaluations in GnuBG, and follow it up with a request for the GnuBG
> developers (as if they don't have enough on their plate already).

Reply sent to rec.games.backgammon, cc'ed to bug-...@gnu.org. If
possible, reply to both.



> ---
>
> When GnuBG evaluates a position, you can tell it how far ahead you
> want it to look. A zero-ply evaluation does no lookahead -- you just
> get the output of the program's neural net. A one-ply evaluation
> looks ahead one roll: it looks at all 21 possible rolls, makes what
> it believes the best play for each, and takes a weighted average of
> the resulting positions. Each additional ply of lookahead takes about
> 21 times as long as the previous level.
>
> When you do rollouts in GnuBG, one of the parameters you can set is
> what level of evaluation to use for checker plays. Presumably one-ply
> evaluation plays better than zero-ply, and two-ply plays better than
> one-ply, etc. However, there has been some discussion over the years
> about whether odd-ply evaluations are as reliable as even-ply. (See
> http://www.bkgm.com/rgb/rgb.cgi?view+1061).


A little while ago, I suggested on the bug-gnubg list an explanation
of why 1-ply is off the mark. Here is the important part:

Nis wrote:

> If 0-ply is unbiased but imprecise (as in having average error 0) then
> the value of the best move will be overrated. Example
>
> Move   True Equity   0-ply equity
> A      0.4           0.35
> B      0.4           0.45
> C      0.5           0.45
> D      0.5           0.55   (*BEST MOVE*)
>
> Note that the average error is 0, but the best move is off by 0.05.
>
> The result of this should be that 1-ply, which is the average of 21
> BestMoves for the opponent, is underrated by some amount. This will be
> added to the (negated) 0-ply, so if 0-ply is overrated, 1-ply is even
> more underrated.

Note that this was written in the context of someone claiming that 0-ply
is overrating positions. I do not think this is the case. The general
principle holds: on average, 1-ply rates positions lower than 0-ply.
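Nis's point - that taking the maximum over unbiased noisy estimates gives a biased result - is easy to check with a small simulation (all names here are mine, purely illustrative): when several moves have equal true equity and each 0-ply estimate carries symmetric zero-mean noise, the estimate of the apparently best move is systematically too high.

```python
import random

def apparent_best(true_equities, noise, trials=20000, seed=1):
    """Average 0-ply estimate of the move that *looks* best when each
    move's true equity is perturbed by symmetric zero-mean noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        estimates = [e + rng.uniform(-noise, noise) for e in true_equities]
        total += max(estimates)  # the move a 0-ply chooser would pick
    return total / trials

# Four moves with identical true equity 0.5: the chosen move's estimate
# comes out well above 0.5, even though every single estimate is unbiased.
```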

> I thought I'd try an experiment comparing zero-ply and one-ply
> evaluations. Here's what I did:
>
> 1. I collected a large number backgammon games between good players,
> some human-vs-human, some human-vs-computer. From these I took
> a representative sample of positions. (However duplicate positions
> were deleted so early game positions are under-represented.)

It would be nice to know the size of your sample.

> 2. I rolled out each position to the end of the game thirty-six times
> using cubelss zero-ply evaluation. Variance reduction was applied.
>
> 3. I took the root-mean-square average of the differences between
> GnuBG's zero-ply evaluation and the rollout results, and between
> GnuBG's one-play evaluation and the rollout results. I looked
> only at game-winning chances; I didn't look at gammons or
> backgammons.

Any specific reason for using the root-mean-square? I would probably go
for the average absolute error as the indicator.

> These are the results:
>
> Zero-ply evaluation: Average error = 0.0300
> One-ply evaluation: Average error = 0.0284
>
> So one-ply evaluation does do better on average. This is to be
> expected; being able to look ahead one ply should be a help,
> especially in volatile positions.
>
> In certain games GnuBG's evaluation seems to oscillate back and forth
> according to which side's turn it is to play. When this happens, a
> one-ply evaluation (which essentially looks at the game from the other
> player's side) can give quite different numbers than a zero-ply
> evaluation. You might expect when zero-ply and one-ply evaluations
> differ by a lot that the true value of the position is probably
> somewhere in between. I thought it would be interesting to see what
> would happen if you had an evaluator that used the average of zero-ply
> and one-ply. I called this a "0.5-ply evaluation."
>
> 0.5-ply evaluation: Average error = 0.0245
>
> So 0.5-ply does do better!

This seems to agree nicely with the results of Joseph Heled:

http://mail.gnu.org/archive/html/bug-gnubg/2003-02/msg00218.html

which examines the ability of 0.5-ply to make actual game decisions.
(Both cube and checker decisions, if I am not mistaken.)

> In fact, it does enough better to make you
> wonder if it does even better than two-ply. (I didn't look into this.)

I am very interested in this as well. It would be great to add 1.5-ply
and 2-ply to the list. I think I asked Joseph to make his
benchmark available some time ago, and I would like to repeat the
request. If possible in some semi-readable format ...

The same goes for your sample of positions.

> Can we do even better? Something I noticed is that you can often
> predict whether zero-ply or one-ply is better for a particular
> position by looking at the relative pipcount. (The relative pipcount
> is your own pipcount minus your opponent's pipcount.) When the
> relative pipcount is between -160 and -40, one-ply usually does
> better; when the relative pipcount is between 40 and 150, zero-ply
> usually does better.

The pipcount is strongly related to the gwc. Could you perhaps check
if the correlation between the 0-ply eval and BestPly is stronger or
weaker than between pipcount and BestPly? (BestPly is 0 if 0-ply is
best, 1 if 1-ply is).

There might be some specific reason for 1-ply being better when you are
behind - I guess it has to do with the distribution of errors on 0-ply
evals (and thus the size of the 1-ply bias).

Bonus question: Find a best fit between "true equity" and
p1 + (a * p0 + b)(p0 - p1)

> Let's call an evaluator based on this idea a
> "hybrid evaluator." How well does the hybrid evaluator perform?

Just to clarify: What does the hybrid do when the pip-count is between
-40 and 40? Is it then using 0.5-ply?

What happens above 150 and below -160?


> Hybrid evaluator: Average error = 0.0225
>
> It should be noted that these tests show how well GnuBG performs at
> computing the ABSOLUTE equity of a position. They may or may not
> indicate an improvement in GnuBG's ability to *play* a position, since
> playing depends on having accurate RELATIVE equities.

The importance of this can not be stressed enough. The above
"improvement" says very little about how hybrid would fare in
actual play. My guess is that you would face blunders made by
neither 0-ply nor 1-ply - in cases where different plies are
compared (in a hit/no-hit situation, for instance).

> Nevertheless,
> I'm guessing that the 0.5-ply and hybrid evaluators play better than
> the integer-ply evaluators too.

As I write above, this has been tested for 0.5-ply, and it
does indeed score better on the benchmark than both 0- and 1-ply.

> - Would it be possible to build a fractional-ply evaluator into
> GnuBG that could evaluate positions at 0.5-ply, 1.5-ply, etc.?

I have implemented fractional plies for gnubg - in a slightly different
way than the straight average used for both your tests and Joseph's. If
you are compiling your own gnu, I'll be happy to send you the patch (the
one I sent to the bug-gnubg list earlier was broken). Unfortunately, it
only does fractional plies above 1-ply :-( I think I will look at
implementing 0.5-ply soon.

> - Would it be possible to build a hybrid evalator into GnuBG like the
> one described above that would use a combination of zero-ply,
> one-ply, and 0.5-ply evaluation depending on the relative pipcount?

I think this is a little too specialized for what I would want to put
into gnubg - at least until it has been more rigorously tested. Since it
is hard to test that which is not there, I volunteer to implement
it IF I can get someone to actually test it against Joseph's benchmarks.

> At least for cube decisions, they should be an improvement over
> zero-ply or one-ply evaluations. And maybe checker-play will be
> better too, making them better for rollouts.

My hope is that the standard settings of gnu will one day be fractional
- at least for doubles.

--
MVH
Nis Jørgensen
Live from Hoofddorp

Peter Schneider

Oct 21, 2003, 3:18:43 AM
Hi,

"Nis Jorgensen" <n...@dkik.dk> wrote

> Nis wrote:

Hmmm... I don't understand your argument. Maybe I'm missing something.
In essence, I don't understand

>> If 0-ply is unbiased but imprecise (as in having average error 0)
>> then the value of the best move [do you mean after 0-ply or after
>> 1-ply?] will be overrated.

You do give an example with an overrated best move, but couldn't it
coincidentally be just underrated, i.e.

D 0.5 0.46 (*BEST MOVE*)

? Since 0-ply has an average error of 0, this should be equally
probable.

Sorry if I missed something obvious.

Nis Jorgensen

Oct 21, 2003, 5:47:47 AM
On Tue, 21 Oct 2003 09:18:43 +0200
"Peter Schneider" <schneiderp...@gmx.net> wrote:

> >> Move True Equity 0-ply equity
> >> A 0.4 0.35
> >> B 0.4 0.45
> >> C 0.5 0.45
> >> D 0.5 0.55 (*BEST MOVE*)

> Hmmm... I don 't understand your argument. Maybe I miss something. In
> essence, I don't understand
>
> >> If 0-ply is unbiased but imprecise (as in having average error 0)
> >> then the value of the best move [do you mean after 0-ply or after
> >> 1-ply?] will be overrated.

This is the 0-ply evaluation of the resulting position. This
value of course ends up as 1/36 (or 1/18) of the 1-ply evaluation we are
doing.

> You do give an example with an overrated best move, but couldn't it
> coincidentally be just underrated, i.e.
>
> D 0.5 0.46 (*BEST MOVE*)
>
> ? Since 0-ply has an average error of 0, this should be equally
> probable.

If we assume that the error of 0-ply is symmetric, then a move which
is overrated is more likely to be considered the best move.

Consider my example of four moves above. Of the 4 possible ways to
distribute errors of (-0.05, 0.05) on moves C and D, 3 have a BestMove
equity of 0.55, and only one has an equity of 0.45. The average equity
over all possibilities is 0.525, while the true equity is still 0.5.
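The enumeration is small enough to write out in full; this sketch reproduces the 0.525 figure from Nis's example:

```python
from itertools import product

# Nis's example: moves A and B are estimated at 0.35 and 0.45; moves C
# and D (true equity 0.5 each) receive symmetric errors of -0.05 or +0.05.
fixed = [0.35, 0.45]                       # 0-ply estimates of A and B
best = [max(fixed + [0.5 + ec, 0.5 + ed])  # estimate of the apparent best move
        for ec, ed in product((-0.05, 0.05), repeat=2)]

average = sum(best) / len(best)            # three 0.55s and one 0.45 -> 0.525
```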

--
MVH
Nis Jørgensen
Live from Hoofddorp

æøå - in case I need them.

Peter Schneider

Oct 21, 2003, 8:19:36 AM
Hi,

I asked how on earth an unbiased net error could more often overrate
the best move than not, and you answered:

> If we assume that the error of 0-ply is symmetric, then a move which
> is overrated is more likely to be considered the best move.

Oh, I see, thanks. [Although one might argue that the *real* best move is indeed
equally often over- as underrated ;-).] Seems to me as if one should adjust the end
result (either for each roll or for the accumulated 1-ply equity) by a certain
fraction of the estimated average net error to annihilate this effect.

Tom Keith

Oct 22, 2003, 8:40:15 PM
Hi Nis.

Thank you for your thoughtful reply.

Nis Jorgensen wrote:
> A little while ago, I suggested on the bug-gnubg list an explanation of
> why 1-ply is off the mark.

I'm not sure it is accurate to say that 1-ply is more off the mark
than 0-ply. According to the test I did, 1-ply is better at
estimating absolute equity than 0-ply. And I've heard that 1-ply
plays better than 0-ply too. At least in general.

The issue seems to be that there are certain classes of positions
where 0-ply is significantly better than 1-ply.

> Here is the important part:
>

[clip]


>
> Note that this was written in the context of someone claiming that 0-ply
> is overrating positions. I do not think this is the case. The general
> principle holds: On average, 1-ply rates positions lower than 0-ply

On the positions in my sample, 0-ply underestimates equity on average,
and 1-ply overestimates.

Average Equity
0-ply: -0.017
1-ply: +0.045
rollout: +0.012

> It would be nice to know the size of your sample.

150,000 positions

> I think I aske Joseph to make his
> benchmark available some time ago, and I would like to repeat the
> request. If possible in some semi-readable format ...
>
> The same goes for your sample of positions.

I'd be happy to make my rollouts/evaluations available.

> The pipcount is strongly related to the gwc. Could you perhaps check
> if the correlation between the 0-ply eval and BestPly is stronger or
> weaker than between pipcount and BestPly? (BestPly is 0 if 0-ply is
> best, 1 if 1-ply is).

Correlation between pipcount and BestPly = -0.190257
Correlation between 0-ply GWC and BestPly = 0.618061

> Bonus question: Find a best fit between "true equity" and
> p1 + (a * p0 + b)(p0- p1)

What is this expression?



> Just to clarify: What does the hybrid do when the pip-count is between
> -40 and 40? Is it then using 0.5-ply?
> What happens above 150 and below -160?

Outside the given ranges, use 0.5-ply:

    if (-160 < rpc && rpc < -40) return eval1;
    if (40 < rpc && rpc < 150) return eval0;
    return 0.5 * eval0 + 0.5 * eval1;

> I have implemented fractional plies for gnubg - in a slightly different
> way than the straight average used for both your tests and Joseph's. If
> you are compiling your own gnu, I'll be happy to send you the patch (the
> one I sent to the bug-gnubg earlier was broken). Unfortunately, it only
> does fractional plies above 1-ply :-( I think will look at implementing
> 0.5-ply soon.

How is your implementation different than straight average?


> > - Would it be possible to build a hybrid evalator into GnuBG like the
> > one described above that would use a combination of zero-ply,
> > one-ply, and 0.5-ply evaluation depending on the relative pipcount?
>
> I think this is a little to specialized for what I would want to put
> into gnubg - at least until it has been more rigorously tested.

Yes, you would want to have it well tested. And, even before that,
there may be better ways that 0-ply and 1-ply can be usefully combined.
As I say, I'm willing to share my database of positions with anyone who
has ideas they'd like to try out.

Tom

MuffinHead

Oct 22, 2003, 9:17:13 PM
I'm coming a bit late into this, but what you are probably experiencing
is the so-called "odd-even effect" found in heuristic search. Check out
page 4 of
http://www.cs.ualberta.ca/~jonathan/Courses/657/Notes/6.EvaluationFunctions.pdf

Basically, results at odd ply are optimistic, because the final layer of
the tree is made by the player-to-move. Results at even ply tend to be
pessimistic, because the final play is made by the other player.

In this sense, values tend to oscillate between odd and even levels.
Most programs that do iterative deepening will tend to iterate in steps
of 2 ply to avoid this. (It also has negative performance implications
for memory assisted search that I won't get into right now).

Peter is basically right; it doesn't make sense for a search algorithm
to be _worse_ with deeper search. There hasn't been a domain yet where
that idea holds in the general case. Search is knowledge. Knowledge is
search.

I guess the question is, how is error being defined here? If the "true"
equity is based on 1-ply searches (what gnubg calls 0-ply but the rest
of the universe calls 1-ply; I don't use the gnubg terminology) but
you're comparing 1-ply searches versus 2-ply searches of the same
position, it's likely that the 2-ply search will end up being off by
more, because of the oscillation in values. Instead of using
"fractional ply" evaluators (ad hoc at best), you might want to try
a 1-ply versus a 3-ply evaluator.

Of course, improving the search from using the current forward-pruning
approach to "optimal" search is my job. 8)

Louis Nardy Pillards

Oct 23, 2003, 8:31:37 PM
MuffinHead wrote:

> I'm coming a bit late into this, but what you are probably
> experiencing is the so-called "odd-even effect" found in heuristic
> search. Check out page 4 of
> http://www.cs.ualberta.ca/~jonathan/Courses/657/Notes/6.EvaluationFunctions.pdf


>
> Basically, results at odd ply are optimistic, because the final layer
> of the tree is made by the player-to-move. Results at even ply tend
> to be pessimistic, because the final play is made by the other player.
>
> In this sense, values tend to oscillate between odd and even levels.
> Most programs that do iterative deepening will tend to iterate in
> steps of 2 ply to avoid this. (It also has negative performance
> implications for memory assisted search that I won't get into right
> now).
>
> Peter is basically right; it doesn't make sense for a search
> algorithm to be worse with deeper search. There hasn't been a domain
> yet where that idea holds in the general case. Search is knowledge.
> Knowledge is search.
>
> I guess the question is, how is error being defined here? If the
> "true" equity is based on 1-ply searches (what gnubg calls 0-ply but
> the rest of the universe calls 1-ply, I don't use the gnubg
> terminology) but you're comparing 1-ply searches versus 2-ply
> searches of the same position, it's likely that the 2-ply search will
> probably end up being off by more, because of the oscillation in
> values. What you might want to do instead of using "fractional ply"
> evaluators (ad hoc at best), you'd want to try a 1-ply versus a 3-ply
> evaluator.
>
> Of course, improving the search from using the current
> forward-pruning approach to "optimal" search is my job. 8)

Yes, odd ply is optimistic, and even ply tends to be pessimistic.
But 'odd' and 'even' here are gnubg's odd and gnubg's even.

Here's an example:

GNU Backgammon Position ID: AABs2zYAALZtGw
Match ID : cAkAAAAAAAAA
+13-14-15-16-17-18------19-20-21-22-23-24-+ O: gnubg
| | O | X X X X X X | O 0 points
| | O | X X X X X X |
| | | |
| | | |
| | | |
v| |BAR| | (Cube: 1)
| | | |
| | | |
| | | |
| | X | O O O O O O | On roll
| | X | O O O O O O | X 0 points
+12-11-10--9--8--7-------6--5--4--3--2--1-+ X: BBL_member


Cube analysis
0-ply cubeless equity -0.921
0.040 0.000 0.000 - 0.960 0.000 0.000
Cubeful equities:
1. No double -0.975
2. Double, pass +1.000 ( +1.975)
3. Double, take -1.949 ( -0.975)
Proper cube action: No double, beaver (33.0%)


Cube analysis
1-ply cubeless equity +0.934
0.967 0.000 0.000 - 0.033 0.000 0.000
Cubeful equities:
1. Double, pass +1.000
2. Double, take +1.846 ( +0.846)
3. No double +0.979 ( -0.021)
Proper cube action: Double, pass


Cube analysis
2-ply cubeless equity -0.899
0.050 0.000 0.000 - 0.950 0.000 0.000
Cubeful equities:
1. No double -1.000
2. Double, pass +1.000 ( +2.000)
3. Double, take -2.000 ( -1.000)
Proper cube action: No double, beaver (33.3%)


Cube analysis
3-ply cubeless equity +0.962
0.981 0.000 0.000 - 0.019 0.000 0.000
Cubeful equities:
1. Double, pass +1.000
2. Double, take +1.911 ( +0.911)
3. No double +1.000 ( +0.000)
Proper cube action: Double, pass


--
Louis Nardy Pillards

kiwi

Oct 29, 2003, 9:37:22 PM
the position is illegal.



Louis Nardy Pillards

Nov 1, 2003, 2:06:33 PM
kiwi wrote:

> the position is illegal.
>

Yep

Nardy
