
Neurality


Sam Pottle

Feb 27, 1999
I've been reading some of the papers on computer backgammon, and I have
some questions for the neuralistas:

-- I found the following description of Berliner's blockading feature in
his IJCAI '77 paper:

> For each combination of zero to seven blockading points at a distance
> of 1 to 12 spaces in front of a man, we computed the number of rolls
> that could legally be played by the blockade runner.

Is there a fuller description somewhere in the literature?

-- The trend in NN design for backgammon seems strongly toward nets with
lots of inputs, and the Neurogammon paper has some nice comparative
studies indicating that having various input features is lots better
than not having them. But it seems to me that there is a tradeoff
available here. If I have a network with 400 inputs and 150 neurons,
that's 60,000 weights. I could use a really primitive board encoding to
design a network with 60 inputs and 1,000 neurons that would evaluate
just as fast. Does this really lose?

-- Speaking of network architecture, it seems like everybody's using a
single hidden layer these days, although Tesauro found that two
layers worked a bit better for Neurogammon. How come?

-- I recall some mention in this forum of some theoretical studies
suggesting that lambda should be roughly 1/branching-factor. Does
anyone have references for these?


Sam

Gary Wong

Feb 27, 1999
pot...@lunabase.org (Sam Pottle) writes:
> I've been reading some of the papers on computer backgammon, and I have
> some questions for the neuralistas:
>
> -- I found the following description of Berliner's blockading feature in
> his IJCAI '77 paper:
>
> > For each combination of zero to seven blockading points at a distance
> > of 1 to 12 spaces in front of a man, we computed the number of rolls
> > that could legally be played by the blockade runner.
>
> Is there a fuller description somewhere in the literature?

My copies of the papers are at work so I can't read them until Monday, but
I once implemented what I thought was a reasonable interpretation at the time.
The interpretation was this:

int Escapes[ 0x1000 ], i, c, n0, n1;

for( i = 0; i < 0x1000; i++ ) {
    c = 0;

    /* Generate the 21 distinct dice combinations; n0 and n1 are the
       die values minus one. */
    for( n0 = 0; n0 <= 5; n0++ )
	for( n1 = 0; n1 <= n0; n1++ )
	    /* The roll escapes if the point at the combined distance is
	       open, and at least one of the two single-die landing
	       points is open as well. */
	    if( !( i & ( 1 << ( n0 + n1 + 1 ) ) ) &&
		!( ( i & ( 1 << n0 ) ) && ( i & ( 1 << n1 ) ) ) )
		c += ( n0 == n1 ) ? 1 : 2;  /* non-doublets count twice */

    Escapes[ i ] = c;
}

i.e. for each of the 2^12 configurations (i) of the 12 points in front of a
chequer (I don't worry about limiting them to 7 points occupied -- illegal
configurations will never be referenced), generate the 21 dice combinations
(n0 and n1) and count (c) the number of combinations that can be played
(counting non-doublets twice); to qualify, the combined distance and the
distance given by either n0 or n1 must be playable.  ("Playable" is
indicated by a 0 bit at the appropriate position in i.)  Perhaps this
interpretation isn't strictly correct for doublets, but I
suspect it's close enough.
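
As a sketch of how the table might be used (the indexing conventions here
are assumptions of mine, not necessarily what eval.c does): given the
opponent's chequer counts on the 12 points in front of a trapped chequer,
build the bit mask and index the table:

int EscapeRolls( const int anOpponent[ 12 ] ) {
    /* anOpponent[ n ] is assumed to hold the opponent's chequer count
       at distance n + 1 in front of the trapped chequer, matching the
       bit layout above (bit d - 1 set == distance d blocked). */
    int n, mask = 0;

    for( n = 0; n < 12; n++ )
	if( anOpponent[ n ] >= 2 )  /* two or more chequers make a point */
	    mask |= 1 << n;

    return Escapes[ mask ];	    /* rolls the trapped chequer can play */
}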

The rest of the code is available if you're interested, at:
ftp://ftp.cs.arizona.edu/people/gary/gnubg-0.0.tar.gz

(Look in the file eval.c.)

> -- The trend in NN design for backgammon seems strongly toward nets with
> lots of inputs, and the Neurogammon paper has some nice comparative
> studies indicating that having various input features is lots better
> than not having them. But it seems to me that there is a tradeoff
> available here. If I have a network with 400 inputs and 150 neurons,
> that's 60,000 weights. I could use a really primitive board encoding to
> design a network with 60 inputs and 1,000 neurons that would evaluate
> just as fast. Does this really lose?

Personally I would add input nodes in preference to hidden units (for an
equivalent number of weights), as long as the new inputs provide
information that isn't easily derived from the others.  I'm certain that 60
inputs is too few (that's barely enough to cover one input per player
per point); if some information is useful in evaluating the strength of a
position, then it is more efficient to provide that information at the
inputs if you can, rather than supplying extra hidden units to derive it.

> -- Speaking of network architecture, it seems like everybody's using a
> single hidden layer these days, although Tesauro found that two
> layers worked a bit better for Neurogammon. How come?

My impression is that you take a big performance hit in adding more layers;
in particular the training cost is considerably greater. I haven't yet
tried more than one layer so I can't provide a justified comparison.

Cheers,
Gary.
--
Gary Wong, Department of Computer Science, University of Arizona
ga...@cs.arizona.edu http://www.cs.arizona.edu/~gary/

bshe...@my-dejanews.com

Mar 1, 1999
In article <92007583...@ylum.lunabase.org>,

pot...@lunabase.org (Sam Pottle) wrote:
> I've been reading some of the papers on computer backgammon, and I have
> some questions for the neuralistas:
>
> -- I found the following description of Berliner's blockading feature in
> his IJCAI '77 paper:
>
> > For each combination of zero to seven blockading points at a distance
> > of 1 to 12 spaces in front of a man, we computed the number of rolls
> > that could legally be played by the blockade runner.
>
> Is there a fuller description somewhere in the literature?

I haven't been able to find one in the literature. I got a description from
Gerry Tesauro.

You precompute a table of blockade strengths. A "blockade" is considered to be
the pattern of blocked points in a set of 12 consecutive points. I use a table
of 4096 bytes, where the strength of a blockade is encoded in a single byte.

The strength of a blockade equals the number of rolls that cannot be played
fully. For example, if you just hold one point 10 pips away, then 6-4 and 5-5
cannot be played fully, so the blockade strength is 3 (6-4 counts twice, once
for each order of the dice). If you added the point 6 pips away, then 2-2,
3-3, 6-6, 4-2, and 5-1 could not be played fully either, so the blockade
strength is 3 + 7 = 10.
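
Here is a sketch of how I'd build such a table (the names are mine; points
more than 12 pips away fall outside the window and count as open):

static int Open( int mask, int d ) {
    /* bit d - 1 of mask is set if the point d pips away is blocked */
    return d > 12 || !( mask & ( 1 << ( d - 1 ) ) );
}

unsigned char BlockadeStrength[ 0x1000 ];

void InitBlockadeTable( void ) {
    int i, c, d0, d1;

    for( i = 0; i < 0x1000; i++ ) {
	c = 0;

	for( d0 = 1; d0 <= 6; d0++ )
	    for( d1 = 1; d1 <= d0; d1++ )
		if( d0 == d1 ) {
		    /* doublet: all four steps must land on open points */
		    if( !( Open( i, d0 ) && Open( i, 2 * d0 ) &&
			   Open( i, 3 * d0 ) && Open( i, 4 * d0 ) ) )
			c++;
		} else if( !( ( Open( i, d0 ) || Open( i, d1 ) ) &&
			      Open( i, d0 + d1 ) ) )
		    /* non-doublet: either order may be tried; blocked
		       rolls count twice, once per order */
		    c += 2;

	BlockadeStrength[ i ] = c;
    }
}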

Quick suggestion: normalize these numbers by dividing by 36.0. Keeping all
your neural network inputs in a roughly comparable range improves learning
rates.
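
In code that's just one line (arInput and iBlockade are placeholder names
for wherever your network inputs live):

arInput[ iBlockade ] = BlockadeStrength[ mask ] / 36.0f;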

Once you have a table of blockade strengths, you can use it to compute two
important measures. One is the Actual blockade, the strongest blockade that
actually has an enemy man behind it. The other is the Potential blockade,
the strongest blockade of all.
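
A sketch of the scan (the board orientation and array names are my own
conventions: the enemy moves from high indices toward low ones, and index
24 is the bar):

void BlockadeMeasures( const int anBlocked[ 24 ], const int anEnemy[ 25 ],
		       int *pnActual, int *pnPotential ) {
    int p, q, mask, nBehind;

    *pnActual = *pnPotential = 0;

    for( p = 0; p < 24; p++ ) {
	/* build the mask for the window of 12 points starting at p, as
	   seen by a runner sitting just behind the window */
	mask = 0;
	for( q = 0; q < 12 && p + q < 24; q++ )
	    if( anBlocked[ p + q ] )
		mask |= 1 << ( 11 - q );

	if( BlockadeStrength[ mask ] > *pnPotential )
	    *pnPotential = BlockadeStrength[ mask ];

	/* any enemy chequer behind the window? (bar included) */
	nBehind = 0;
	for( q = p + 12; q < 25; q++ )
	    nBehind += anEnemy[ q ];

	if( nBehind && BlockadeStrength[ mask ] > *pnActual )
	    *pnActual = BlockadeStrength[ mask ];
    }
}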

Maybe you can come up with other uses of this table.

> -- The trend in NN design for backgammon seems strongly toward nets with
> lots of inputs, and the Neurogammon paper has some nice comparative
> studies indicating that having various input features is lots better
> than not having them. But it seems to me that there is a tradeoff
> available here. If I have a network with 400 inputs and 150 neurons,
> that's 60,000 weights. I could use a really primitive board encoding to
> design a network with 60 inputs and 1,000 neurons that would evaluate
> just as fast. Does this really lose?

Well, the specific tradeoff you mentioned would lose because you cannot
provide a good state description for backgammon using only 60 inputs, but you
have a good question.

First off, the key insight of the Neurogammon project is that the accuracy of
the neural network increases when the inputs match crucial domain concepts.
For example, in backgammon it is logically sufficient simply to provide an
array of 26 integers as inputs. Unfortunately this does not provide good
discrimination because the quality of a position is a highly non-linear
function of the input.

Gerry Tesauro's (and his coauthor's (Terry Sejnowski??)) insight is that when
you use boolean inputs for small numbers of men (e.g. one boolean to represent
having a White blot, one for 2 White men, etc.) and only use an integer to
represent "excess men", then the neural network learns much faster and
performs at a higher level.

Per point per color you have Blot, Point, Builder, and Excess inputs (some
representations even have an input for having exactly 4 men on a point, but I
haven't found this useful except for the 6, 12, 13, and 19 points), so you
have a total of 25 * 8 = 200 inputs (24 points plus the bar, 8 inputs each).
(You needn't encode the number of men off the board, since that is a linear
function of the others.)
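
For illustration, here is one reading of the per-point, per-color encoding
(the names and exact thresholds are my guesses, and the excess scaling is
just one plausible choice):

static void EncodePoint( int nMen, float arInput[ 4 ] ) {
    arInput[ 0 ] = ( nMen == 1 ) ? 1.0f : 0.0f;	 /* blot */
    arInput[ 1 ] = ( nMen == 2 ) ? 1.0f : 0.0f;	 /* made point */
    arInput[ 2 ] = ( nMen == 3 ) ? 1.0f : 0.0f;	 /* point plus builder */
    arInput[ 3 ] = ( nMen > 3 ) ?
	( nMen - 3 ) / 2.0f : 0.0f;		 /* excess men, scaled */
}

Note that at most one of the four inputs is ever non-zero, which matters
for the speed argument below.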

Now to your question: a neural network with 200 inputs and 80 neurons has
16000 weights. If you used only 26 inputs you could have 615 neurons and
still have the same number of weights. How can this lose?

The key observation is that the 200 inputs are almost always zero, and when an
input is zero it contributes nothing to the calculation expense because you do
not have to propagate its weights. It follows that the 200 input network is
very much faster than the 26 input network.

It turns out that the correct equivalent number of neurons for the 26-input
network is 80. Notice that in the group of 8 inputs that encode the contents
of a single point we have at most one non-zero input. And if a point is empty
then we have no non-zero inputs. It follows that the number of non-zero
inputs in the 200-input encoding is exactly the same as the number of
non-zero inputs in the 26-input encoding. So if you wished to be equivalent
in speed, the 26-input network must have 80 neurons. (Actually slightly fewer,
because there are also benefits to having inputs whose value is 1.0, since
that avoids a scalar multiplication in the hidden-node computation.)
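
To make the speed argument concrete, here is a sketch of the input-to-hidden
pass exploiting both properties (all the names here are mine):

#define INPUTS 200
#define HIDDEN 80

void HiddenSums( const float arInput[ INPUTS ],
		 const float arWeight[ INPUTS ][ HIDDEN ],
		 const float arBias[ HIDDEN ], float arSum[ HIDDEN ] ) {
    int i, j;

    for( j = 0; j < HIDDEN; j++ )
	arSum[ j ] = arBias[ j ];

    for( i = 0; i < INPUTS; i++ )
	if( arInput[ i ] == 1.0f )
	    /* boolean input that's on: add the weights, no multiply */
	    for( j = 0; j < HIDDEN; j++ )
		arSum[ j ] += arWeight[ i ][ j ];
	else if( arInput[ i ] != 0.0f )
	    /* general case; zero inputs (the common case) cost nothing */
	    for( j = 0; j < HIDDEN; j++ )
		arSum[ j ] += arInput[ i ] * arWeight[ i ][ j ];
}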

Of course, actually building such a 26-input network would be a big step
backwards because the 200-input network has a much richer input encoding.

Since boolean inputs are usually zero, it costs relatively little to add an
input, and relatively more to add a neuron. This suggests that adding
additional "feature recognizers" is pretty cheap.

Having said the foregoing, I must report that for quite some time I haven't
been able to increase the strength of my network by adding new inputs, whereas
I have been able to increase its strength by increasing the number of neurons.

> -- Speaking of network architecture, it seems like everybody's using a
> single hidden layer these days, although Tesauro found that two
> layers worked a bit better for Neurogammon. How come?

There are a lot of practical problems with multi-layer networks. The backprop
training algorithm that works so well in theory does not work in practice,
because the gradient for layers beyond the first is nearly zero. A gradient
that small means there is almost no learning going on in the early layers.
It follows that you have to "hack" your learning algorithm, and that will
suck up a lot of your time.

The common justification for multilayer networks is that they "represent
richer features." This isn't true in theory, of course, since a single-
hidden-layer network is a universal function approximator, but in practice
there may be functions that are easy to approximate using sigmoids of
sigmoids that are hard to approximate using sigmoids alone.

But in backgammon you aren't likely to find such functions. The reason is
that the ability of a feature to discriminate between winning and losing
positions is a logistic function (i.e. A / (B + exp(C * x))). It follows that
a single-hidden-layer network contains all the necessary richness.

From a computational perspective, instead of using deeper networks why not
double the number of neurons in a single layer? It will train faster and
probably play better. And if you need extra input features, why not code them
by hand?

> -- I recall some mention in this forum of some theoretical studies
> suggesting that lambda should be roughly 1/branching-factor. Does
> anyone have references for these?

This is in one of Sutton's papers. Check out his papers online at UMass's web
site.

By the way, I do not recommend using non-zero lambdas. Using lambda == 0
results in a much faster, simpler program and the training quality is very
nearly as good. After all, what difference is there between lambda == 0.0278
(that is, 1/36) and lambda == 0? Hardly any signal filters through when
lambda == 0.0278. Why
not just make lambda zero, and take the benefits of simpler and faster code?
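
For what it's worth, the lambda == 0 update is a one-liner per weight, with
no eligibility traces to maintain (the names and the learning rate here are
mine):

#define ALPHA 0.1f  /* learning rate; the value is arbitrary */

void UpdateTD0( int cWeight, float arWeight[], const float arGradient[],
		float rTarget, float rOutput ) {
    /* rTarget is the next position's evaluation (or the final result),
       rOutput is the current one; arGradient[] holds dOutput/dWeight */
    float rError = rTarget - rOutput;
    int i;

    for( i = 0; i < cWeight; i++ )
	arWeight[ i ] += ALPHA * rError * arGradient[ i ];
}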

Warm Regards,
Brian Sheppard

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
