
Writing a bg program


Gary Wong

Jan 20, 1999

whig...@aol.com (Whigdon2) writes:
> My intuition tells me to start at the end, and work backwards, i.e. develop a
> program that plays a near perfect end game strategy, and is knowledgeable with
> the cube, and then work my way backwards. I think that I would like the
> program to make decisions based upon its own simulations, where many of the
> moves are based upon prior simulations. I realize that this may be a bit
> idealistic of me, and that I am likely to face some memory constraints.

I believe this approach is feasible; I am working along similar lines
myself. The best finished player I have is only of intermediate
strength (it plays as Costello on FIBS, rated around 1600 +/- 50) but
I have bits and pieces of a newer player which I expect will be
significantly stronger. Using strong evaluations of subsequent
positions to train evaluators of positions early in the game appears
to work well in practice. In some sense, TD training can be
considered as doing this; a more explicit approach I am investigating
is to break the set of all positions into fine-grained classes, use a
distinct evaluator for each class, establish a partial ordering on
those classes, roll out a large number of positions in those classes
according to the partial ordering, and use supervised training to
obtain the evaluator for each class in turn.
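
In outline, the bootstrap looks something like this (all of the names
below are placeholders for illustration, not code from my player, and
the classes are assumed to be numbered consistently with the partial
ordering, so that play from class i can only reach classes j > i):

    typedef struct position position;   /* some board representation */
    typedef struct net net;             /* one evaluator per class */

    extern position *sample_position( int c );    /* a position in class c */
    extern float rollout_equity( position *pp );  /* roll out to completion */
    extern void train_net( net *pn, position **app, float *ar, int n );

    #define CLASSES 8
    #define SAMPLES 100000

    void bootstrap( net *anet ) {
        static position *app[ SAMPLES ];
        static float ar[ SAMPLES ];
        int i, j;

        /* Train in reverse order, so that by the time class i is
           rolled out, every class reachable from it already has a
           trained evaluator and the rollout targets are accurate. */
        for( i = CLASSES - 1; i >= 0; i-- ) {
            for( j = 0; j < SAMPLES; j++ ) {
                app[ j ] = sample_position( i );
                ar[ j ] = rollout_equity( app[ j ] );
            }
            train_net( &anet[ i ], app, ar, SAMPLES );
        }
    }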

"Knowledgeable with the cube" may or may not be important: the
standard heuristics for deriving cube behaviour given cubeless
evaluations seem to work adequately if you are careful. Ideally it
would be nice to have a genuinely cubeful evaluator, but at present it
doesn't appear to be worth the headaches (i.e. correcting the absolute
error in current evaluators still appears to be more fruitful than
making those evaluators cubeful).
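
The simplest of those heuristics just compares the cubeless winning
probability against fixed doubling and take points, along these lines
(the 25% take point is the classical dead-cube figure; treating 70%
as the doubling point is only a rough assumption for illustration,
and gammons and cube ownership are ignored entirely):

    typedef enum { NO_DOUBLE, DOUBLE_TAKE, DOUBLE_PASS } cubedecision;

    /* Money-game cube action given our cubeless probability p of
       winning.  The opponent's dead-cube take point is 25% winning
       chances, i.e. they can take while p <= 0.75. */
    cubedecision cube_action( float p ) {

        if( p < 0.70f )
            return NO_DOUBLE;    /* too early -- our market is still open */
        else if( p <= 0.75f )
            return DOUBLE_TAKE;  /* in the window: double, they take */
        else
            return DOUBLE_PASS;  /* past their take point: they should pass */
    }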

Memory constraints have not been a problem in my experience: for
bearoff classes I use a couple of databases mmap()ed into memory,
though this could just as easily be read off disk as required; for
most other classes I use neural net evaluators which need of the order
of 100KB each.
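
The mmap() part is only a few lines of POSIX; the file format below
(one 16-bit figure per position, indexed by position number) is just
an assumption for the example, not the layout of my actual databases:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static const unsigned short *ausBearoff;  /* the mapped database */

    /* Map the database read-only; returns 0 on success, -1 on error. */
    int BearoffInit( const char *szFile ) {
        struct stat st;
        int h;

        if( ( h = open( szFile, O_RDONLY ) ) < 0 )
            return -1;

        if( fstat( h, &st ) < 0 ) {
            close( h );
            return -1;
        }

        ausBearoff = mmap( NULL, st.st_size, PROT_READ, MAP_SHARED, h, 0 );
        close( h );  /* the mapping survives the close */

        return ausBearoff == MAP_FAILED ? -1 : 0;
    }

    /* Look up the stored figure for position number i. */
    unsigned short BearoffLookup( int i ) {
        return ausBearoff[ i ];
    }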

> I am hoping to borrow from the experiences of others. Please share your
> references, ideas, or suggestions.

References:
-----------
There isn't a huge amount in the academic literature on backgammon
playing programs (compared to chess or Othello, for instance), but
it would be worth your time tracking down papers by Tesauro (for
applying TD training to backgammon) and Berliner (general ideas
about what metrics to derive from positions).

There have been many articles posted here which are collectively
invaluable; search on Deja News or look for archived posts at:

http://www.bkgm.com/rgb/rgb.cgi?menu+programming

More references on the Web are available at:

http://forum.swarthmore.edu/~jay/learn-game/systems/gammon.html

Some of my code can be grabbed from:

ftp://ftp.cs.arizona.edu/people/gary/

(still very badly organised, sorry -- may or may not compile depending
on when you look at it :-)

Ideas:
------
In the long term I want to take advantage of the fact that supervised
training is highly parallelisable (in that almost all of the time is
spent acquiring accurate training data by rolling out positions; once
the training data is there it only takes a few hours to train a net
from it). So far I have been generating all training data myself, but
once I am able to release a player strong enough to generate
accurate rollout data to anybody who wants it, I want to collect the
results from those who can spare the CPU time to roll out what they
can; distribute subsequent evaluators trained on that data; use them
to generate subsequent results, etc. etc.

Suggestions:
------------
If you want to get started quickly it might be worth taking a look at
my code for administrative fluff like move generation, bearoff
databases, generic neural net code etc. so you can spend more time on
interesting problems.

Grab a few free players (e.g. xgammon, Tesauro's pubeval, mine) and
benchmark your player against them.
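
A few thousand money games is enough to see gross differences; a loop
like the following (play_game() standing in for whatever game driver
you have) reports points per game with a standard error, so you can
tell when a difference is real:

    #include <math.h>
    #include <stdio.h>

    extern int play_game( void );  /* points won by player 0 (may be -ve) */

    void benchmark( int n ) {
        double rSum = 0.0, rSumSq = 0.0, rMean, rSE;
        int i, nPoints;

        for( i = 0; i < n; i++ ) {
            nPoints = play_game();
            rSum += nPoints;
            rSumSq += (double) nPoints * nPoints;
        }

        rMean = rSum / n;
        rSE = sqrt( ( rSumSq / n - rMean * rMean ) / n );
        printf( "%+.3f ppg (s.e. %.3f) over %d games\n", rMean, rSE, n );
    }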

If using a neural net evaluator, the approach I take to training (which
may be a long way from optimal, but it's very slow trying different
things to see what works best!) is to start with random weights; use
TD training until the network stops improving; then generate LOTS
(hundreds of thousands) of rollouts and start supervised training on
those. I find it very hard to manage the quantity/quality tradeoff.
My approach so far has been not to go overboard on the number of
trials per position: estimate the standard error you'd expect your
evaluator to have after training and don't bother reducing the
standard deviation of your rollouts too far below that; I suspect
you'd be better spending your time rolling out more varied positions.
In the past I have used no lookahead in the rollouts just so I could
get a large amount of data in reasonable time; now I'm looking at
using 1-ply lookahead with variance reduction, mainly to lower the
_systematic_ error in the data. Neural nets are very good at
ignoring statistical noise in training data, but are hurt by
systematic error. My rough guess is that a standard deviation
of ~0.01 from 36 or 72 rollouts per position is more than sufficient.
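
Concretely, the stopping rule I have in mind amounts to rolling out
in batches of 36 trials (one for each of the 36 possible first rolls,
say) until the standard error of the mean equity falls below the
error you expect of the trained net anyway -- something like this,
with rollout_trial() standing in for a single (possibly truncated)
rollout trial:

    #include <math.h>

    extern float rollout_trial( void );  /* one equity sample */

    float rollout( float rTargetSE, int nMaxTrials ) {
        double rSum = 0.0, rSumSq = 0.0, rMean, rSE;
        int i, n = 0;

        do {
            for( i = 0; i < 36; i++, n++ ) {
                double r = rollout_trial();
                rSum += r;
                rSumSq += r * r;
            }

            rMean = rSum / n;
            rSE = sqrt( ( rSumSq / n - rMean * rMean ) / n );
        } while( rSE > rTargetSE && n < nMaxTrials );

        return (float) rMean;  /* the training target for this position */
    }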

I hope some of the above has been useful -- yell out if you want more
detail in any area.

Cheers,
Gary.
--
Gary Wong, Department of Computer Science, University of Arizona
ga...@cs.arizona.edu http://www.cs.arizona.edu/~gary/
