+13-14-15-16-17-18-------19-20-21-22-23-24-+ X: 0
| O O | | X X X X X O |
| O | X | X X X X X |
| | | X X |
| | | |
| | | |
v| |BAR| | O on roll, match to 7.
| | | |
| | | |
| | | |
| X | | O O O O O |
| X O | | O O O O O | Cube: 2
+12-11-10--9--8--7--------6--5--4--3--2--1-+ O: 0
I (O) redoubled, my opponent thought for a second ("no guts, no glory", he
said :-), took, and immediately rolled 1-6 and gammoned me for the
match. We played out the position a couple of times while waiting for
the next match to start and established it's an easy take, and (I
believe) reluctant but correct redouble for money (just about all
sequences including a 6 for O are market losers). Perhaps it's not
quite enough to redouble at 7-away, 7-away since X has big recube vig
giving O a dead 8 cube.
Anyway, when I got home I ran it through a bot I'm working on (yet
another neural net I'm afraid... no imagination here :-). I only gave
it 50 cubeless rollouts (not much point trying more; it still plays
fairly weakly) and get 70% wins for O and 26% single wins for X, 4%
gammons for X for 0.34 cubeless equity. I believe this is a little
low because it plays the O side worse (leaves blots in the wrong
places and doesn't slot the bar).
Anyway, if anybody can be bothered trying it, I'd like to know what
some rollouts from mature bots have to say (and what human experts would
do with the cube).
Cheers,
Gary.
(I bet you don't believe it -- a post to this newsgroup from me that's
actually about backgammon ;-)
--
Gary Wong, Department of Computer Science, University of Arizona
ga...@cs.arizona.edu http://www.cs.arizona.edu/~gary/
Your rollout numbers are pretty close to my bot's evaluation. Close
enough for statistical equality at any rate, given the small number of
trials. But the numbers you quote work out to .40 equity, and it seems
peculiar that O never won a gammon. O should win more gammons than X.
One thing to point out: there is always value in doing more rollouts,
even with a weak player. If you use PUBEVAL to roll out checkerplay
decisions, you actually get a standard of play that is within a few
percent of neural network quality. No doubt your evaluation function is
stronger than PUBEVAL.
In fact, if you choose checker plays using rollouts from the weakest
neural network you achieve a skill level at least equal to 3-ply searching.
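That claim can be pictured with a minimal sketch (Python; `play_game` is a hypothetical stand-in for the network playing a position out to completion, not anybody's actual bot):

```python
def rollout_value(position, play_game, n=50):
    """Estimate the cubeless equity of `position` by averaging the
    results of n played-out games (+1/-1 for plain wins, +/-2 for
    gammons, and so on)."""
    return sum(play_game(position) for _ in range(n)) / float(n)

def best_move(candidate_positions, play_game, n=50):
    # Pick the candidate position whose rollout estimate is highest;
    # even a weak evaluator gains strength from this averaging.
    return max(candidate_positions,
               key=lambda pos: rollout_value(pos, play_game, n))
```

The point is that the averaging over many games washes out much of the evaluator's per-position noise, which is why even a weak player's rollouts can rival deeper search.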
> Anyway, if anybody can be bothered trying it, I'd like to know what
> some rollouts from mature bots have to say (and what human experts would
> do with the cube).
My bot evaluates (i.e. no lookahead) the position as 0.511, so I think
it is probably a double/take. I admire the man who took this cube,
because most players would drop.
Brian
-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/rg_mkgrp.xp Create Your Own Free Member Forum
Then 1296 cubeful rollouts (for money).
Checkerplay at 2-ply and cubes at 3-ply. Cashing cube at 8.
The cubeless equity went up to 0.591, with O winning
75% of games, 16.4% gammons, and losing 7.4% gammons.
But it's still an easy take! X loses 0.856 points with a 2-cube.
So the cubeful rollout says Redouble/Take.
Actually, after I've done a few cubeful rollouts with Snowie,
it surprises me how much value cube ownership gets.
If someone (Chuck Bower, you've referred to it lots of times,
but never posted it) would post Rick Janowski's formula for
take points with cube ownership, I would be interested in seeing
if it's consistent with cubeful rollouts.
Anyway, I rolled the position out for the match score as well (cubeful).
506 games. The results are:
No double gives 60.8% match winning chances,
Redouble/take gives 62.5%, and
Redouble/pass gives 62.8%.
So it looks like a clear redouble, but only a small/uncertain take.
I don't know why it's an easier take for money than for match,
but I disagree with you that 'X has big recube vig
giving O a dead 8 cube.' because a redouble would kill the
gammons, and thus in some positions it will be a cash for
money, but a take/no double at the score.
Stig Eide
In article <wtd89pz...@brigantine.CS.Arizona.EDU>,
Gary Wong <ga...@cs.arizona.edu> wrote:
> I came across the following position recently:
>
> +13-14-15-16-17-18-------19-20-21-22-23-24-+ X: 0
> | O O | | X X X X X O |
> | O | X | X X X X X |
> | | | X X |
> | | | |
> | | | |
> v| |BAR| | O on roll, match to 7.
> | | | |
> | | | |
> | | | |
> | X | | O O O O O |
> | X O | | O O O O O | Cube: 2
> +12-11-10--9--8--7--------6--5--4--3--2--1-+ O: 0
>
> I (O) redoubled, my opponent thought for a second ("no guts, no glory", he
> said :-), took, and immediately rolled 1-6 and gammoned me for the
> match. We played out the position a couple of times while waiting for
> the next match to start and established it's an easy take, and (I
> believe) reluctant but correct redouble for money (just about all
> sequences including a 6 for O are market losers). Perhaps it's not
> quite enough to redouble at 7-away, 7-away since X has big recube vig
> giving O a dead 8 cube.
>
> Anyway, when I got home I ran it through a bot I'm working on (yet
> another neural net I'm afraid... no imagination here :-). I only gave
> it 50 cubeless rollouts (not much point trying more; it still plays
> fairly weakly) and get 70% wins for O and 26% single wins for X, 4%
> gammons for X for 0.34 cubeless equity. I believe this is a little
> low because it plays the O side worse (leaves blots in the wrong
> places and doesn't slot the bar).
>
> Anyway, if anybody can be bothered trying it, I'd like to know what
> some rollouts from mature bots have to say (and what human experts would
> do with the cube).
>
> Cheers,
> Gary.
>
> (I bet you don't believe it -- a post to this newsgroup from me that's
> actually about backgammon ;-)
> --
> Gary Wong, Department of Computer Science, University of Arizona
> ga...@cs.arizona.edu http://www.cs.arizona.edu/~gary/
>
Thanks
Richard
If I understand the above numbers correctly, the gammons are
included in the wins, so X wins 71.7% and O wins 28.3% of
the games, and in the rollouts Snowie always waits until the
position is a drop before doubling to 8. To redouble to
8, O needs a 76.7% game-winning chance, according to the match
equity calculator I have. If O will always wait until he has
a drop, his chances will need to be more like 78% on average
(ignoring the option of playing on for a gammon).
So O currently has a 28.3% game-winning chance, and needs to get to
approx. 78%. The chance of this happening is 28.3% / 78% = 36.3%.
By doubling X out, O prevents him from winning the 22% of games he
would have won, so X now loses 36.3% of 22% of games by giving up
the cube, which is 8.0%. Quite large. This corresponds to an equity
with a 1-cube of 0.16.
According to the rollout data above, X's equity with a 1-cube is
0.591, and when O has a 2-cube it is 0.891 - a unit equity of
0.446. The difference, 0.591 - 0.446 = 0.146, should be due to the
cube ownership, and is close to the 0.16 estimate above.
I think that the practice of always waiting until a drop is
correct before doubling in rollouts, although it prevents freak
results with huge cubes, does tend to inflate the value of cube
ownership by a factor of almost 3 (given that roughly 2/3 of doubles
are takes if you use the cube properly, as noted in a previous
thread).
Disclaimer - all the above is my personal opinion having thought about
it a bit recently, and should not necessarily be taken as fact.
Feel free to point out any errors in my reasoning.
Phill
(snip)
>If someone (Chuck Bower, you've referred to it lots of times,
>but never posted it) would post Rick Janowski's formula for
>take points with cube ownership, I would be interested in seeing
>if it's consistent with cubeful rollouts.
(snip)
Are you SURE I've never posted it? Actually, a Deja News search
found that on 12/27/96 in a r.g.bg post I said:
A couple years back, Rick Janowski wrote a three part series of
articles in Hoosier Backgammon Newsletter on money doubling. His
analysis was rather involved and complete (treating Jacoby or not,
Beavers or not, etc.). He also factored in cube efficiency. The
most useful (IMHO) equation coming from that article is the one which
gives the drop/take point for the player NOT on roll. That formula is:
                      L - 0.5
drop/take point = -------------
                  W + L + 0.333
where W and L are the player NOT on roll's average winning and losing
cubeless equity. For example, if player NOT on roll wins a simple game
s fraction of the time, a gammon g fraction of the time, and a backgammon
b fraction of the time, then W is (s + 2g + 3b)/(s + g + b). From the
Jellyfish results window, this is even easier to get, since it's just
the sum of the entries in the row (total wins + g or bg + bg) divided by
total wins.
L is computed similarly, but now looking at the distribution of
losses for the player NOT on roll (or equivalently, the distribution
of wins for the player ON roll). Normally you just take the rollout
results, compute W and L, and then plug them in to find the drop/take
line and decide if the position is a drop or a take.
(and back to 15 August 1998...)
BTW, the units of 'drop/take point' are win fraction for the player
receiving the cube. Also, the 0.333 term in the denominator is written
more generally as E/2, where 'E' is a cube efficiency factor: E=1 for
perfect cube efficiency; E=0 for zero cube efficiency. Janowski
recommended E = 2/3 as being typical, so that is what I've always used.
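A few lines of Python make the formula concrete (the function name is mine; W, L, and E are exactly as defined above):

```python
def drop_take_point(W, L, E=2.0 / 3.0):
    """Janowski drop/take point for the player NOT on roll.

    W, L: average cubeless win/loss sizes for that player;
    E: cube efficiency (E=0 dead cube, E=1 perfectly efficient).
    """
    return (L - 0.5) / (W + L + E / 2.0)

# With plain games only (W = L = 1) and Janowski's typical E = 2/3,
# the take point is 0.5 / (7/3) = 3/14, about 21.4% winning chances.
print(drop_take_point(1.0, 1.0))
```

With a dead cube (E=0) the same call gives the familiar 25% take point.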
Chuck
bo...@bigbang.astro.indiana.edu
c_ray on FIBS
In article <6rusg2$rm$1...@flotsam.uits.indiana.edu>,
bo...@bigbang.astro.indiana.edu (Chuck Bower) wrote:
(snip)
It's not you, it's my memory :-( I think I rolled a 6 and escaped to X's
bar and played the chequer from my 15 point into my outfield. X rolled 1-6
and hit on his bar. I entered without a return hit or escape and it looked
close for a roll or two, but soon X hit loose on 24, I danced, he covered
and closed out 2 chequers and gammoned me.
Or perhaps it was some other sequence. Whatever I did, it didn't work :-)
Cheers,
Gary.
Oops! We're both wrong. The estimated cubeless equity from those
short rollouts is 0.36 (0.70 - 0.26 - 2 x 0.04). The 26% single wins
for X figure does NOT include gammons; to be less misleading I should
have used the traditional form and stated O wins 70% and X wins 30%
including 4% gammons.
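In other words (a trivial sketch of the corrected bookkeeping; the figures are the short-rollout numbers above):

```python
def cubeless_equity(o_wins, x_single, x_gammon):
    # Wins count +1 for O and -1 for X; an X gammon costs O 2 points.
    # o_wins includes all O wins; x_single excludes X's gammons.
    return o_wins - x_single - 2 * x_gammon

print(round(cubeless_equity(0.70, 0.26, 0.04), 2))  # 0.36
```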
> One thing to point out: there is always value in doing more rollouts,
> even with a weak player. If you use PUBEVAL to roll out checkerplay
> decisions, you actually get a standard of play that is within a few
> percent of neural network quality. No doubt your evaluation function is
> stronger than PUBEVAL.
It's only a little stronger (of the order of 0.1ppg cubeless after
100,000 training games). I haven't yet been patient enough to give it
a decent amount of training because I keep coming up with ideas for
new inputs and end up starting again... has anybody had any luck
experimenting with somehow saving (at least some of) the information
in previously trained weights while introducing new inputs (or hidden
nodes, for that matter)? Is initialising the new weights to random
values while maintaining the existing weights, and starting training
all over (back to the initial high alpha, etc.) reasonable? Does
maintaining different values of alpha for different weights help? How
much training does it really take to get an idea of how effective your
inputs are -- is it reasonable to train different networks (i.e.
networks with different inputs) with a limited number of hidden nodes
over a short number of training games, and then take the "best" inputs
and expect them to perform well in a network with lots of hidden nodes
and extensive training? What's a good value for lambda? Does
combining MC and TD training (i.e. TD(lambda) at each step followed by
a TD(1) on the terminal step) buy you anything? What about MC vs. TD
in general? Is using lookahead evaluations during training ever worth
the expense? It only takes 10 seconds to come up with another question,
but days or weeks to answer it...
> In fact, if you choose checker plays using rollouts from the weakest
> neural network you achieve a skill level at least equal to 3-ply searching.
What kind of positions does that apply to? In non-contact positions I
believe you unconditionally. But in positions the net misplays (especially
if it plays one side worse than the other), how much faith do you need to
have in the original evaluator before you have reasonable trust in its
rollouts?
Thanks very much for your help, and sorry for asking so many questions :-)
Thanks very much for posting that!
> In article <wtd89pz...@brigantine.CS.Arizona.EDU>,
> Gary Wong <ga...@cs.arizona.edu> wrote:
> > I came across the following position recently:
> >
> > +13-14-15-16-17-18-------19-20-21-22-23-24-+ X: 0
> > | O O | | X X X X X O |
> > | O | X | X X X X X |
> > | | | X X |
> > v| |BAR| | O on roll, match to 7.
> > | X | | O O O O O |
> > | X O | | O O O O O | Cube: 2
> > +12-11-10--9--8--7--------6--5--4--3--2--1-+ O: 0
>
> Perhaps it's not
> > quite enough to redouble at 7-away, 7-away since X has big recube vig
> > giving O a dead 8 cube.
>
> I don't know why it's an easier take for money than for match,
> but I disagree with you that 'X has big recube vig
> giving O a dead 8 cube.' because a redouble would kill the
> gammons, and thus in some positions it will be a cash for
> money, but a take/no double at the score.
That's true, but my guess is that the total equity from those eventualities
isn't huge because X won't get to positions with big gammon threats often
enough. The match equity between winning 1, 2, 4 and 8 points at 7-away,
7-away is surprisingly linear (from 50% through 56.1%, 62.8%, 76.7% and
100% -- very close to 6.3% per point) so I claim that the cube handling
differs from money play by only 2 factors:
1) If the cube reaches 8, gammons don't count (hurts X)
2) If the cube reaches 8, O cannot redouble (helps X)
I also claim that factor 2) is more of an influence than factor 1), and
so X is happier to own the 4-cube at this match score than he is for money.
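The near-linearity is quick to check from the quoted figures (match winning chances assumed to be as given in the text):

```python
# Match equity at 7-away, 7-away after winning n points (from the text).
mwc = {0: 50.0, 1: 56.1, 2: 62.8, 4: 76.7, 8: 100.0}

for n in (1, 2, 4, 8):
    gain_per_point = (mwc[n] - mwc[0]) / n
    print(n, round(gain_per_point, 2))   # each value comes out near 6.3
```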
Gary, this is your lucky day.
> It's only a little stronger (of the order of 0.1ppg cubeless after
> 100,000 training games).
PUBEVAL is surprisingly resilient. Of course you can win almost 1.0 ppg
against it if you adopt intelligently defined anti-PUBEVAL tactics, but if
you just train a neural network to play well and pit it against PUBEVAL you
won't do as well.
My network scores around 0.6 ppg (cubeless) versus PUBEVAL. I believe this is
about as well as you can do. (If you read the literature you may have read
that HCGAMMON wins 85% against PUBEVAL, but that turns out to be erroneous.)
> I haven't yet been patient enough to give it
> a decent amount of training because I keep coming up with ideas for
> new inputs and end up starting again...
This is a reasonable plan, but just be sure that your inputs really make a
difference. You need nothing beyond the raw input description (as defined by
Tesauro), priming (as defined by Berliner), and shot-counting. These alone
are sufficient to play at championship caliber when combined with a 1- or
2-ply search engine.
My recommendation is to pare down your inputs to these alone, then debug
those terms exhaustively. Check the value of these terms in every conceivable
natural and unnatural situation. Then train your network to exhaustion.
Then profile your program's play using test cases. Where does the program
make a mistake? It helps a lot to profile many test cases, so that you can
prioritize your attack on the most significant type of error. Once you choose
a representative test case to work on, you need a procedure that often
results in improving the program.
Every mistaken checkerplay results in two positions to analyze, which I call
Predicted (the right move) and Actual (the program's move). Predicted and
Actual will differ in certain input terms. Verify that the input terms are
correctly computed in every case. Correct any bugs, and proceed either to
retrain the evaluation or to choose another test case, depending on the extent
of the changes you have made.
If the inputs are computed correctly, then you must extend the input space in
order to give the program a chance to correct its mistake. To design an input
term that has the best chance of making progress, follow these guidelines:
1) Never include a term that is a linear combination of existing terms.
Such terms (e.g. inner board point count, number of men borne off, pip
count, number of anchors, and many other backgammon concepts) can be
synthesized by the neural network if they are relevant. You need to add
something new to the program.
2) Make sure that the term you add distinguishes Actual and Predicted. If
it does not, then the chance you will solve this problem is very slight.
3) Make sure that every term has a tactical basis in backgammon theory.
The simplest way is to encapsulate something that would ordinarily require
lookahead to discover. For example, you can create a term that estimates
the number of return shots. Or blot exposure in bearoffs. Or racing equity.
4) Make sure your term has broad generality, but no broader than
appropriate.
5) Give your term a natural scale. By this I mean that the term has a range
roughly comparable to the other inputs in the system. (The downside of not
doing this is that some weights in your network will train very slowly
compared to others.)
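Guideline 5 in miniature (a hypothetical example of mine, not from the post; the 36 is the number of possible rolls, so a shot count naturally falls in 0..36):

```python
def scaled_shots(shot_count):
    # Map a raw shot count (0..36 rolls that hit) into [0, 1] so it is
    # comparable in range to the binary board inputs.
    return min(shot_count, 36) / 36.0

print(scaled_shots(11))  # 11 shots out of 36 possible rolls
```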
> has anybody had any luck
> experimenting with somehow saving (at least some of) the information
> in previously trained weights while introducing new inputs (or hidden
> nodes, for that matter)?
Of course! I just initialize new weights to small random numbers.
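For example, in NumPy terms (a sketch with illustrative shapes, not anyone's actual network):

```python
import numpy as np

def add_inputs(w_in, n_new, scale=0.01, seed=0):
    """Append n_new input columns to a trained (hidden x inputs)
    weight matrix, giving only the new columns small random values
    and leaving every trained weight untouched."""
    rng = np.random.default_rng(seed)
    new_cols = rng.normal(0.0, scale, size=(w_in.shape[0], n_new))
    return np.hstack([w_in, new_cols])

w = np.ones((40, 198))        # e.g. a Tesauro-style raw encoding
w2 = add_inputs(w, 4)         # four new hand-crafted inputs
assert w2.shape == (40, 202) and (w2[:, :198] == w).all()
```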
> Is initialising the new weights to random
> values while maintaining the existing weights, and starting training
> all over (back to the initial high alpha, etc.) reasonable?
Never restart training from scratch.
At the very least, keep a record of the games you played while training the
first time, so that you can train on those games without generating them.
Generating a game is 20 times as expensive as training on it.
Don't open alpha all the way back up. It might make sense to increase it a
little, but not much. You will probably undo your progress by opening it up a
lot.
Remember that increasing alpha has the effect of weighing future experience
more heavily than past experience. This makes sense if your network (which
encapsulates past experience) is inexperienced. But when you make a minor
change to an experienced network, it is probably a bad idea to weight the
future much more heavily.
> Does maintaining different values of alpha for different weights help?
Different learning rates help in theory. One point is that if two inputs
differ radically in scale, you can use a smaller learning rate for the input
that has the bigger scale. But in practice it matters little provided that
you give your inputs roughly comparable scales.
Another way that varying learning rates helps is in identifying inputs that
do not matter. For example, if you have an adaptive algorithm that varies
learning rates according to the accuracy gained from adjusting that input,
then the learning rates for irrelevant inputs rapidly drive to 0. Sutton
wrote a paper entitled "Gain Adaptation beats Least Squares" (or something
like that) in which he describes such an algorithm. However, it doesn't work
in practice because it is much slower than backprop. Backpropagation is fast;
adaptive-gain algorithms can take many times as long (Sutton's algorithm
takes 13 times as much time) and that is a huge disadvantage.
The concept was so attractive that I also tried a variant of RProp. It, too,
proved slower than backprop.
My judgement: discard the whole idea.
> How much training does it really take to get an idea of how effective your
> inputs are -- is it reasonable to train different networks (i.e.
> networks with different inputs) with a limited number of hidden nodes
> over a short number of training games, and then take the "best" inputs
> and expect them to perform well in a network with lots of hidden nodes
> and extensive training?
I think the procedure you described will work, but it can't be the fastest way
to make progress. Surely you are better off using your expert judgment!
There are descriptions in the literature of constructive methods for
identifying neural network inputs. I haven't found any of them to be useful.
The method you are describing sounds a lot like coevolution or genetic
engineering of neural inputs. This may work, but you are certainly better
off on your own.
> What's a good value for lambda?
Zero.
Theoretical models suggest that lambda should be 1/branching-factor, which
would be zero to 6%, depending on how you measure the branching factor.
Anyway, it hardly matters because lambda==0 is a good approximation to
lambda==0.05.
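For readers unfamiliar with the parameter, here is a generic TD(lambda) weight update (a NumPy sketch of mine, not the poster's code); with lam=0 the eligibility trace collapses to the current gradient, i.e. plain TD(0):

```python
import numpy as np

def td_lambda_step(w, grad_v, td_error, trace, alpha=0.1, lam=0.0):
    """One TD(lambda) update with accumulating eligibility traces.

    td_error: r + V(s') - V(s) for the transition just observed;
    grad_v:   gradient of V(s) with respect to the weights w.
    """
    trace = lam * trace + grad_v   # lam=0 keeps only the current gradient
    w = w + alpha * td_error * trace
    return w, trace

w = np.zeros(3)
trace = np.zeros(3)
w, trace = td_lambda_step(w, np.array([1.0, 0.0, 0.5]), 0.2, trace)
print(w)  # with lam=0 this is just alpha * td_error * grad_v
```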
> Does combining MC and TD training (ie. TD(lambda) at each step followed by
> a TD(1) on the terminal step) buy you anything? What about MC vs. TD
> in general?
TD is better. MC does not learn enough about intermediate stages.
Don't combine MC and TD. The impact of MC is simply to weaken training.
> Is using lookahead evaluations during training ever worth
> the expense?
The answer is no if you mean that you train each position on the 1-ply
lookahead value. Doing a 1-ply lookahead costs you a factor of 21 in training
speed, but you only gain about a factor of 5 in reduction of training noise
over TD. A clear loss.
However, if you reuse the 1-ply search value for at least 5 training
iterations, then you have a clear gain. In practice this is easy to do. But
then you are not using TD, but rather supervised training. I maintain that
supervised training is the right way to go as soon as you have a player of
expert caliber that you can use in a 1-ply search.
> It only takes 10 seconds to come up with another question,
> but days or weeks to answer it...
I can answer your questions in a day, since I have made all these mistakes
myself!
> > In fact, if you choose checker plays using rollouts from the weakest
> > neural network you achieve a skill level at least equal to 3-ply searching.
>
> What kind of positions does that apply to? In non-contact positions I
> believe you unconditionally. But in positions the net misplays (especially
> if it plays one side worse than the other), how much faith do you need to
> have in the original evaluator before you have reasonable trust in its
> rollouts?
Pretty much any position.
You need a very good checkerplayer to have confidence in the absolute equity
value returned by a rollout, but the same does not apply to checkerplay
decisions. The great majority of checkerplays are decided by near-term
phenomena that will be apparent in a rollout.
Warm regards,
Brian Sheppard
> > It only takes 10 seconds to come up with another question,
> > but days or weeks to answer it...
In article <6s3s0k$dph$1...@nnrp1.dejanews.com>,
bshe...@hasbro.com replied:
>
> I can answer your questions in a day, since I have made all these mistakes
> myself!
That's the spirit! As we have seen, Brian saved Gary months of
trial and error. Hopefully others will come out of their closets
with their questions/answers about neural network training.
Or is it like the financial markets, where people sit tight on
the information, hoping to make the Big Scoop?
I'm not making a program, but I'm interested in the subject.
And I was surprised that including the pip count didn't have an effect.
What about the relative pip count? That's not a linear combination
of the raw board inputs.
Keep up the good work!
Stig Eide
It sure is! Thanks a million for posting, Brian... I've been reading
Usenet news for about 9 years and I can't remember ever learning as
much from a single article before. I appreciate your time and
expertise! (I guess this demonstrates that supervised training from
you can be considerably more efficient than my own zero-knowledge
learning ;-)
If anybody is interested, after 400,000 training games my original net
now scores 0.3 cubeless ppg against pubeval. Now I'm working on
producing screeds of rollout data from it to start some supervised
training.
Thanks again,
I believe that this is one of the biggest things we're going to learn
from Snowie. (Others have talked about this too.) Jellyfish rollouts
have been misleading in the past. For some positions with known cubeful
equity (bearoffs), one has to set a settlement limit much lower than
.550 on Jellyfish in order to get a level-5 rollout with a settlement to
match the known equity. All the "known" results from rollouts with a
settlement limit probably understated the equity for the taker.
No doubt five years from now we'll learn something new and discover that
something we all "know" right now turns out to be wrong too.
-Michael J. Zehr