[Computer-go] learning patterns for mc go

35 views
Skip to first unread message

Hendrik Baier

unread,
Apr 26, 2010, 5:53:48 PM4/26/10
to compu...@dvandva.org
Hello list,

I am a Master's student currently working on his thesis about certain
aspects of Monte-Carlo Go. I would like to pose a question concerning
the literature - I hope some of you can help me out!

My problem is that I can't find many papers about learning of MC playout
policies, in particular patterns. A lot of programs seem to be using
Mogo's 3x3 patterns, which have been handcoded, or some variation
thereof. A lot of people have tried some form of pattern learning, but
mostly to directly predict expert moves it seems, not explicitly
optimizing the patterns for their function in an MC playout policy.
Actually, I am only aware of "Computing Elo Ratings of Move Patterns in
the Game of Go", where patterns have been learned from pro moves, but
then also successfully used in an MC playout policy; and "Monte Carlo
Simulation Balancing".

Considering the huge impact local patterns have had on the success of MC
programs, I would have expected more attention towards automatically
learning and weighting them specifically for MC playouts. There is no
reason why patterns which are good for predicting experts should also be
good for guaranteeing diverse, balanced playout distributions. Have I
missed something?

Or how did your program come to its patterns? I'd be interested. Did you
maybe even try learning something else than patterns for your playout
policy?

cheers,
Hendrik
_______________________________________________
Computer-go mailing list
Compu...@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

--
You received this message because you are subscribed to the Google Groups "Computer Go Archive" group.
To post to this group, send email to computer-...@googlegroups.com.
To unsubscribe from this group, send email to computer-go-arc...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/computer-go-archive?hl=en.

Mark Boon

unread,
Apr 26, 2010, 8:03:03 PM4/26/10
to compu...@dvandva.org
For a bit of a secondary project I've been doing the past few weeks I
made a self-learning Tetris program. To my own surprise this worked
extremely well. It 'grows' an expert player in just a few thousand
generations (a few days real time) starting from zero knowledge . And
it doesn't have any 'a priori' knowledge about the game, so it adjusts
when the (scoring) rules or playing conditions change. It works so
well, I was thinking this could very well apply to computer-Go
somehow.

Tetris is not Go of course. And also it's a single-player game, where
Go is a two-player game. So some adjustments about determining the
survivors and procreation may have to be made. It's a bit early for me
to divulge more details at this point. Also, this is research which my
current employer may not want to become public in great detail just
yet.

But based on my findings so far I can only encourage anyone who thinks
about making a self-learning Go program to go ahead and try.

Mark

Daniel Burgos

unread,
Apr 27, 2010, 3:14:29 AM4/27/10
to compu...@dvandva.org
I've tried several types of self-learning programs and I think the key point is that go is not a single-player game (taking apart the degrees of freedom the game has). Your algorithm has to learn how to model the enemy behaviour during the game and try to figure out what the next movements will be.

In any case, I will continue on this line because what I want is not an algorithm to play go (something quite simple, isn't it? ;-)) but an algorithm able to learn go by itself.

2010/4/27 Mark Boon <tesujis...@gmail.com>

Hendrik Baier

unread,
Apr 27, 2010, 8:06:27 AM4/27/10
to compu...@dvandva.org
To clarify: I'm not looking for a program that learns to play go in a
very general sense. I'm looking for principled approaches to finding the
optimal pattern set/pattern weights for use in Monte Carlo playouts, or
to optimizing Monte Carlo playouts in some other way - similar to how it
was done in "Monte Carlo Simulation Balancing". Are there no more papers?

Hendrik
_______________________________________________
Computer-go mailing list
Compu...@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Petr Baudis

unread,
Apr 27, 2010, 9:39:38 AM4/27/10
to compu...@dvandva.org
Hi!

On Mon, Apr 26, 2010 at 11:53:48PM +0200, Hendrik Baier wrote:
> My problem is that I can't find many papers about learning of MC
> playout policies, in particular patterns. A lot of programs seem to
> be using Mogo's 3x3 patterns, which have been handcoded, or some
> variation thereof. A lot of people have tried some form of pattern
> learning, but mostly to directly predict expert moves it seems, not
> explicitly optimizing the patterns for their function in an MC
> playout policy. Actually, I am only aware of "Computing Elo Ratings
> of Move Patterns in the Game of Go", where patterns have been
> learned from pro moves, but then also successfully used in an MC
> playout policy; and "Monte Carlo Simulation Balancing".

I'm not aware of any research in this direction other than the
(Silver and Tesauro, 2009) paper:

http://conflate.net/icml/paper/2009/500

I think David Silver implied he is continuing research in this
direction, but I'm not sure at all.

> Considering the huge impact local patterns have had on the success
> of MC programs, I would have expected more attention towards
> automatically learning and weighting them specifically for MC
> playouts. There is no reason why patterns which are good for
> predicting experts should also be good for guaranteeing diverse,
> balanced playout distributions. Have I missed something?
>
> Or how did your program come to its patterns? I'd be interested. Did
> you maybe even try learning something else than patterns for your
> playout policy?

An important point is that the move selection in playouts is (I think
in all implementations?) always randomized, that is even in case of
matching pattern there is small chance it won't be played. This usually
manages to spread out simulations reasonably evenly. My gut feeling is
that MC-specific patterns won't give a large boost, but of course I can
be wrong.

I think most researchers are focused on integrating MCTS with other
evaluation functions, or adding more expert knowledge. I personally
think the best direction is simulation learning from the tree search,
among other things.

--
Petr "Pasky" Baudis
When I feel like exercising, I just lie down until the feeling
goes away. -- xed_over

Jim O'Flaherty, Jr.

unread,
Apr 27, 2010, 11:19:13 AM4/27/10
to compu...@dvandva.org
Mark,

Very cool that you were able to enjoy seeing an expert Tetris player emerge from your experiment. It sounds fun!

I worked on this very idea for Go about 5 years ago. I worked it pretty strongly, too. Unfortunately, I found it to be quite difficult and not very rewarding. There are so many issues with developing "strategies" that persist from generation to generation. Consider a human individual is both the genetics of the instance (phenotype) and the emergent effect of the accretions through high dynamic adaptations in the environment (unique to the individual's experience) combined with a lossy persistence system (dynamic and highly adaptive constructed/destructed rule system). These three things combine to make an individual operate.

I found that attempting to isolate any part of this to be a very unsuccessful tangent. And then finding ways to persist the isolated parts in a way that was adaptable to be way more difficult. I am not saying all three are required to be modeled to generate a Go player. I am saying, though, there is a unique balance in homo-sapien's emergent design that is going to take a VERY long time to grok and then model in software such that a highly capable Go player results. For now, I am betting that there will be vastly more progress on the pseudo-AI fronts like MC than on the actual attempts to roughly model human rules based cognition in variations of a genetic algorithm.


Jim



From: Mark Boon <tesujis...@gmail.com>
To: compu...@dvandva.org
Sent: Mon, April 26, 2010 7:03:03 PM
Subject: Re: [Computer-go] learning patterns for mc go

Olivier Teytaud

unread,
Apr 27, 2010, 11:54:33 AM4/27/10
to compu...@dvandva.org
My problem is that I can't find many papers about learning of MC playout policies, in particular patterns.

A just published paper about learning MC policies:
http://hal.inria.fr/inria-00456422/fr/
It works quite well for Havannah (not tested on hex I think).
 
But in the case of Go, the Wang-policy is too strong for being improved like that.

Incidentally, the big improvements when compared to Yizao Wang's policy in mogo are:

- fill board (David, Martin, I hope that now it works well for you - for us it's extremely efficient for long computation
    times in 19x19 (but not for short time settings!), i.e. randomization in the Monte-Carlo part for
     "jumping" to empty parts of the goban)

- nakade

(there are other improvements, but much smaller: e.g. approach moves)

(fill board and nakade in http://hal.inria.fr/inria-00386477/)

Best regards,
Olivier

David Doshay

unread,
Apr 27, 2010, 1:28:07 PM4/27/10
to compu...@dvandva.org
Because of my background in physics, where I did my thesis work using a very large MC simulation to study phase transitions, I have the expectation that MC simulations should work much better when properly biased. In physics, we have a theoretic basis for the bias, often from Boltzman or other appropriate statistics. Unfortunately, not so in Go.

Here is a pointer to all of our work on computer go:

http://users.soe.ucsc.edu/~charlie/projects/SlugGo/

The papers to look at are the last 3 at that link, all from Jen Flynn. Her project explored that question from a sampling perspective:

• Here are three unpublished quarterly reports from Jennifer Flynn.
• SlugGo Summer 07 Report: Predicting the Winner of 9x9 Computer Go Games using Patterns as Evaluators
• SlugGo Winter 08 Report: Using 5x5 Tiles with Libego
• SlugGo Spring 08 Report: Using Patterns in Libego Playouts

Unfortunately, we did not get anything we could call a positive result.

Cheers,
David

Hendrik Baier

unread,
Apr 27, 2010, 1:34:04 PM4/27/10
to compu...@dvandva.org
Thank you Olivier, that is a very interesting paper and exactly what I
was looking for. I believe there are indeed possibilities for relatively
domain-independent playout improvements like yours, also in Go.
One question: If you use, in cmc(s), always the argmax over all tiles
including your *last* move - shouldn't you then also, in the update
step, only update the tiles of moves directly following each other in
the playout? As far as I understand your paper, you update all tiles of
all possible move pairs from the playout.

Best,
Hendrik
_______________________________________________
Computer-go mailing list
Compu...@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Gian-Carlo Pascutto

unread,
Apr 27, 2010, 5:25:19 PM4/27/10
to compu...@dvandva.org
Hendrik Baier wrote:

> Considering the huge impact local patterns have had on the success of MC
> programs, I would have expected more attention towards automatically
> learning and weighting them specifically for MC playouts. There is no
> reason why patterns which are good for predicting experts should also be
> good for guaranteeing diverse, balanced playout distributions. Have I
> missed something?
>
> Or how did your program come to its patterns? I'd be interested. Did you
> maybe even try learning something else than patterns for your playout
> policy?

You're asking the right questions, but I'm afraid it will be up to you
to provide the right answers.

I spent significant amounts of time on this topic, with relatively very
few success. The only thing I managed to show is that the relation
between the playouts predictive power for the game result is not well
correlated to playing strength of the resulting program (this was
implied in a Mogo paper).

Without a good fast measurement to check the effectiveness of a playout,
the number of relatively fast optimization opportunities goes down quickly.

Basically, I agree this is a key issue, but we don't understand a damned
thing about it yet, and the published results so far just suck.

--
GCP

Petr Baudis

unread,
May 6, 2010, 10:45:25 AM5/6/10
to compu...@dvandva.org
On Tue, Apr 27, 2010 at 05:54:33PM +0200, Olivier Teytaud wrote:
> > My problem is that I can't find many papers about learning of MC playout
> > policies, in particular patterns.
> >
>
> A just published paper about learning MC policies:
> http://hal.inria.fr/inria-00456422/fr/
> It works quite well for Havannah (not tested on hex I think).

Very interesting!

Why have you restricted the tiling to the actions being both performed
by the same player? This seems to give "if I played X before, following
up by Y will be good", but wouldn't be "if opponent played X before,
replying by Y will be good" at least as useful? Have you considered also
discouraging replies that give very bad results?

One thing I have hit when trying to implement something like this is
that minimax prunes a lot of interesting situations - if sequence A-B-C
is good, minimax will quickly redirect to less good A-X-C even if in
simulations, B is very likely to be played.

> But in the case of Go, the Wang-policy is too strong for being improved like
> that.

Does this imply that you have tried to implement it but weren't
successful, or is this just a feeling?

> (fill board and nakade in http://hal.inria.fr/inria-00386477/)

Thanks, and I have thought I know about all the recent computer-go
papers... :-)

--
Petr "Pasky" Baudis
When I feel like exercising, I just lie down until the feeling
goes away. -- xed_over

Darren Cook

unread,
May 19, 2010, 7:50:31 PM5/19/10
to compu...@dvandva.org
>> My problem is that I can't find many papers about learning of MC playout
>> policies, in particular patterns.
>
> A just published paper about learning MC policies:
> http://hal.inria.fr/inria-00456422/fr/
> It works quite well for Havannah (not tested on hex I think).

I struggled with this paper ("Multiple Overlapping Tiles for Contextual
Monte Carlo Tree Search"), as it wasn't clear to me what a "tile" was.
Specifically I couldn't work out if they were 2d patterns of
black/white/empty, or are they are a sequence of moves (e.g. joseki,
forcing moves, endgame sente/gote sequences, etc. in go)? Or perhaps
something else altogether?

While I wear the dunce's cap and stand in the corner, is some kind soul
able to explain the idea in go terms?

Thanks,
Darren


--
Darren Cook, Software Researcher/Developer

http://dcook.org/gobet/ (Shodan Go Bet - who will win?)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)

Jason House

unread,
May 19, 2010, 11:13:15 PM5/19/10
to compu...@dvandva.org
I mostly skimmed it, but here's what I got from it: In a simulation,
pick moves based off the leaf node's RAVE values, but discount moves
whose follow-up moves have already been taken.

The tiling is simply a tracking of how effect a move is when combined
with a specific follow-up move. Near the start of a simulation, this
would match RAVE values. Deep in a simulation, it's highly situational
and based on which follow-up moves remain open.

I hope that helps!

Sent from my iPhone

Darren Cook

unread,
May 20, 2010, 7:39:17 PM5/20/10
to compu...@dvandva.org
> The tiling is simply a tracking of how effect a move is when combined
> with a specific follow-up move. Near the start of a simulation, this
> would match RAVE values. Deep in a simulation, it's highly situational
> and based on which follow-up moves remain open.
>
> I hope that helps!

Almost :-) It is still a bit fuzzy, but I think I'll wait for another
paper on the subject before trying to apply any more brain power to it.

Just skimming it again, their 57% win rate does not seem to allow for
the extra CPU work required for doing the tiling. That is reasonable -
you don't want people to waste time optimizing before they publish
algorithms - but it does suggest it may turn out to be no better use of
your CPU cycles than simply doing more dumb playouts.

Darren


>>> A just published paper about learning MC policies:
>>> http://hal.inria.fr/inria-00456422/fr/
>>> It works quite well for Havannah (not tested on hex I think).
>>
>> I struggled with this paper ("Multiple Overlapping Tiles for Contextual
>> Monte Carlo Tree Search"), as it wasn't clear to me what a "tile" was.
>> Specifically I couldn't work out if they were 2d patterns of
>> black/white/empty, or are they are a sequence of moves (e.g. joseki,
>> forcing moves, endgame sente/gote sequences, etc. in go)? Or perhaps
>> something else altogether?
>>
>> While I wear the dunce's cap and stand in the corner, is some kind soul
>> able to explain the idea in go terms?


--
Darren Cook, Software Researcher/Developer

http://dcook.org/gobet/ (Shodan Go Bet - who will win?)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
_______________________________________________
Computer-go mailing list
Compu...@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Reply all
Reply to author
Forward
0 new messages