Re: [Computer-go] Computer-go Digest, Vol 12, Issue 79

Hendrik Baier

Jan 26, 2011, 4:00:47 AM
to compu...@dvandva.org
Hi Aja,

I would be interested in your results. I think the LGRF policy is only a
small first step in the direction of more adaptive playouts (and,
hopefully, toward overcoming the horizon effect).
As for the Last-Bad-Reply idea, you can read about my experiences with
this and related policies in my Master's thesis, if you're interested.
It also contains the idea that resulted in the "Power of Forgetting"
paper.
http://www.ke.tu-darmstadt.de/lehre/arbeiten/master/2010/Baier_Hendrik.pdf
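
For reference, the basic LGRF-1 bookkeeping amounts to something like the
following C++ sketch (illustrative only, not Orego's actual source; it
assumes strictly alternating play with black moving first):

#include <array>
#include <vector>

// LGRF-1: reply[p][prev] holds the last reply by player p to move 'prev'
// that occurred in a playout won by p; losses erase matching entries.
constexpr int NUM_POINTS = 19 * 19 + 1;  // board points + pass
constexpr int NO_REPLY = -1;

struct LgrfTable {
    // One reply table per player (0 = black, 1 = white).
    std::array<std::array<int, NUM_POINTS>, 2> reply;

    LgrfTable() {
        for (auto& t : reply) t.fill(NO_REPLY);
    }

    // Called once per finished playout. moves[t] was played by
    // player t % 2 (black moves first in this sketch).
    void update(const std::vector<int>& moves, int winner) {
        for (std::size_t t = 1; t < moves.size(); ++t) {
            int player = static_cast<int>(t % 2);
            int prev = moves[t - 1];
            if (player == winner) {
                reply[player][prev] = moves[t];       // store / overwrite
            } else if (reply[player][prev] == moves[t]) {
                reply[player][prev] = NO_REPLY;       // forget on loss
            }
        }
    }

    // During a playout: the suggested reply, or NO_REPLY.
    int suggest(int player, int prev) const { return reply[player][prev]; }
};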

regards,
Hendrik

> I admit that it's difficult for me to include such a deterministic default policy. :-)
> With a softmax policy, using the information of "last-LOST-reply" may be a good direction.
>
> Aja

Aja

Jan 26, 2011, 8:13:53 AM
to compu...@dvandva.org, hendri...@googlemail.com
Hi Hendrik,

Thanks.

Congratulations, you have done really nice work. I checked your thesis. My
result for LBR-2 is consistent with yours: no benefit at all, so I took it
out. I have adapted LGR-1 to Erica's softmax policy. Basically, I am tuning
the probability offset by checking some artificial test positions. With 3000
playouts, it now scores around 57% after 500 games, almost reaching the 60%
that is my target (my intuition is that LGR-1 alone should already help a
lot). :)
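
Roughly, the integration looks like this (an illustrative sketch only, not
Erica's real code; the offset is the parameter I am tuning):

#include <random>
#include <vector>

// Pick a playout move from a softmax policy, raising (but not forcing)
// the stored LGR-1 reply by a tuned additive offset on its weight.
int sample_move(const std::vector<int>& candidates,
                const std::vector<double>& feature_gamma,  // per-move weights
                int lgr_reply,       // suggested reply, or -1 if none
                double lgr_offset,   // tuned bonus for the stored reply
                std::mt19937& rng) {
    std::vector<double> w(candidates.size());
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        w[i] = feature_gamma[i];
        if (candidates[i] == lgr_reply)
            w[i] += lgr_offset;  // bias toward, but do not force, the reply
    }
    std::discrete_distribution<std::size_t> dist(w.begin(), w.end());
    return candidates[dist(rng)];
}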

Actually, I have one question; I still can't figure out your reasoning. In a
playout, why do you overwrite the earlier replies with the later ones? Using
the earliest one seems more reasonable to me.

Aja

Hendrik Baier

Jan 26, 2011, 8:44:11 AM
to Aja, compu...@dvandva.org
Hi Aja,

That's a good question. At least for the LGR policy without forgetting
(https://webdisk.lclark.edu/drake/publications/drake-icga-2009.pdf),
only using the first appearance of a reply did not significantly differ
in performance. A possible explanation could be that in cases where the
same move by the same player appears twice in a playout, the first stone
must have been captured, and therefore the answer to the second play is
the one that really influences the final position/result. I'm not sure I
repeated this experiment with LGRF, but I did try dismissing the tails
of playouts (with the rationale that there might be too much noise) and
ignoring stones that would later be captured (with the rationale that
those moves might be bad on average). Both variants were significantly
weaker than plain LGRF.
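
Concretely, it's just a guard on the winner-side update pass (building on
the earlier sketch, with the same black-moves-first assumption):

// Winner-side update in two variants (sketch; reuses LgrfTable and
// NUM_POINTS from the earlier sketch).
void update_winner(LgrfTable& table, const std::vector<int>& moves,
                   int winner, bool first_appearance_only) {
    std::array<bool, NUM_POINTS> seen{};  // zero-initialized
    for (std::size_t t = 1; t < moves.size(); ++t) {
        if (static_cast<int>(t % 2) != winner) continue;  // winner's moves only
        int prev = moves[t - 1];
        if (first_appearance_only && seen[prev]) continue;  // keep FIRST reply
        table.reply[winner][prev] = moves[t];  // forward pass: LAST one wins
        seen[prev] = true;
    }
}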
It's only a few lines of code; test it and see whether it makes a difference
for your playout policy and program architecture. Stronger playout
policies than Orego's will have different interactions with LGRF. You
could even try saving several sets of replies per intersection, for the
first, second, third appearance of the previous move in a playout, in
the hope of capturing certain tactical situations with sacrifices. But I
don't expect much.

Hendrik

Aja

Jan 26, 2011, 10:42:04 AM
to Hendrik Baier, compu...@dvandva.org
Hi Hendrik,

> That's a good question. At least for the LGR policy without forgetting
> (https://webdisk.lclark.edu/drake/publications/drake-icga-2009.pdf), only
> using the first appearance of a reply did not significantly differ in
> performance.

Thanks for your explanation. Yes, my experiment indicates that the
playing strength is almost the same.

> It's only a few lines of code; test it and see whether it makes a difference
> for your playout policy and program architecture. Stronger playout
> policies than Orego's will have different interactions with LGRF. You
> could even try saving several sets of replies per intersection, for the
> first, second, third appearance of the previous move in a playout, in the
> hope of capturing certain tactical situations with sacrifices. But I don't
> expect much.

Indeed. My plan is to generalize this scheme with stricter conditions.
Maybe we can even combine LGRF with the information of RAVE (inspired by
Arpad Rimmel's work). If the learning works well, it should fix a lot of
errors in my playout-feature rules. This might be a way to make the
playouts learn how to play correct semeai moves.

Hendrik Baier

Jan 26, 2011, 10:57:15 AM
to Aja, compu...@dvandva.org

> Indeed. My plan is to generalize this scheme with stricter
> conditions. Maybe we can even combine LGRF with the information of
> RAVE (inspired by Arpad Rimmel's work). If the learning works well,
> it should fix a lot of errors in my playout-feature rules.
> This might be a way to make the playouts learn how to play correct
> semeai moves.
>
If you know in a playout that the best reply has to be one of only a
handful of options (e.g. in a semeai), non-zero probabilities for all of
those moves plus LGRF should be a nice way of making the search adapt.
But whenever you make the conditions more specific, be aware that you
will get fewer samples per condition, and you need lots of samples to be
able to count on good replies staying in the table longer than bad replies.

If you could find an efficient way of combining LGRF with local
patterns, I would see some potential. I couldn't get it to run fast enough.

Aja

Jan 26, 2011, 11:11:38 AM
to Hendrik Baier, compu...@dvandva.org
> If you know in a playout that the best reply has to be one of only a
> handful of options (e.g. in a semeai), non-zero probabilities for all of
> those moves plus LGRF should be a nice way of making the search adapt. But
> whenever you make the conditions more specific, be aware that you will get
> fewer samples per condition, and you need lots of samples to be able to
> count on good replies staying in the table longer than bad replies.

OK, got it. Thanks.

> If you could find an efficient way of combining LGRF with local patterns,
> I would see some potential. I couldn't get it to run fast enough.

Yes, I can. In Erica, I can incrementally update patterns up to size 5.
Combining LGRF with local patterns is on my list to try tomorrow. Thanks for
your informative suggestions.
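
Here is roughly the shape I have in mind, purely hypothetical (my own
names, not Erica's actual code): key the reply table on the previous move
plus an incrementally maintained local pattern hash, instead of on the
previous move alone. The hash-map lookup is probably where the speed
concern you mention comes from.

#include <cstdint>
#include <unordered_map>

// Hypothetical pattern-conditioned LGRF table. The pattern hash is
// assumed to be maintained incrementally by the engine.
struct PatternLgrf {
    std::unordered_map<std::uint64_t, int> reply[2];  // one table per player

    // Combine previous move and local pattern hash into one key.
    // A move index fits in 10 bits on 19x19 (361 points + pass).
    static std::uint64_t key(int prev_move, std::uint64_t pattern_hash) {
        return (pattern_hash << 10) ^ static_cast<std::uint64_t>(prev_move);
    }

    int suggest(int player, int prev_move, std::uint64_t pattern_hash) const {
        auto it = reply[player].find(key(prev_move, pattern_hash));
        return it == reply[player].end() ? -1 : it->second;
    }

    void store(int player, int prev_move, std::uint64_t pattern_hash, int move) {
        reply[player][key(prev_move, pattern_hash)] = move;
    }

    void forget(int player, int prev_move, std::uint64_t pattern_hash, int move) {
        auto it = reply[player].find(key(prev_move, pattern_hash));
        if (it != reply[player].end() && it->second == move)
            reply[player].erase(it);
    }
};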

Aja

"Ingo Althöfer"

Jan 26, 2011, 12:44:41 PM
to compu...@dvandva.org
Hello Hendrik,

thanks for sharing the link to your master's thesis.

One question: Does PD Dr. Ulf Lorenz know about your work?

He works at TU Darmstadt and has a project on
computer Go (with Ph.D. student Lars Schaefers).

Cheers, Ingo

-------- Original Message --------
> Date: Wed, 26 Jan 2011 10:00:47 +0100
> From: Hendrik Baier <hendri...@googlemail.com>
> To: compu...@dvandva.org
> Subject: Re: [Computer-go] Computer-go Digest, Vol 12, Issue 79

