[Computer-go] A Regression test set for exploring some limitations of current MCTS programs in Go


Aja Huang

May 14, 2012, 10:39:31 PM
to compu...@dvandva.org
Dear all,

Martin Mueller and I are writing a paper about exploring some limitations of current MCTS programs in Go. For this purpose we have carefully designed a regression test set which consists of 20 seki and 15 two-safe-groups cases on a 9x9 board. If you are interested, it is available at

http://webdocs.cs.ualberta.ca/~mmueller/ps/seki-and-two-safe-groups-regression-test.zip

We would appreciate it if you would run your program on our regression test and send us the results for our publication.

It's easy to run your program through these positions (.sgf). The script run.sh under /utility runs a given program on a given regression test file (.tst) and writes the result to a corresponding HTML file. For example, for the seki test you can simply type

./run.sh -p PATH_TO_PROGRAM -t g_seki_moves.tst

Some notes:
1. Your program must support the command sg_compare_float for the two-safe-groups test. If it doesn't support reg_genmove, use the test file g_seki_moves.tst, which issues genmove instead.

2. On Windows, you can execute 'run.sh' directly at the command prompt once Cygwin is installed.

3. If your program doesn't support the GTP command 'loadsgf', gogui-adapter can translate 'loadsgf' into a sequence of 'play' commands. Please use the file gogui-adapter.jar under /utility, because Markus has fixed some bugs for us; see

https://sourceforge.net/tracker/?func=detail&aid=3522401&group_id=59117&atid=489964
https://sourceforge.net/tracker/?func=detail&aid=3519829&group_id=59117&atid=489964

Under /experimental results, there are results for several programs such as Fuego (tilburg version), pachi, ManyFaces and GnuGo. We thank David for providing us with the valuable results of ManyFaces. The test set is clearly not easy: all of these programs failed in many cases.

Questions are very welcome. If you find any error in the test set please inform us. Thanks.

Best regards,
Aja

Lars Schäfers

May 15, 2012, 6:34:32 AM
to compu...@dvandva.org
Aja,

thanks for the regression test set. I will make a run with Gomorra.

Can you give some details about the sg_compare_float command? What are the
parameters, and what should it return?

As I couldn't find anything in the mail, I guess there are no
restrictions on the time or number of simulations to use.


Best wishes,
Lars
_______________________________________________
Computer-go mailing list
Compu...@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Brian Sheppard

May 15, 2012, 6:49:38 AM
to compu...@dvandva.org

By what date would you like a response?

Aja Huang

May 15, 2012, 3:28:22 PM
to compu...@dvandva.org
We plan to submit the paper to CG2012, though there haven't been any official announcements. It would be good for us if you could contribute the results of your 'untuned' program before mid-June.

The following explanation provides more details of the regression test.

1. The 20 seki cases test for White's 'correct answer'. A case may have more than one correct answer. Correct answers lead to a White win, while all other answers result in a Black win. Note that in some positions komi is set to 7.0.

2. The 15 two-safe-groups cases test Black's 'evaluation'. All should be 0% because White wins by living with two groups. Every case also has two relaxed versions, each of which makes one of White's two groups 100% alive (assuming the program doesn't fill a real eye in the playout). Take case1, for instance: case1-1 makes White's bottom group 100% alive, and case1-2.sgf secures the top group. So case1-1 and case1-2 are supposed to be easier than case1 itself. To assess a program's evaluation, 'sg_compare_float' is used. For example,

sg_compare_float 0.4 uct_value

If uct_value (the winning rate of the search) is smaller than 0.4, the command returns -1; otherwise it returns 1.
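
The comparison described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the code of any actual engine; the function name mirrors the GTP command, and the way a program obtains its uct_value is left out.

```python
def sg_compare_float(threshold, value):
    """Return -1 if value is smaller than threshold, 1 otherwise.

    Sketch of the rule described in the mail; a real engine would read
    `value` from its own search (e.g. the root winning rate).
    """
    return -1 if value < threshold else 1


# In the two-safe-groups cases Black's winning rate should be near 0,
# so a test like "sg_compare_float 0.4 uct_value" expects -1.
print(sg_compare_float(0.4, 0.12))  # evaluation below the threshold
print(sg_compare_float(0.4, 0.55))  # evaluation at or above the threshold
```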

Please run your program over the test with an increasing number of simulations: 1k, 2k, 4k, 8k and so on, doubling up to 128k. For convenience, you can write a simple script like

run.sh -p "PATH_TO_PROGRAM -playouts 1000" -t g_seki_moves.tst
rename html\g_seki_moves.tst\index.html g_seki_moves_1k.html

run.sh -p "PATH_TO_PROGRAM -playouts 2000" -t g_seki_moves.tst
rename html\g_seki_moves.tst\index.html g_seki_moves_2k.html

run.sh -p "PATH_TO_PROGRAM -playouts 4000" -t g_seki_moves.tst
rename html\g_seki_moves.tst\index.html g_seki_moves_4k.html

...

run.sh -p "PATH_TO_PROGRAM -playouts 128000" -t g_seki_moves.tst
rename html\g_seki_moves.tst\index.html g_seki_moves_128k.html
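
The repeated commands above follow a simple doubling pattern, so they can also be generated. The Python sketch below only builds the command strings and does not run anything. It assumes a Unix-like shell (hence mv instead of rename), keeps PATH_TO_PROGRAM and -playouts as the placeholders used in the mail (each program has its own option for limiting simulations), and the helper names are hypothetical.

```python
import os


def playout_schedule(start=1000, stop=128000):
    """Yield 1000, 2000, 4000, ... doubling up to and including stop."""
    n = start
    while n <= stop:
        yield n
        n *= 2


def run_commands(program="PATH_TO_PROGRAM", test_file="g_seki_moves.tst"):
    """Build the run.sh / rename command pairs for every playout count."""
    base = os.path.splitext(test_file)[0]  # "g_seki_moves"
    commands = []
    for n in playout_schedule():
        label = f"{n // 1000}k"
        # "-playouts" is the placeholder option from the mail, not a real flag
        commands.append(f'./run.sh -p "{program} -playouts {n}" -t {test_file}')
        commands.append(f"mv html/{test_file}/index.html {base}_{label}.html")
    return commands


for line in run_commands():
    print(line)
```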


Best regards,
Aja


Jacques Basaldúa

May 16, 2012, 3:53:53 PM
to compu...@dvandva.org

Hi Aja,

The testing program encodes different problems in the same sgf file, like in:

loadsgf sgf/seki/case1.sgf 4
14 genmove w
#? [B2|J3]

loadsgf sgf/seki/case1.sgf 6
16 genmove w
#? [B2]

If you ignore the move numbers, J3 is not even a legal move. Unfortunately, move numbers hardly mean anything, since the sgf file is not a game but a list of stones. Each program will translate that in its own way and get different move numbers, possibly alternating B, W, B, W... or whatever.
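
For readers unfamiliar with the .tst layout quoted above: each numbered GTP command appears to be followed by a '#? [...]' line listing the accepted answers, separated by '|'. Below is a minimal Python sketch of reading that layout. It assumes the bracketed text is a plain list of alternatives (the real regression tool may treat it as a more general pattern), and parse_tst and is_correct are hypothetical names.

```python
import re

SAMPLE = """\
loadsgf sgf/seki/case1.sgf 4
14 genmove w
#? [B2|J3]

loadsgf sgf/seki/case1.sgf 6
16 genmove w
#? [B2]
"""


def parse_tst(text):
    """Collect (id, command, accepted answers) for each numbered command."""
    tests = []
    last = None  # most recent numbered command as (id, command)
    for raw in text.splitlines():
        line = raw.strip()
        numbered = re.match(r"^(\d+)\s+(.+)$", line)
        if numbered:
            last = (int(numbered.group(1)), numbered.group(2))
            continue
        expected = re.match(r"^#\?\s*\[(.*)\]", line)
        if expected and last is not None:
            answers = expected.group(1).split("|")
            tests.append({"id": last[0], "command": last[1], "answers": answers})
            last = None
    return tests


def is_correct(test, response):
    """Case-insensitive check of an engine's move against the answers."""
    accepted = {a.upper() for a in test["answers"]}
    return response.strip().upper() in accepted


tests = parse_tst(SAMPLE)
print(tests)
print(is_correct(tests[0], "j3"), is_correct(tests[1], "J3"))
```

The numeric prefix (14, 16) is the command ID that the result pages and later messages in this thread refer to.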

I also don't know what the numbers 4 and 6 at the end of the loadsgf command mean.

Can you please provide a list of the last moves played before the "genmove" so we can verify that we are all analyzing the same position? Ideally, I would prefer a simple sgf file without "tricks" representing the tested position, but assuming that this position is reachable by just removing the last move a number of times, I can produce the SGF file myself. I would be happy to participate in your test.

Jacques.

Aja Huang

May 16, 2012, 11:12:34 PM
to compu...@dvandva.org
Hi Jacques,

We would very much appreciate it if you could participate in our test. About the command 'loadsgf', the GTP specification says

Board size and komi are set to the values given in the sgf file. Board configuration, number of captured stones, and move history are found by replaying the game record up to the position before move_number or until the end if omitted.

So for the command

loadsgf sgf/seki/case1.sgf 4


The program should load the position of case1.sgf BEFORE move 4, not AFTER. Just today an author found a bug in gogui-adapter and kindly reported it to me: gogui-adapter incorrectly loads the position AFTER move_number. Markus has already fixed the bug for us; see


https://sourceforge.net/tracker/?func=detail&aid=3527339&group_id=59117&atid=489964
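
The BEFORE/AFTER distinction is an off-by-one in how many moves are replayed. A minimal Python sketch, assuming the record has already been flattened into a list of moves (the function names are hypothetical and this is not gogui-adapter's code):

```python
def moves_before(moves, move_number=None):
    """Moves to replay for 'loadsgf file move_number' per the GTP text:
    the position BEFORE move_number, or the whole record if omitted."""
    if move_number is None:
        return list(moves)
    return list(moves[:max(move_number - 1, 0)])


def moves_after(moves, move_number):
    """The buggy reading: the position AFTER move_number."""
    return list(moves[:move_number])


record = ["m1", "m2", "m3", "m4", "m5", "m6"]
print(moves_before(record, 4))  # replays m1..m3, so move 4 is next to play
print(moves_after(record, 4))   # one move too many
```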


If you use gogui-adapter to translate 'loadsgf' for your program, please download the newest version of gogui which is available at


https://sourceforge.net/scm/?type=git&group_id=59117


Best regards,

Aja



Aja Huang

May 16, 2012, 11:53:11 PM
to compu...@dvandva.org
By the way, to use gogui-adapter to translate 'loadsgf' the command is something like

./run.sh -p "java -jar gogui-adapter.jar  \"PATH_TO_PROGRAM \"" -t g_seki_moves.tst
(use backslash character (\) to escape the quotes in the string)

I used gogui-adapter to run pachi and Mogo as well, because neither of them supports 'loadsgf'. Please don't hesitate to let me know if it doesn't work for you.

Best regards,
Aja

David Fotland

May 17, 2012, 12:42:34 AM
to compu...@dvandva.org

I used gogui-adapter too because many faces doesn’t have loadsgf, but gogui doesn’t send the komi, so I had to adjust it by hand.

Aja Huang

May 17, 2012, 2:09:46 AM
to compu...@dvandva.org
That bug has already been fixed by Markus. You can download the newest gogui-adapter at

http://smart-games.com/gogui-adapter.jar

Markus fixed 3 bugs regarding 'loadsgf':
1. Adapter keeps sending 'undo' even if the attached engine replies 'cannot undo'.
2. Adapter doesn't send 'komi'.
3. Adapter incorrectly loads the position AFTER move_number.
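
To make the komi bug concrete, here is a rough Python sketch of what a 'loadsgf' translation has to send to an engine that lacks the command. It uses only standard GTP commands (boardsize, clear_board, komi, play). The function name and the input format are assumptions for illustration, and this is not how gogui-adapter is actually implemented.

```python
def loadsgf_as_gtp(size, komi, moves):
    """Translate an already-parsed position into plain GTP commands.

    `moves` is a list of (color, vertex) pairs, e.g. ("b", "E5").
    Omitting the komi line reproduces bug 2 from the list above.
    """
    commands = [f"boardsize {size}", "clear_board", f"komi {komi}"]
    commands += [f"play {color} {vertex}" for color, vertex in moves]
    return commands


for cmd in loadsgf_as_gtp(9, 7.0, [("b", "E5"), ("w", "C3")]):
    print(cmd)
```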

Thanks David for hosting the file.

Best regards,
Aja


Jacques Basaldúa

May 17, 2012, 7:19:31 AM
to compu...@dvandva.org

Ok.

I will use gogui-adapter to do the same. I could have converted the script to my own system, but my implementation of loadsgf only supports loading the principal variation to the end. If gogui-adapter translates the files into the appropriate "play" commands, that is OK for me.

Jacques.

Rémi Coulom

May 17, 2012, 7:41:27 AM
to compu...@dvandva.org
Hi Aja,

Thanks for this interesting test. This is Crazy Stone's output for seki_moves:
http://www.grappa.univ-lille3.fr/~coulom/seki-128k.html
Many correct answers are probably a bit lucky, because the evaluation is rarely correct.

It is not easy for me to implement sg_compare_float. But I'll try if I find time.

gnugo rules :-)

Rémi

Olivier Teytaud

May 17, 2012, 9:21:01 AM
to compu...@dvandva.org
If you run the tests twice, do you get nearly the same results?
Aja: will you publish results with varying numbers of simulations for MC bots?
Olivier




--
=========================================================
Olivier Teytaud -- olivier...@inria.fr
TAO, LRI, UMR 8623(CNRS - Universite Paris-Sud),
bat 490 Universite Paris-Sud F-91405 Orsay Cedex France http://0z.fr/EJm0g
(one of the 56.5 % of french who did not vote for Sarkozy in 2007)


Rémi Coulom

May 17, 2012, 10:09:04 AM
to compu...@dvandva.org
Sorry, I have just figured out that my loadsgf command did not set the komi correctly. Now that it is fixed, the result is much better:
http://www.grappa.univ-lille3.fr/~coulom/seki-1k.html
http://www.grappa.univ-lille3.fr/~coulom/seki-2k.html
http://www.grappa.univ-lille3.fr/~coulom/seki-4k.html
http://www.grappa.univ-lille3.fr/~coulom/seki-8k.html
http://www.grappa.univ-lille3.fr/~coulom/seki-16k.html
http://www.grappa.univ-lille3.fr/~coulom/seki-32k.html
http://www.grappa.univ-lille3.fr/~coulom/seki-64k.html
http://www.grappa.univ-lille3.fr/~coulom/seki-128k.html
http://www.grappa.univ-lille3.fr/~coulom/seki-1024k.html

So only 3 errors at slow time control. I can't tell for sure they are really errors.

The evaluation for ID 111 & 115 is losing. The evaluation for ID 191 is jigo.

All the others (the correct ones) are either jigo or winning, except for case17 (ID 172), which is a very funny kind of seki and is evaluated as bad for W, although CS plays the game correctly.

Rémi

Rémi Coulom

May 17, 2012, 10:53:39 AM
to compu...@dvandva.org
Now with the correct e-mail address.

On 17 mai 2012, at 16:43, Rémi Coulom wrote:

> I took a closer look at the games.
>
> 19 is hanezeki:
> http://senseis.xmp.net/?Hanezeki
> I don't worry too much about that. Did this ever occur in a real game?
>
> I would recommend using non-integer komi for your tests, because they test the ability of the program to deal with jigo at the same time as they test seki. Dealing with jigo in the search is not an easy job: it is much more difficult to get a consistent search, with proved convergence to optimal play, when the outcome of the game is not binary. Completely greedy search will solve any position with non-integer komi, but it is likely to fail with integer komi (ie, get stuck on jigo when a stronger move can win but has a low evaluation in the beginning of the search).
>
> Crazy Stone evaluates hanezeki correctly if komi is set to 7.5 instead of 7.0.

Sorry, that should be 6.5. With 6.5, Crazy Stone still fails. So hanezeki is still difficult.

Rémi

>
> case11 is strange. In the variation contained in the sgf, W loses by two points. Aja, are you sure case11 is correct?
>
> Rémi
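
The komi point in the quoted message can be illustrated with a small Python sketch: with an integer komi a game can end in jigo, so the outcome has three possible values, while a half-point komi forces a binary result. The function name and the example score are assumptions chosen for illustration.

```python
def game_result(black_minus_white, komi):
    """Winner given Black's board margin and the komi paid to White."""
    margin = black_minus_white - komi
    if margin > 0:
        return "B"
    if margin < 0:
        return "W"
    return "jigo"


# A board margin of 7 for Black under three different komi values:
print(game_result(7, 7.0))  # jigo is possible with integer komi
print(game_result(7, 6.5))  # Black wins by half a point
print(game_result(7, 7.5))  # White wins by half a point
```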

Aja Huang

May 17, 2012, 2:18:52 PM
to compu...@dvandva.org
Hi Olivier,

Yes, that's our plan. We would very much appreciate it if you could participate in our regression test and contribute Mogo's results. It will be interesting to see Mogo's performance on these test cases with large numbers of simulations like 1M, 2M, 4M or even 32M on a mega cluster/strong machine.

The version of Mogo I ran over the test was downloaded at

http://www.lri.fr/~teytaud/mogor

It's probably not a current version and I couldn't figure out how to get Mogo's evaluation of a position.

Best regards,
Aja


Aja Huang

May 17, 2012, 2:39:53 PM
to compu...@dvandva.org
Hi Rémi,

Yes, you are right. Case11 is not correct. I have fixed it. Case19 is a hanezeki that might never occur in real games. The purpose of this research is to explore some limitations of current MC Go programs, so Martin asked me to design the most difficult seki cases on earth. So I just did it.

As for komi 7.0, thanks for your suggestion. We will discuss it and announce our decision.

Best regards,
Aja


Rémi Coulom

May 17, 2012, 2:46:18 PM
to compu...@dvandva.org
Also you should check some relatively easy cases where Crazy Stone fails at 128k and 64k. I would not be surprised if the moves suggested by Crazy Stone are correct.

What about the result of Erica? I believe it should be similar to Crazy Stone :-)

Rémi

Rémi Coulom

May 17, 2012, 3:01:24 PM
to compu...@dvandva.org
Attached you'll find an interesting case of seki that maybe you don't have in your database. The White string at A11 has 3 liberties, but W must not play on any of them, because then A8 and A10 are miai for the kill. Black can play on any of them, but the search will not play there, because that would make W obviously alive. That position occurred in a game that Crazy Stone lost against gnugo.

Rémi


seki.sgf

Aja Huang

May 17, 2012, 3:05:46 PM
to compu...@dvandva.org
I will certainly check all the cases again. Thanks to Yamato and Erik's contributions, both Zen and Steenvreter solve all the seki cases so probably other cases are fine. At this moment I'm too busy to clean up Erica's code and test it. I'll probably do it later.

Aja Huang

May 17, 2012, 4:03:34 PM
to compu...@dvandva.org
Thanks, it is indeed a very interesting seki. In case13 the seki at the bottom-left corner is also formed in a big eye but of a different shape.

Aja


Hiroshi Yamashita

May 21, 2012, 7:13:14 AM
to compu...@dvandva.org
Hi Aja,

Thanks for the interesting seki problems.
Aya's results are

http://www.yss-aya.com/g_seki_moves_1k.html
http://www.yss-aya.com/g_seki_moves_2k.html
http://www.yss-aya.com/g_seki_moves_4k.html
http://www.yss-aya.com/g_seki_moves_8k.html
http://www.yss-aya.com/g_seki_moves_16k.html
http://www.yss-aya.com/g_seki_moves_32k.html
http://www.yss-aya.com/g_seki_moves_64k.html
http://www.yss-aya.com/g_seki_moves_128k.html

I used the latest case11.sgf.

Regards,
Hiroshi Yamashita


----- Original Message -----
From: "Aja Huang" <ajah...@gmail.com>
To: <compu...@dvandva.org>
Sent: Saturday, May 19, 2012 7:14 AM
Subject: Re: [Computer-go] A Regression test set for exploring some limitations of current MCTS programs in Go


> Dear all,
>
> If you are interested, you can download our latest regression test set at
>
> http://webdocs.cs.ualberta.ca/~shihchie/seki-and-two-safe-groups-regression-test.zip
>
> which was updated with
> 1. Newest, bug-free gogui-adapter.jar.
> 2. Fixed case11.sgf of the seki test set.
> 3. genmove version of the test files prefixed by g_.
>
> We appreciate that quite a few authors are interested in participating in our
> test. Thanks to Erik, Yamato and Remi for helping us check and point out the
> errors in the test set. We will release the final, bug-free version as soon
> as possible.
>
> Best regards,
> Aja
>

