Human vs AI handicap go


Warren D Smith

Dec 19, 2019, 6:08:02 PM
to LCZero
Go is better than chess in the sense that unequal players (within limits) can
still have a good game via "handicap stones."
(In chess you can use "odds games," but that isn't as good.)

So anyhow, Lee Sedol, the 4-1 victim of DeepMind's original go AI, just played
an exhibition game versus "HanDol," another go AI which I think is an attempt
to commercialize AlphaZero-type technology, and which previously had been
undefeated by humans...  Some claim HanDol is stronger than, or comparable to,
the DeepMind "Master" go AI.

Lee Sedol was given a handicap stone, i.e. one extra move at game start, the smallest handicap normally used.

That (unexpectedly) proved to be enough; Lee Sedol won the game.

This is the first game in a planned 3-game match, and in view of LS's victory
there will be no handicap in the second game. (Match: 18 & 19 Dec 2019.)



Warren D Smith

Dec 19, 2019, 6:38:15 PM
to LCZero
earlier news report:

"The 2019 China Securities Cup World AI Open, a tournament to decide the world’s top go-playing computer program, was
held in Rizhao City in Shandong Province, China, from August 21 to 25. 
Fourteen programs from China (8), Japan (1), Korea (2), Chinese Taipei (1), Hong Kong (1), and Belgium (1) took part. 
Fine Art (China) showed overwhelming strength, beating Golaxy (also China) 4-1 in the final. 
Third place went to HanDol of Korea.
Fourth to Leela Zero of Belgium. 
Japan had high hopes for Globis-AQZ, but after coming third in the first section of the tournament, it was beaten into fifth place in the knock-out stage.
This tournament was just one part of a large-scale go festival with various kinds of tournaments for amateurs and professionals.
The AI tournament was in its third year.
DeepZenGO of Japan won the first, and Golaxy of China the second."

Fine Art also won the Tencent World AI Weiqi Competition 2018, beating Golaxy 7-0 in the final.

----
So HanDol clearly is not the top go AI, but it apparently is superior to Leela
Zero and is the best go AI from Korea.
"Fine Art," developed by Tencent, seems to be the strongest; winning 4-1 and
7-0 versus the second strongest is reasonably convincing.

One innovation versus the original DeepMind design is neural nets which have
more than one evaluation and suggested-move output, depending on the "komi."
This (a) allows play with different komi (AlphaZero could only play with one)
and (b) seems to increase strength.

An apparent weakness in AlphaZero had been that it valued all wins the same,
regardless of the final score (i.e. territory count) in the go game.  This
would cause it to give up territory for no good reason.  The komi innovation
above can repair that perceived or real flaw.
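
To make the multi-komi idea concrete, here is a rough sketch of what such a
value head might look like (PyTorch-style, with a made-up komi grid and layer
sizes; I have no idea what Fine Art or HanDol actually do internally):

    import torch
    import torch.nn as nn

    # Hypothetical multi-komi value head: one win-probability output per komi
    # value in KOMI_GRID, instead of a single scalar.  Sizes are invented.
    KOMI_GRID = [k * 0.5 for k in range(-10, 21)]      # komi from -5.0 to +10.0

    class MultiKomiValueHead(nn.Module):
        def __init__(self, trunk_channels=128, board_size=19):
            super().__init__()
            self.conv = nn.Conv2d(trunk_channels, 8, kernel_size=1)
            self.fc1 = nn.Linear(8 * board_size * board_size, 256)
            self.fc2 = nn.Linear(256, len(KOMI_GRID))  # one logit per komi value

        def forward(self, trunk_features):             # (B, trunk_channels, 19, 19)
            x = torch.relu(self.conv(trunk_features))
            x = torch.relu(self.fc1(x.flatten(1)))
            return torch.sigmoid(self.fc2(x))          # win prob at each komi

    # at play time, read off the column for the actual (or a fake) komi:
    #   p = head(trunk_features)[0, KOMI_GRID.index(7.5)]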

Warren D Smith

Dec 19, 2019, 6:52:17 PM
to LCZero
> One innovation versus the original DeepMind design is neural nets which
> have more than one evaluation and suggested-move output, depending on the
> "komi."  This (a) allows play with different komi (AlphaZero could only
> play with one) and (b) seems to increase strength.

So for example, a weakness of all Monte Carlo players seems to be that if you
are in a game position with a 99% (or 1%) win probability, then you get little
info from a Monte Carlo search to game end; only 2% or so of the Monte Carlo
playouts provide useful information.

Whereas if it were a 50-50 position, every playout would give useful info.

The multi-komi trick enables you to use a fake komi to bring the fake
win-chance back near
50-50 again, and then you need not suffer from this whole problem.
You might worry that thinking about the wrong game in this way is
going to hurt you... but getting only 2% info also is going to hurt
you, so it is a matter of choosing the best tradeoff, which can be
done based on prior experiments.
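
A toy illustration of how the fake komi could be chosen (plain Python; eval_fn
is a stand-in for whatever call returns the net's win probability at a given
komi, and max_shift is the tradeoff knob you would set from prior experiments):

    # Toy sketch: pick a fake komi that pulls the predicted win chance back
    # toward 50%, so that playouts stay informative.  eval_fn is a stand-in
    # for a call (position, komi) -> win probability in [0, 1]; max_shift
    # limits how far the fake game may drift from the real one.
    def choose_fake_komi(position, eval_fn, komi_grid, true_komi, max_shift=6.0):
        best, best_gap = true_komi, abs(eval_fn(position, true_komi) - 0.5)
        for komi in komi_grid:
            if abs(komi - true_komi) > max_shift:
                continue
            gap = abs(eval_fn(position, komi) - 0.5)
            if gap < best_gap:
                best, best_gap = komi, gap
        return best   # search with this komi; the real result still uses true_komi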

--
Warren D. Smith
http://RangeVoting.org <-- add your endorsement (by clicking
"endorse" as 1st step)

Norton Freeman

Dec 19, 2019, 11:47:23 PM
to LCZero
HanDol is not superior to Leela Zero; they just had one match, and HanDol was playing white.

Warren D Smith

Dec 20, 2019, 12:13:57 PM
to LCZero
UPDATE: Lee Sedol lost game #2. The third and final game will be tomorrow, I
assume again with a handicap stone.  Also, it seems they were (unusually)
using BOTH a handicap stone and a komi, so the handicap for Lee Sedol was
effectively smaller than the usual one-stone handicap, but it was still
enough for him to win game #1.
 
--I just had another idea, which is a much closer chess analogy to the "fake
komi" idea that was highly successful in neural-net go. Two papers on two
versions of that idea:
I. https://arxiv.org/abs/1705.10701 "In the ML value network, different values (win rates) are trained simultaneously for different settings of komi."
II. https://arxiv.org/abs/1809.03928 "the winrate for all komi values is obtained, at the price of predicting just one more variable."

Here is the chess version of that. Make Lc0's neural net output not only win
prob (conditioned on no draw), draw prob, and possible-move probs, but ALSO
the same for different values of the "number of moves left until the 50-move
draw" counter, anywhere from, say, 5 to 500 (that is like paper I). Or make
the network provide these outputs in the form of parameters describing a
FUNCTION of that counter (that is like paper II).
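
A rough sketch of the paper-I-style variant for chess (PyTorch-style, with an
invented counter grid and layer sizes; not Lc0's actual head):

    import torch
    import torch.nn as nn

    # Hypothetical paper-I-style head for chess: a separate (win, draw, loss)
    # distribution for each value of the 50-move counter in COUNTER_GRID.
    COUNTER_GRID = list(range(5, 505, 5))        # fake "moves left until the 50-move draw"

    class MultiCounterWDLHead(nn.Module):
        def __init__(self, trunk_channels=128):
            super().__init__()
            self.conv = nn.Conv2d(trunk_channels, 8, kernel_size=1)
            self.fc = nn.Linear(8 * 8 * 8, 3 * len(COUNTER_GRID))

        def forward(self, trunk_features):       # (B, trunk_channels, 8, 8)
            x = torch.relu(self.conv(trunk_features)).flatten(1)
            logits = self.fc(x).view(-1, len(COUNTER_GRID), 3)
            return torch.softmax(logits, dim=-1) # WDL for every counter value

    # evaluating under a fake counter just means indexing a different column:
    #   wdl = head(trunk_features)[0, COUNTER_GRID.index(100)]  # pretend 100 left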

Now, when playing, we can employ a fake value of the "moves left til 50" counter.  It is possible
that by tuning the fakeness right, the net result will be stronger play.

Anyhow, this idea definitely works using "fake komis" (allegedly the top go
AIs all now use it, and it provides a substantial strength boost in go).
It also yields evaluation functions that are faster to train and stronger.

The "komi" can be regarded (this is not the way it normally is regarded in China, but it is equivalent to it and is valid)
as a bank of "extra moves" in a version of go where the object of the game is not to "get the most territory" but rather
to "play the last move of the game" because you run your opponent out of moves.  This is the description of the
go rules used in so called "mathematical go" which is a "Conway game."
The reason I am saying this is, to make the (imperfect) analogy to the chess 50 counter clear.  Your goal in chess is
to get a win before that counter expires.

Veedrac

Dec 21, 2019, 12:40:02 PM
to LCZero
On Friday, 20 December 2019 17:13:57 UTC, Warren D Smith wrote:
> UPDATE: Lee Sedol lost game #2. The third and final game will be tomorrow, I
> assume again with a handicap stone.  Also, it seems they were (unusually)
> using BOTH a handicap stone and a komi, so the handicap for Lee Sedol was
> effectively smaller than the usual one-stone handicap, but it was still
> enough for him to win game #1.
>
> --I just had another idea, which is a much closer chess analogy to the "fake
> komi" idea that was highly successful in neural-net go. Two papers on two
> versions of that idea:
> I. https://arxiv.org/abs/1705.10701 "In the ML value network, different values (win rates) are trained simultaneously for different settings of komi."
> II. https://arxiv.org/abs/1809.03928 "the winrate for all komi values is obtained, at the price of predicting just one more variable."
>
> Here is the chess version of that. Make Lc0's neural net output not only win
> prob (conditioned on no draw), draw prob, and possible-move probs, but ALSO
> the same for different values of the "number of moves left until the 50-move
> draw" counter, anywhere from, say, 5 to 500 (that is like paper I). Or make
> the network provide these outputs in the form of parameters describing a
> FUNCTION of that counter (that is like paper II).

IMO the value head should be forced to predict values for all move counters, and only the policy head should have access to the true move counter. This would give a more informative network, be useful during search, allow prioritizing faster wins without introducing bias, force the network to be more general, and hopefully train slightly faster too.
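
Something like this, roughly (a sketch with invented shapes; 1858 is just the usual Lc0 move-encoding size, and how the counter is actually fed in is up for grabs):

    import torch
    import torch.nn as nn

    # Rough sketch of the proposed split: the value head never sees the true
    # 50-move counter (it predicts WDL for every counter value in a grid),
    # while the policy head gets the true counter appended to its input.
    class SplitHeads(nn.Module):
        def __init__(self, trunk_channels=128, n_counters=100, n_moves=1858):
            super().__init__()
            self.n_counters = n_counters
            self.value = nn.Sequential(                 # counter-blind value head
                nn.Conv2d(trunk_channels, 8, 1), nn.ReLU(), nn.Flatten(),
                nn.Linear(8 * 64, 3 * n_counters))
            self.policy = nn.Linear(trunk_channels * 64 + 1, n_moves)  # sees counter

        def forward(self, trunk, true_counter):         # trunk: (B, C, 8, 8); counter: (B,)
            wdl_all = self.value(trunk).view(-1, self.n_counters, 3)
            pol_in = torch.cat([trunk.flatten(1), true_counter.unsqueeze(1)], dim=1)
            return wdl_all, self.policy(pol_in)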

Warren D Smith

Dec 21, 2019, 2:28:54 PM
to Veedrac, LCZero
> IMO the value head should be forced to predict values for all move
> counters, and only the policy head should have access to the true move
> counter. This would give a more informative network, be useful during
> search, allow prioritizing faster wins without introducing bias, force the
> network to be more general, and hopefully train slightly faster too.

--exactly.

Warren D Smith

Dec 22, 2019, 1:09:27 PM
to LCZero
Final result: Lee Sedol 1, HanDol 2. Two games were played with a minimal
handicap for Lee Sedol (one stone, with komi), of which he won the first; the
middle game was played even.

What a human would consider the reason why HanDol lost game 1 was not terribly
hard to see.  I'm incredibly weak compared to them, but even I could see it, at
least with a little help from annotators.  Basically, there was a "ladder" which
did not directly "work" for Lee Sedol, but then a "net" enabled it to
effectively work anyhow.  Somehow, HanDol apparently did not see it.
No idea what went wrong, but it did not seem like a very deep and impressive
conquest; it was more like two humans playing where one just makes a pretty
lame blunder that I would think any pro, or even dan-level amateur, would VERY
rarely commit, and then almost immediately resigns.
Hell, I bet even the go programs from the pre-neural-net era would not have fallen for this.
You can see a somewhat wacky video annotation of this game at
and if you want to fast forward to the crucial blunder, go to 19:00.

(Another possibility is that HanDol realized somewhat earlier, via deep
thinking, that it was going to lose, and therefore played badly, since these
Monte Carlo players tend to play badly once they see they are losing.)

So that is kind of a problem with these neural net AIs: they are inscrutable.
When they are stupid, nobody knows why.  It is a black art.

Stephen Timothy McHenry

Dec 25, 2019, 1:58:14 AM
to LCZero
If that is what happened in the game the AI lost, it is called the "horizon" effect. The computer plays a move that loses badly because it is trying to confuse the issue by playing a move that, in the time allotted, cannot be thoroughly searched out. The moves it could search out gave bad results, so it plays the blunder!  Computers are bad losers.

Warren D Smith

Dec 25, 2019, 2:58:03 AM
to Stephen Timothy McHenry, LCZero
Actually, a computer could not play handicap go well at all if it were idiotic
about lost positions (the initial position in a handicap go game is lost).
Plainly the AlphaZero-like computers capable of playing handicap go well depend
on the variable-komi trick to get over that obstacle.

The analogy in chess would be, e.g., playing with material odds. Could Lc0 play
well in such a game?

Todd Freitag

Dec 25, 2019, 4:38:40 AM
to LCZero
Older Leela nets (like T10) are tough at material-odds games.

Recent nets are not, perhaps because they resign lost positions and so don't
get to train on them.

Warren D Smith

Dec 25, 2019, 2:20:00 PM
to Todd Freitag, LCZero
Another approach to Lc0 training, to hopefully overcome the "draw death" flaw,
would be to play training games with TIME ODDS, i.e. one side (randomly chosen)
has more time on its clock (or, if you are using node count rather than clock
time, a greater node-count allowance).

The advantage is that this will cause training games to have fewer draws,
allowing more effective learning (100% draws ==> zero learning rate).
Optimally, you would tune the time-imbalance factor to get the most
information, that is, the greatest "entropy" of the probability distribution
of 3-valued game results, per unit of training time.
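
For example, the tuning could look something like this (a toy sketch;
play_games is a stand-in for running a batch of self-play games at a given
time-odds ratio):

    import math

    # Toy sketch: choose the time-odds ratio that maximizes the entropy of the
    # (win, draw, loss) result distribution per unit of training time.
    # play_games is a stand-in: ratio -> (wins, draws, losses, elapsed_time).
    def result_entropy(wins, draws, losses):
        total = wins + draws + losses
        probs = [c / total for c in (wins, draws, losses) if c > 0]
        return -sum(p * math.log2(p) for p in probs)   # bits; max is log2(3)

    def best_time_odds(play_games, candidate_ratios=(1.0, 1.5, 2.0, 3.0, 5.0)):
        def info_rate(ratio):
            w, d, l, t = play_games(ratio)
            return result_entropy(w, d, l) / t         # bits per unit train time
        return max(candidate_ratios, key=info_rate)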

It also may be possible to use time odds to diminish the flaw of ineffective
learning about how to win won games, and how to be tough when you are losing.
Namely, give the apparently-losing side more time on its clock. This will mean
that a lost position in training will sometimes not be lost, at an artificially
high rate of loss-avoidance. This will train Lc0 better in how to handle lost
and won positions, because it will learn from more information. (100% losses
==> zero learning.)
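
For instance, something like this (a sketch; the winrate would come from
whatever the current net says about the position, and "boost" is the knob):

    # Sketch: per-move node allowances for a training game, giving the side
    # the current net thinks is losing an artificially larger search budget.
    # 'boost' is the ratio of the losing side's budget to the winning side's
    # in a completely decided position; boost=1.0 means no odds at all.
    def node_allowances(base_nodes, white_winrate, boost=2.0):
        white_factor = 1.0 + (boost - 1.0) * (1.0 - white_winrate)
        black_factor = 1.0 + (boost - 1.0) * white_winrate
        return int(base_nodes * white_factor), int(base_nodes * black_factor)

    # e.g. if the net gives White only a 20% win chance:
    #   node_allowances(800, 0.20)  ->  (1440, 960)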

Warren D Smith

Dec 25, 2019, 2:35:30 PM
to Todd Freitag, LCZero
Yet another idea to overcome the "draw death" and "plays weakly in lost and in
won positions" flaws:

Do not train Lc0 using just one neural net. Use the training run to train two
(or perhaps even more) nets simultaneously. For simplicity, let me consider
two nets: a "big" net and a "small" one. The big net will hopefully ultimately
play stronger (at least if we base things on node count rather than clock
time), but the opposite might be true initially. At any point during training
we have a current notion of which of the two nets is stronger and by how much.

Now, when doing playouts in the Monte Carlo tree search, we have the option of
"asymmetrical playouts," where the "heavy" side gets to use the stronger net
more often. Perhaps which net is used (on each use) is chosen randomly by
flipping biased coins.

Anyhow, my point is this: you can tune the use-frequencies of the two nets to
avoid draws, if draws are too frequent; to make the side with the weaker
position effectively be a relatively stronger player; etc. Having two nets
available gives you a lot of freedom to tune the learning to avoid the "draw
death" and "failure to learn to be tough in lost positions" flaws.

I think this is a very promising idea.
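
As a sketch of the biased-coin version (plain Python; big_net/small_net and the
per-side probabilities are the tunable pieces, and evaluate() is a stand-in for
the real evaluation call):

    import random

    # Sketch of "asymmetrical playouts": each time the search needs a
    # neural-net evaluation, flip a biased coin to decide whether the big
    # (stronger) or small net answers, with a separate bias for each side.
    def pick_net(side_to_move, big_net, small_net, p_big):
        """p_big is e.g. {'white': 0.8, 'black': 0.3}: the probability that
        each side's evaluations come from the stronger net.  Raising one
        side's probability effectively makes that side a stronger player,
        which is the knob for cutting draws or letting the worse-off side
        fight harder."""
        return big_net if random.random() < p_big[side_to_move] else small_net

    # usage inside a playout:
    #   net = pick_net('black', big_net, small_net, {'white': 0.7, 'black': 0.4})
    #   value, policy = net.evaluate(position)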