12年前,IBM公司研制的超级计算机"深蓝"(Deep Blue)在6局比赛中击败了国际象棋世界冠军加里·卡斯帕罗夫(Garry
Kasparov)。这个里程碑式的事件终结了人类在又一个智力策略游戏上的统治地位。只有亚洲的围棋仿佛是计算机科学的"阿喀琉斯之踵":人类总是能
够轻松击败计算机。但最近出现的一种新的围棋算法,却能战胜高水平的棋手。
围棋的复杂度高,且极具欺骗性,对计算机程序提出了巨大的挑战。围棋的棋盘由两组数量相同、互相正交的平行线构成,分为9线小棋盘与19线大棋盘两种。
对弈双方分执黑白两色棋子。通过在棋盘的交叉点上落子,棋手要尽可能扩张自己的领地并包围对方棋子。在大棋盘的对弈中,每一步可采取的策略数量都非常巨
大。对局中期,平均每一步能采取200种不同的策略,相比而言,国际象棋中每一步数十种的可选策略就显得微不足道了。此外,还要考虑数量众多的后续策
略。由于棋盘上每个位置都对应着三种状态:黑子占据、白子占据和空位,N个位置便可构成3N种不同的状态。如果考虑规则约束,小棋盘大约有1038种不
同的状态,大棋盘的状态数量则达到了惊人的10170种。其他一些因素也会影响比赛胜负:棋盘上棋子的数量优势并不能确保胜利,棋手必须在考虑局部形式
的同时,兼顾全局。
为了处理如此众多的可能情况,人工智能专家已经设计出一些算法,来限制搜索的范围,但它们都无法在大棋盘的比赛中战胜实力稍强的人类棋手。去
年秋季,两位匈牙利研究人员报告了一种新算法,它的胜率比现有最佳算法提高了5%,能够在小棋盘的比赛中与人类职业棋手抗衡。这种被称为
UCT(Upper Confidence bounds applied to Trees)的算法,是匈牙利国家科学院计算机与自动化研究所(位于
布达佩斯)的列文特·科奇什(Levente Kocsis)与加拿大阿尔伯塔大学(University of Alberta,位于埃德蒙顿)的乔
鲍·塞派什瓦里(Csaba Szepesvári)合作提出的,是著名的蒙特卡罗方法(Monte Carlo method)的扩展应用。
20世纪70年代,蒙特卡罗方法首次运用于围棋程序,这种方法的作用类似于选举投票:用统计采样的方式,预测大规模群体的表现与特质。围棋程序会随机出
招,模拟大量的比赛,对候选走法加以评估并排序。然而,每一步都采用评估值最高的走法,并不能保证获得比赛的胜利。相反,这种方法仅仅是限制了搜索的范
围。
UCT扩展了蒙特卡罗方法,集中关注那些最有希望赢得比赛的走法。科奇什说:"UCT的主要思想是对走法进行选择性采样。"他解释说,这种
算法必须达到一种平衡,既要尝试当前的最佳走法,发现其中可能隐藏的昏招,还要搜索"当前并非最佳的走法,以确保不会因为先前的估计错误而错失妙
招"。
UCT为每一种走法计算一个索引值,然后按照索引值最高的走法出招。索引值由采用该走法后最终取胜的概率(即胜率)决定,该走法被计算却未
被采用的次数也对索引值有一定的影响。UCT会在内存中维护一棵"决策树",用来记录每一种走法的统计数据。如果遇到一种先前从未计算过的走法,算法就
会将它纳入决策树,并随机出招完成余下的比赛。
判定随机比赛的胜负后,UCT就会更新比赛中采用过的所有走法的统计数据。如果该走法的索引值等于它的胜率,算法就能快速选定这招最有希望
获胜的策略。但开局时走出妙招,仍然无法确保比赛的最终胜利。所以在选择走法时,UCT会增大计算次数少的候选走法的权值,以使胜率的总体分布趋向平
坦。
法国南巴黎大学的数学家西尔万·热利(Sylvain Gelly)与巴黎技术学校的王毅早(Yizao Wang,音译)将UCT集成到一
个他们称之为MoGo的程序中。该程序的胜率竟然比先前最先进的蒙特卡罗扩展算法几乎高出了一倍。今年春季,MoGo在小棋盘的比赛中击败了实力强劲的
业余棋手,在大棋盘比赛中也击败了实力稍弱的业余棋手,充分展示了能力。热利认为UCT易于实现,并有进一步完善的空间。科奇什预言,10年以后,计算
机就能攻克最后的壁垒,终结人类职业棋手对围棋的统治。
(译/陈家乾 校/虞骏)
附录:蒙特卡罗算法
以概率和统计理论方法为基础的一种计算方法。将所求解的问题同一定的概率模型相联系,用电子计算机实现统计模拟或抽样,以获得问题的近似解。为象征性地
表明这一方法的概率统计特征,故借用赌城蒙特卡罗命名。又称统计模拟法、随机抽样技术。由S.M.乌拉姆和J.冯·诺伊曼在20世纪40年代为研制核武
器而首先提出 。它的基本思想是,为了求解数学、物理、工程技术以及管理等方面的问题 ,首先建立一个概率模型或随机过程,使它们的参数,如概率分布或
数学期望等问题的解;然后通过对模型或过程的观察或抽样试验来计算所求参数的统计特征,并用算术平均值作为所求解的近似值。对于随机性问题,有时还可以
根据实际物理背景的概率法则,用电子计算机直接进行抽样试验,从而求得问题的解答。
蒙特卡罗方法有很强的适应性,问题的几何形状的复杂性对它的影响不大。该方法的收敛性是指概率意义下的收敛,因此问题维数的增加不会影响它的收敛速
度,而且存贮单元也很省,这些是用该方法处理大型复杂问题时的优势。因此,随着电子计算机的发展和科学技术问题的日趋复杂,蒙特卡罗方法的应用也越来越
广泛。它不仅较好地解决了多重积分计算、微分方程求解、积分方程求解、特征值计算和非线性方程组求解等高难度和复杂的数学计算问题,而且在统计物理、核
物理、真空技术、系统科学 、信息科学 、公用事业、地质、医学,可靠性及计算机科学等广泛的领域都得到成功的应用。
快速的回顾了过去50年的计算机围棋研究,穆勒指出,老式的围棋程序如Goliath和手谈,编程时注重在围棋知识,而新的方法"monte
carlo"和UCT,注重于海量的搜索和随机的下法搜索,而不是确定性的算法.Gnugo和MoGo是使用这种新方法的主要程序."他们在7X7的棋
盘是完美的,在9X9的棋盘上相当于业余3段",穆勒说.恐怕没有一个棋手输棋会像穆勒在06年12月被Gnuno击败时这样高兴.
去年,郭娟职业五段和crazystone程序在7X7的棋盘上进行了一系列的比赛,程序总是取得胜利或者执白的时候和棋.今年,MoGo在9X9的棋
盘上对郭娟取得了9胜5负的好成绩."Monte Carlo"总是下出一些奇怪的下法,穆勒对此不无担忧."但是这种算法善于取胜" 这种程序每秒钟
进行100,000次仿真或者说1百万步."为什么这种算法如此有效"穆勒问. "还没有理论上的解释,尽管经验性的结果是如此之好" 换句话说,"我
们还不知道".
尽管穆勒说,很多研究者现在认为,职业水平的围棋程序出现只是一个时间问题.他觉得可能还有些遥远."我自己的看法是我们还需要一到两个的好点子,但我
还不知道这些点子从哪里得到
原文:
MUELLER ON COMPUTER GO'S "REVOLUTIONARY" ADVANCES: "The last 12 months
have been the most exciting ever in computer go," computer scientist
Martin Mueller told a packed lecture hall Sunday night at the European
Go Congress. Hundreds of go players stayed after five grueling rounds
in the EGC Weekend Tournament to hear Mueller - a professor at the
University of Alberta -- discuss the latest "revolutionary" advances
in computer go.
Quickly reviewing fifty years of computer go research, Mueller
explained that "old school" go-playing programs like Goliath and Hand
Talk focused on programming in go knowledge, whereas the new
approaches -- the Monte Carlo method and UCT - are search-intensive
and use random move searches instead of deterministic algorithms.
Programs using this approach - GnuGo and MoGo are leading examples -
"are almost perfect on 7x7 and are as strong as an amateur 3-dan on
9x9," said Mueller. Perhaps no go player has ever been as happy as
Mueller was to lose a game when GnuGo beat him in December 2006.
Last year, Guo Juan 5P played a series against CrazyStone on a 7x7
board in which the program always won or got a jigo when playing white
against the pro; this year MoGo scored 9 wins and 5 losses against Guo
Juan on a 9x9 board. "Monte Carlo programs play many strange move,"
conceded Mueller, "but they're very good at winning. All without a
single line of programming."
Such programs run as many as 100,000 simulations - or 1 million moves
per second -- for each move in a 9x9 game. "Why does it work so well?"
Mueller asked. "There's no theoretical explanation, although we have
excellent empirical results." In other words, a broadly grinning
Mueller said, "We don't really know." Although Mueller said that many
researchers now think it's "just a matter of time before there's a
professional-level go-playing program," he think it may be farther
off. "My own feeling is that we need one or two more good ideas, but
where they'll come from I don't know."
附录:郭鹃
姓名:郭鹃(guojuan)
性别:女
国籍:荷兰
段位:中国职业围棋五段
1990年9月移居荷兰,1994年5月入荷兰籍。主要比赛成绩:4次欧洲冠军;7次荷兰冠军;7次欧洲应氏杯冠军;4次欧洲富士通杯冠军。
补充:加拿大艾德蒙顿的艾尔伯特大学的马丁.穆勒在《电脑围棋新的进展》中提到的几点。
Recent Progress in Computer Go
Martin Muller
University of Alberta Edmonton, Canada
采用蒙特卡罗算法、UCT算法的围棋程序有:
" 1993年,Bernd Brugmann 写了一些仿真程序
" 200x年,Bouzy 及其学生重新写了一些仿真程序
" 2006年,Kocsis 和 Szepesvari 使用了 UCT 算法
" Sylvain Gelly 和 Yizao Wang 写的围棋程序 MoGo
" Remi Coulom 写的围棋程序 Crazy Stone
" Don Dailey 写的新的围棋服务器程序 CGOSs
上述程序的棋力如何呢?
" 在 7x7 棋盘上的表现近乎完美
" 在 9x9 棋盘上的水平是有业余段位
" 在 19x19 棋盘上的水平是业余5级,与顶级电脑程序相似
以下是原文:
Monte-Carlo Simulation and UCT for Go
" 1993 Bernd Brügmann - simulations for Go
" 200x Bouzy and students revive simulations
" 2006 Kocsis and Szepesvari - UCT algorithm
" Sylvain Gelly, Yizao Wang - MoGo
" Remi Coulom - Crazy Stone
" Don Dailey - CGOS server, new programs
How Strong?
" almost perfect on 7x7
" amateur Dan level on 9x9
" 5 kyu on 19x19? Similar to top classic program
A basic MC program would simply collect statistics for all children of
the root node, but MC evaluation of these moves will not converge to
the best move. It is therefore necessary to do a deeper search.
The UCT-method (which stands for Upper Confidence bounds applied to
Trees) is a very natural extension to MC-search, where for each played
game the first moves are selected by searching a tree which is grown
in memory, and as soon as a terminal node is found a new move/child is
added to the tree and the rest of the game is played randomly. The
evaluation of the finished random game is then used to update the
statistics of all moves in the tree that were part of that game.
The UCT-method solves the problem of selecting moves in the tree such
that important moves are searched more often than moves that appears
to be bad. One way of doing this would be to select the node with the
highest winrate 50% of the time and select a random move otherwise.
UCT also selects the best move most of the time but also explores
other moves in a more sophisticated way. It does this by adding a
number to the winrate for each candidate move that goes down every
time the node has been visited. But this number also goes up a little
every time the parent was visited and some other move is selected.
This means that the win rate + the number will grow for unexplored
moves so that at some point the sum is higher than for all moves that
have higher winrates. If the move works, then the winrate goes up and
it may soon be selected again. If the move failed to win then the
winrate goes down as well as the added number and the move may have to
wait for a long time before the added number has grown large enough. A
move can also be selected if all other moves are refuted so that the
winrates for all competitors goes down.
In summary: starting from the root UCT selects a path of moves through
the tree by computing a value for each candidate move based on the
winrate and how many times the move has been played as well as how
many times the position has been visited. If there are children to a
node that has not been visited yet then one of those moves are
selected at random. Since this algorithm does not assume any knowledge
about go it is natural to explore every move at least once.
The sum of the winrate and the number (UCTvalue) for each candidate
move n is computed with
UCTValue(parent, n) = winrate + sqrt((ln(parent.visits))/
(5*n.nodevisits))
and the candidate move n with highest UCTvalue is selected to be
played next.
With UCT one can strike a good balance between searching the best move
so far and exploring alternative moves. I think the innovation of UCT
(to me at least) is the logarithm of the number of visits to the
parent node for the UCT-Value. This value goes up quickly early but
then it becomes flat, but never stops rising. It goes to infinity
although very very slowly. All moves will be searched no matter how
bad winrates they have, so that UCT-search given close to infinite
time and memory will find any winning move no matter how misleading
the winrates are initially.
Implementation details
There are at least two ways to implement the tree:
Build a tree in memory Pro: Easy to program Con: You quickly run out
of memory using this method.
Transposition Table: Pro: Identical positions reached with different
move orders will not be searched twice Con: Depending on
implementation details one might have problems with the correctness of
the algorithm since one need to overwrite old data so it is perhaps
more complicated to get bug free code.
Pseudo code for UCT (Based on Valkyria-UCT 22 September 2006)
This pseudo code example ignores what happens close to the end of the
game. When Valkyria finds a terminal node (after two passes) it
assigns INFINITY to n.Visits (and 0 or INFINITY to n.wins depending if
it was a win or loss) to indicate that this node is solved. Higher up
in the tree one must also check if a child refuted the node or that
all children failed to refute it.
const
UCTK = 1;
// Larger values give uniform search
// Smaller values give very selective search
type
TMove = (...);
PNode = ^Node;
Node = record
Wins: integer;
Visits: integer;
Move: TMove;
BestNode: PNode;
Child: PNode;
Sibling: PNode;
end;
function UCTSelect(n: Node): Node;
begin
bestuct := 0;
result := nil;
next := n.Child;
while next <> nil do
begin
if next.Visits > 0 then
begin
winrate := next.Wins/next.Visits;
uct := UCTK*Sqrt(ln(n.Visits)/(5*next.Visits));
uctvalue := winrate + uct;
end
else
// Always play a random unexplored move first
uctvalue := 10000 + random(1000);
if uctvalue > bestuct then
begin
bestuct := uctvalue;
result := next;
end;
next := next.Sibling;
end;
end;
procedure PlaySimulation(n: Node);
begin
if n.Visits = 0 then
randomresult := clone.PlayRandomGame
else
begin
if n.Child = nil then
CreateChildren(n);
next := UCTSelect(n);
clone.MakeMove(next.Move);
PlaySimulation(next);
end;
Inc(n.Visits);
UpdateWin(n, randomresult);
if n.Child <> nil then
SetBest(n);
end;
function UCTSearch: TMove;
begin
root := GetNode;
nsimulations := 0;
while (nsimulations < MaxSimulations ) do
begin
clone.CopyState(position);
PlaySimulation(root);
Inc(nsimulations);
end;
result := root.BestMove;
end;
PlayRandomGame This method plays a random game and returns the color
of the winner
CreateChildren(n: Node); This procedure generates all legal successors
in the position and adds those moves as children to the node n.
UpdateWin(n: Node; randomresult: TGameResult); This procedure adds 1
to n.Wins if the randomresult indicates that this is correct.
CopyState(position: TGameState); In order to play a random game the
boardstate has to be copied
SetBest(n: Node) This procedure sets n.BestMove to the child with the
highest winrate.
Note that the variable randomresult is a global variable.