question about "Markov games as a framework for multi-agent reinforcement learning"

25 views
Skip to first unread message

B. Phoebe Wang

unread,
Nov 3, 2009, 11:55:22 AM11/3/09
to Reinforcement Learning Mailing List
Hello everyone!

I am working in the area of wireless networking, and recently very
interested in applying MARL to networking. When I read the paper
"Markov games as a framework for multi-agent reinforcement learning",
I found that using real value V(s') to approximate the expected value
is a great idea to solve the optimal policy of MDP and Markov games,
since it doesn't require knowing the transition rule. However, I have
a question, that is, whether there exists any theoretical proof about
the convergence of the algorithm given in Figure. 1 (minimax-Q
algorithm). I didn't find the proof in the paper, and nothing is found
after I searched the web.

So someone here can give me a hint? Many thanks!

Best regards,
Phoebe

Scott Livingston

unread,
Nov 3, 2009, 2:44:01 PM11/3/09
to rl-...@googlegroups.com
Hi Phoebe,

Could you provide a full reference, or a copy of the PDF for this paper?

Thanks,
Scott Livingston


2009/11/3 B. Phoebe Wang <beibei...@gmail.com>:
--
slivi...@caltech.edu
http://scottman.net
865-964-5384

B. Phoebe Wang

unread,
Nov 3, 2009, 3:26:20 PM11/3/09
to Reinforcement Learning Mailing List
Hi Scott,

Here is the PDF for this paper.
http://www.eecs.harvard.edu/~parkes/cs286r/spring06/papers/littman94markov.pdf.

Thank you!

Phoebe

On Nov 3, 2:44 pm, Scott Livingston <scottclivings...@gmail.com>
wrote:
> Hi Phoebe,
>
> Could you provide a full reference, or a copy of the PDF for this paper?
>
> Thanks,
> Scott Livingston
>
> 2009/11/3 B. Phoebe Wang <beibei.bbw...@gmail.com>:
>
>
>
>
>
> > Hello everyone!
>
> > I am working in the area of wireless networking, and recently very
> > interested in applying MARL to networking. When I read the paper
> > "Markov games as a framework for multi-agent reinforcement learning",
> > I found that using real value V(s') to approximate the expected value
> > is a great idea to solve the optimal policy of MDP and Markov games,
> > since it doesn't require knowing the transition rule. However, I have
> > a question, that is, whether there exists any theoretical proof about
> > the convergence of the algorithm given in Figure. 1 (minimax-Q
> > algorithm). I didn't find the proof in the paper, and nothing is found
> > after I searched the web.
>
> > So someone here can give me a hint? Many thanks!
>
> > Best regards,
> > Phoebe
>
> --
> slivings...@caltech.eduhttp://scottman.net
> 865-964-5384

B. Phoebe Wang

unread,
Nov 3, 2009, 5:47:35 PM11/3/09
to Reinforcement Learning Mailing List
I thought this paper may have the answer to my original post.
http://www.lirmm.fr/~jq/Cours/3cycle/module/HuWellman98icml.pdf.

Thanks.

On Nov 3, 2:44 pm, Scott Livingston <scottclivings...@gmail.com>
wrote:
> Hi Phoebe,
>
> Could you provide a full reference, or a copy of the PDF for this paper?
>
> Thanks,
> Scott Livingston
>
> 2009/11/3 B. Phoebe Wang <beibei.bbw...@gmail.com>:
>
>
>
>
>
> > Hello everyone!
>
> > I am working in the area of wireless networking, and recently very
> > interested in applying MARL to networking. When I read the paper
> > "Markov games as a framework for multi-agent reinforcement learning",
> > I found that using real value V(s') to approximate the expected value
> > is a great idea to solve the optimal policy of MDP and Markov games,
> > since it doesn't require knowing the transition rule. However, I have
> > a question, that is, whether there exists any theoretical proof about
> > the convergence of the algorithm given in Figure. 1 (minimax-Q
> > algorithm). I didn't find the proof in the paper, and nothing is found
> > after I searched the web.
>
> > So someone here can give me a hint? Many thanks!
>
> > Best regards,
> > Phoebe
>
> --
> slivings...@caltech.eduhttp://scottman.net
> 865-964-5384

Vassilis Vassiliades

unread,
Nov 3, 2009, 6:35:29 PM11/3/09
to rl-...@googlegroups.com
Hi Phoebe,

If I remember correctly minimax-Q was a good algorithm for zero-sum games, whereas Nash-Q was the extension to general-sum games. I don't know if this helps but for a comprehensive survey on MARL I recommend reading this paper (if you haven't read it already):

Busoniu Lucian, Babuska Robert, and De Schutter Bart, A Comprehensive Survey of Multiagent Reinforcement Learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C 38(2): 156-172 (2008)

Best wishes,
Vassilis Vassiliades

Francisco Martinez Gil

unread,
Nov 4, 2009, 5:00:24 AM11/4/09
to rl-...@googlegroups.com
Hi Phoebe,
recently other interesting paper about Nash equilibria without the very
restrictive conditions of the other algorithms for the general case has
appeared:

Akchurina, Natalia. Multiagent Reinforcement Learning: Algorithm converging to
Nash equilibrium in general-sum discounted stochastic games. AAMAS09

Regards.
Francisco Martinez

El Miércoles, 4 de Noviembre de 2009 00:35, Vassilis Vassiliades escribió:
> Hi Phoebe,
>
> If I remember correctly minimax-Q was a good algorithm for zero-sum games,
> whereas Nash-Q was the extension to general-sum games. I don't know if this
> helps but for a comprehensive survey on MARL I recommend reading this paper
> (if you haven't read it already):
>
> Busoniu Lucian, Babuska Robert, and De Schutter Bart, A Comprehensive
> Survey of Multiagent Reinforcement Learning. IEEE Transactions on Systems,
> Man, and Cybernetics, Part C 38(2): 156-172 (2008)
>
> Best wishes,
> Vassilis Vassiliades
>
> On Wed, Nov 4, 2009 at 12:47 AM, B. Phoebe Wang
<beibei...@gmail.com>wrote:
> > I thought this paper may have the answer to my original post.
> > http://www.lirmm.fr/~jq/Cours/3cycle/module/HuWellman98icml.pdf<http://ww
> >w.lirmm.fr/%7Ejq/Cours/3cycle/module/HuWellman98icml.pdf> .
Message has been deleted

B. Phoebe Wang

unread,
Nov 4, 2009, 12:13:23 PM11/4/09
to Reinforcement Learning Mailing List
Hi,

Thank you guys so much for recommending good papers to me. I will
definitely read them. :)

Best regards,
Phoebe

Liam Mac Dermed

unread,
Nov 5, 2009, 1:24:49 PM11/5/09
to Reinforcement Learning Mailing List
The paper Francisco Martinez mentioned is an algorithm to find any
Nash equilibrium. Typically there are lots of different equilibria,
many quite bad. This is why the multi-agent learning community has
heavily frowned upon research that finds equilibria without
justification or any other guarantees. (See: "If multi-agent learning
is the answer, what is the question?" by Yoav Shoham, Rob Powers, and
Trond Grenager). Also, given that, in general, finding Nash Equlibria
is NP-complete I'm suspicious about the results of that paper. So the
question you should ask yourself is what is your criterion for
success. Also there is a lot of work on game theory in networks and
routing under the field of "computational game theory".

If your domain allows for communication between agents and you want to
assume that other agents are fully rational (in the game theoretic
sense), I would suggest my forthcoming paper titled "Solving
Stochastic Games" (stochastic games are just another name for Markov
games) to be published at NIPS '09, which finds the entire set of
correlated equilibria in stochastic games and given a particular
bargaining solution (the method by which a particular equilibrium is
chosen out of a whole set of possibilities) will give you the optimal
solution. I'm also interested is possible domains to apply the
algorithm so if you think your problem is well suited to being framed
as a stochastic game let me know.

- Liam

Michael Littman

unread,
Nov 5, 2009, 3:47:27 PM11/5/09
to rl-...@googlegroups.com
Hi,

  The Hu and Wellman paper doesn't have a convergence proof.  However, if you are asking about minimax-Q, see: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.
21.1260&rep=rep1&type=pdf .

-Michael
Message has been deleted

B. Phoebe Wang

unread,
Nov 18, 2009, 11:01:50 AM11/18/09
to Reinforcement Learning Mailing List
Hi,

I have another question about implementing the minimax-Q learning
algorithm. In Figure 1 (the minimax-Q learning) in [Littman94], what
is stated explicitly about choosing an action is only for the
maximizer/agent, who chooses his/her action according to pi[s,a].
However, during the learning phase, both players should choose an
action, so how should the opponent choose his/her action o? Should the
opponent also maintain a record about his/her strategy like the pi
[s,.] as the agent does? If so, how is the opponent strategy updated,
the minimizer pure strategy o in getting the pi[s,.]? If this is the
case, would "second guessed" happen?

Another question is: if the problem formulation requires that the
state is a high-dimensional vector, and so it is for the action space,
and both the state and action space are very large, how to judge
whether the Q-learning converges to the true value? Is it true that we
need to avoid such high-dimension stuff to make things work?

Thank you in advance for your kind answer!

Best,
Phoebe

Michael Littman

unread,
Nov 18, 2009, 8:23:05 PM11/18/09
to rl-...@googlegroups.com
The opponent should decide which action the opponent should take.  :-)  The convergence theorem just says that if every state / joint action pair is chosen infinitely often, the values go to the minimax solution in the limit.

Brafman and Tenenholtz's Rmax paper does some analysis of what happens if the opponent does not adequately explore.  Briefly, if the opponent doesn't choose actions correctly, it can only help us (reward will be no smaller than at the equilibrium).


--
You received this message because you are subscribed to the "Reinforcement Learning Mailing List" group.
To post to this group, send email to rl-...@googlegroups.com
To unsubscribe from this group, send email to
rl-list-u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/rl-list?hl=en

Reply all
Reply to author
Forward
0 new messages