Littman's minimax-q learning

2,046 views
Skip to first unread message

Xavi Junaso

unread,
Apr 7, 2009, 7:38:53 AM4/7/09
to rl-...@googlegroups.com
Hi, I am very new in RL and I am a little bit confused about this minimax-q learning algorithm for a simple soccer game.

I was looking at the paper "Markov games as a framework for multi-agent reinforcement learning" by Michael L. Littman, I understood what Littman basically does. However, I could not understand the details of the minimax-q algorithm. I think that most of you already knew this algorithm and it can be good to hear some explanations about it. Furthermore, if there is an applet (or just a simple code) that simulates the process and shows how the variables change, maybe more useful to understand the algorithm better.

Xavi...

William Uther

unread,
Apr 7, 2009, 8:05:01 PM4/7/09
to rl-...@googlegroups.com
Hi Xavi,

Littman's algorithm is a combination of Q-learning and the linear programming
solution to a single matrix game. Understanding each of these
separately would be a
good start to understanding the combination.

e.g.: http://en.wikipedia.org/wiki/Zero-sum_(game_theory)#Solution
http://en.wikipedia.org/wiki/Q-learning

(although a proper textbook would do much better than those pages.)

Be well,

Will :-}

Michael Littman

unread,
Apr 8, 2009, 8:06:05 PM4/8/09
to rl-...@googlegroups.com
will and amy greenwald have implementations of it, I think...(mine is too many moves ago and unlikely to be found).

William Uther

unread,
Apr 9, 2009, 12:19:45 AM4/9/09
to rl-...@googlegroups.com
I probably do have a copy of my implementation somewhere, but it was
written for classic MacOS and would
be easier to re-write than port. The most important part of the
algorithm was the linear program solver, and
for that I used code from Numerical Recipes in C which implemented the
simplex algorithm (the linear programming
simplex algorithm, not the gradient descent simplex algorithm).

As I mentioned in my previous email - you want to understand the two
separate parts of the algorithm first. Once
you have those parts, putting them together is not too hard.

Oh, and if you're really interested in RL for game playing, I
recommend you look at Michael Bowling's work (WoLF).

Be well,

Will :-}

zjc

unread,
Apr 29, 2009, 7:32:25 AM4/29/09
to Reinforcement Learning Mailing List
I'm trying to use The minimax-Q algorithm to simulate the soccer game
mentioned in the paper.

what's I'm wondering is that is the V in Table2(the
rock,paper,scissors game) the same as the V in the
pseudocode . If so ,it's much easier to use matlab function linprog to
sovle the linear programming , and to find
pi[s,.] and V[s].




On Apr 9, 12:19 pm, William Uther <will.ut...@gmail.com> wrote:
> I probably do have a copy of my implementation somewhere, but it was
> written for classic MacOS and would
> be easier to re-write than port.  The most important part of the
> algorithm was the linear program solver, and
> for that I used code from Numerical Recipes in C which implemented the
> simplex algorithm (the linear programming
> simplex algorithm, not the gradient descent simplex algorithm).
>
> As I mentioned in my previous email - you want to understand the two
> separate parts of the algorithm first.  Once
> you have those parts, putting them together is not too hard.
>
> Oh, and if you're really interested in RL for game playing, I
> recommend you look at Michael Bowling's work (WoLF).
>
> Be well,
>
> Will      :-}
>
> On Thu, Apr 9, 2009 at 10:06 AM, MichaelLittman
>

Michael Littman

unread,
May 8, 2009, 8:50:23 PM5/8/09
to rl-...@googlegroups.com
On Wed, Apr 29, 2009 at 7:32 AM, zjc <zjc...@gmail.com> wrote:

I'm trying to use The minimax-Q algorithm to simulate the soccer game
mentioned in the paper.

what's I'm wondering is that is the V in Table2(the
rock,paper,scissors game)  the same as the V in the
pseudocode .

   It's the updated value function, yes.
 
If so ,it's much easier to use matlab function linprog to
sovle the linear programming , and to find
pi[s,.] and V[s].

Easier than what?  Linear programming is precisely the right way to solve zero-sum games (the two problems are actually equivalent).

-Michael

Reply all
Reply to author
Forward
0 new messages