RL survey

Csaba Szepesvari

Oct 26, 2009, 12:55:43 AM

to rl-...@googlegroups.com

Hi,

I have put together a survey on "Reinforcement learning in MDPs" and
have put it on the web as a technical report, which I hope many of you
will enjoy reading. The survey is of course not comprehensive, as I
tried to keep it concise (it is already 65 pages long, though the last
13 pages are references). In any case, if you like or hate it, think
something important is omitted, or have found a typo, a mistake, or a
suspected misrepresentation of the facts, drop me a few lines. I
appreciate any feedback! Details of where to download it, how to cite
it, etc., are given below. As for the assumed level of prior
knowledge: I assumed that the reader knows about MDPs and dynamic
programming, so MDPs are reviewed in only about 5 pages. People not
familiar with MDPs are advised to read Chapter 2 of the book by
Sheldon Ross (a link is given at the end of this e-mail).

Other than this, I tried to make the text as easy to follow as I
could.

Bests,
Csaba

Reinforcement Learning Algorithms for MDPs -- A Survey
Cs. Szepesvári
Technical Report TR09-13,
Department of Computing Science, University of Alberta, 2009.
http://www.cs.ualberta.ca/TechReports/2009/TR09-13/TR09-13.pdf,

Keywords: reinforcement learning; Markov Decision Processes; temporal
difference learning; stochastic approximation; two-timescale stochastic
approximation; Monte-Carlo methods; simulation optimization; function
approximation; stochastic gradient methods; least-squares methods;
overfitting; bias-variance tradeoff; online learning; active learning;
planning; simulation; PAC-learning; Q-learning; actor-critic methods;
policy gradient; natural gradient

Abstract: This article presents a survey of reinforcement learning
algorithms for Markov Decision Processes (MDP). In the first half of the
article, the problem of value estimation is considered. Here we start by
describing the idea of bootstrapping and temporal difference learning.
Next, we compare incremental and batch algorithmic variants and discuss
the impact of the choice of the function approximation method on the
success of learning. In the second half, we describe methods that target
the problem of learning to control an MDP. Here online and active
learning are discussed first, followed by a description of direct and
actor-critic methods.

Chapter 2 in Ross' book:
http://www.cs.ualberta.ca/~szepesva/RossChapter2.pdf

Davi Carnaúba

Oct 26, 2009, 8:50:52 AM

to rl-...@googlegroups.com

I can't access the link.

2009/10/26 Csaba Szepesvari <szep...@cs.ualberta.ca>

Richard Cubek

Oct 26, 2009, 10:01:33 AM

to rl-...@googlegroups.com

Davi Carnaúba wrote:

> I can't access the link.

Hmm, I can access both... it must be a problem with your own network/computer...

--
Richard Cubek, Dipl.-Ing.(FH)
University of Applied Sciences Ravensburg-Weingarten
Intelligent Mobile Robotics Laboratory
Phone: (0049) (0)751 501 9838
Mobile: (0049) (0)163 88 39 529

Marc Deisenroth

Oct 26, 2009, 9:13:29 AM

to rl-...@googlegroups.com

Try enabling JavaScript.

Antos Andras

Oct 26, 2009, 10:20:19 AM

to rl-...@googlegroups.com

On Mon, 26 Oct 2009, Richard Cubek wrote:
> Davi Carnaúba wrote:
>> I can't access the link.
> Hmm, I can access both... it must be a problem with your own network/computer...

Me too, I could also wget it.
Andras

Michael Littman

Oct 26, 2009, 6:01:34 PM

to rl-...@googlegroups.com

Impressive survey!

I suspect the link problem is the trailing comma: some browsers need you to add a blank space after the URL.

2009/10/26 Antos Andras <an...@szit.bme.hu>