Query: Reinforcement Learning Resources

Prabhav Kaula

Sep 15, 2022, 11:04:45 AM

to Reinforcement Learning Mailing List

Respected sir

I am Prabhav. I hope you are doing well. I am writing because I would like to move into deep-learning-based reinforcement learning. I am currently working on fairness in deep generative models.

Before starting my journey, I got my initial exposure to AI from Ben Goertzel's talks and workshops, which motivated me to look into artificial general intelligence. My initial references were Pei Wang's syllabus, the AGI Google group and Marcus Hutter's website. I started with deep learning through computer vision projects. I became interested in the robotics and RL side of things when I wanted to reproduce the ideas behind generalist agents, protein folding and multi-modal learning. I revisited Marcus' shared resources, which directed me to the book by R. Sutton and A. Barto.

I humbly request your guidance in building an authentic learning map for reinforcement learning.

Kind regards

Prabhav

PS: There are so many resources (MOOCs, books and tutorials) available that they overshadow the genuine ones.

Warren Powell

Sep 15, 2022, 3:25:50 PM

to rl-...@googlegroups.com

Prabhav,

"reinforcement learning" is a subset of what I have been calling "sequential decision analytics". For example, every RL problem is a sequential decision problem, and every RL method falls in one of the four classes of policies. See

https://tinyurl.com/sdafieldyoutube

for a video introduction, and the webpage

https://tinyurl.com/sdafield

I have prepared a resources page with videos, books, suggested courses for teaching this material, and various webpages with insights into the vast field of sequential decision problems.

For my description of "reinforcement learning" please see the page

https://tinyurl.com/what-is-rl/

Enjoy!

Warren

------------------------------
Warren B. Powell

Chief Analytics Officer, Optimal Dynamics

Professor Emeritus, Princeton University

http://www.castlelab.princeton.edu

Constantinos

Sep 15, 2022, 8:51:50 PM

to Reinforcement Learning Mailing List

One method for learning a subject that works for me is combining hands-on tutorials with reading books and papers. I suggest you start with David Silver's lectures on RL, accompanied by the Sutton and Barto book. There is code available that solves the exercises in the book, so that might be a good start for you.

You could easily read the introductory material in Spinning Up in Deep RL by OpenAI in a few hours; at the beginning, don't bother with the detailed description of policy gradient methods. This will give you a sense of what is out there and of the various taxonomies of methods. You will re-read it multiple times as you gain more experience from the lectures and the book. This material will help you navigate the many variations of typical RL algorithms, and you can use the code to test some simple algorithms.

Depending on what you like, your studies could be breadth-first (learning what kinds of methods exist and what problems they are trying to solve), followed by a deeper focus on one family depending on your needs, e.g. policy gradient methods. Or you could follow the lectures in order and learn all the different variations of value-based methods before moving to policy gradient methods.

RL is a whole research field, and a great deal of material is discussed only in papers and not in any textbook. My suggestion is to build a good foundation from the lectures and the book without necessarily covering every single topic (again, the Spinning Up material could be your "roadmap"), learn the computations involved in the "vanilla" versions of the algorithms, and then move on to the function-approximation variants with neural networks (don't rush to get there!). This way, when you start using NNs you will have a sense of why you need them, what the inputs and outputs should be, what they should approximate, which loss functions to use, etc.
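To make the "vanilla" computations concrete, here is a minimal sketch of tabular Q-learning, one of the value methods covered in Sutton and Barto. It is only an illustration: the five-state chain environment, the `step` helper and all hyperparameter values are assumptions made up for this example and are not taken from the book or the lectures.

```python
import random

N_STATES = 5      # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic chain: reward 1 only on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon; also pick randomly on ties.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # TD target: no bootstrapping from a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = q_learning()
print(greedy_policy(q_table))  # [1, 1, 1, 1]: always move right
```

Here the table `q` is the whole "model". Once this update rule feels familiar, the deep RL variants can be read as replacing the table with a neural network that takes the state as input and approximates the same targets.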

Best,

Konstantinos Mitsopoulos
Project Scientist, Robotics Institute
Carnegie Mellon University

Olivier Sigaud

Sep 16, 2022, 11:06:58 AM

to rl-...@googlegroups.com

Hi Prabhav,

I completely agree with Konstantinos Mitsopoulos' answer.

In addition, among the available MOOCs, I believe Martha and Adam White's Coursera RL class has a very good reputation.

To play with RL algorithms as black boxes, probably the most used library is Stable-Baselines3:

https://github.com/DLR-RM/stable-baselines3

As the name implies, it is very useful if you want to compare your algorithm to the literature without having to tune the hyper-parameters, but it may not be a good choice if you want to learn RL by coding the algorithms.

Besides, if I may self-advertise, I have a YouTube channel on RL that gets good feedback from beginners (or maybe just from overly polite people :)).

A first playlist is about the basic concepts in the tabular case, close to the content of the 1998 version of Sutton and Barto's book, but it continues to DQN, DDPG and a quick look at TD3:

https://www.youtube.com/playlist?list=PLe5mY-Da-ksWV330WbfazLUyOuR59sers

A second playlist is more about the policy search and policy gradient view, covering policy gradient methods, A2C, TRPO, ACKTR, PPO, DDPG and TD3 again, SAC and TQC:

https://www.youtube.com/playlist?list=PLe5mY-Da-ksVCsqA9Szo8wX5cH76-t89u

And recently I have made available the PyTorch-based library that I use for teaching how to code deep RL. It is designed specifically for educational purposes and is named BBRL:

https://github.com/osigaud/bbrl

In the README you will find a list of notebooks to gradually learn how to code the above list of algorithms. Being recent, it certainly needs improvement, but I hope it may help.

Don't hesitate to send feedback.

Best regards,

Olivier Sigaud

Warren Powell

Sep 16, 2022, 11:35:45 AM

to rl-...@googlegroups.com

An additional note on "policy search"...

The RL community seems to focus on the "policy gradient method". There are two dimensions to "policy search":

1. Choosing the class of policy - Go to https://tinyurl.com/RLandSO/ and download chapter 11 for my overview of all four classes of policies, and how to choose among them.

2. Most policies have tunable parameters (increasingly called "hyperparameters"). In my chapter 12 (on the first class of policies called "Policy function approximations") I describe different approaches to parameter search:

o Derivative-based vs. derivative-free tuning

o Online vs. offline learning

o Performance-based vs. supervised learning

I then go into detail describing:

o Derivative-based stochastic search using numerical derivatives (see my chapter 5)

o Derivative-free stochastic search (see chapter 7)

o Exact derivatives for continuous states and actions (think of stochastic control problems)

o The policy gradient method (covered in detail, but this is a rare section marked with **).
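The contrast between derivative-free and derivative-based tuning can be sketched on a toy problem. Everything in this sketch is an assumption made for illustration and is not taken from the chapters mentioned above: an asset must be sold within `T` periods, prices are i.i.d. Uniform(0, 1), and the policy is "sell at the first price at or above `theta`". The tunable parameter `theta` is then chosen offline in a simulator, once by grid search and once by stochastic gradient ascent with a central-difference numerical derivative.

```python
import random

T = 10  # selling horizon (hypothetical)


def sample_paths(n, rng):
    """Draw n price paths; prices are i.i.d. Uniform(0, 1) (toy assumption)."""
    return [[rng.random() for _ in range(T)] for _ in range(n)]


def sale_price(theta, prices):
    """Threshold policy: sell at the first price >= theta, or at the deadline."""
    for t, price in enumerate(prices):
        if price >= theta or t == T - 1:
            return price


def average_revenue(theta, paths):
    """Simulated performance of the policy with parameter theta."""
    return sum(sale_price(theta, p) for p in paths) / len(paths)


def grid_search(paths, candidates):
    """Derivative-free tuning: simulate every candidate, keep the best."""
    return max(candidates, key=lambda th: average_revenue(th, paths))


def finite_difference_ascent(theta=0.5, iters=50, batch=500, delta=0.05, seed=1):
    """Derivative-based tuning using a numerical (central-difference) gradient."""
    rng = random.Random(seed)
    for k in range(iters):
        # Fresh sample each iteration, shared by both evaluations
        # (common random numbers reduce the noise of the difference).
        paths = sample_paths(batch, rng)
        grad = (average_revenue(theta + delta, paths)
                - average_revenue(theta - delta, paths)) / (2 * delta)
        theta += 0.5 / (k + 1) * grad          # declining step size
        theta = min(1.0, max(0.0, theta))      # keep theta in [0, 1]
    return theta


rng = random.Random(0)
eval_paths = sample_paths(5000, rng)
theta_grid = grid_search(eval_paths, [i / 20 for i in range(21)])
theta_fd = finite_difference_ascent()
# For this toy the best constant threshold is T ** (-1 / (T - 1)), about 0.774,
# so both estimates should land in that neighborhood.
print(theta_grid, round(theta_fd, 3))
```

Both routines only need the ability to simulate the policy, which is why they also apply when exact derivatives are unavailable. The grid search scales poorly with the number of parameters, while the finite-difference gradient needs a step size rule and a perturbation size `delta` that must themselves be chosen.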

Warren
