C++ implementation compatible with RLPark => RLLib


Saminda Abeyruwan

Sep 4, 2012, 4:50:57 PM9/4/12
to github...@googlegroups.com
Dear Community, 

I am Saminda Abeyruwan, a graduate student at the University of Miami. I am interested in reinforcement learning, probabilistic inference, and their application to Learnable Knowledge Representation for building autonomous agents. During my research I have used GQ, Greedy-GQ, and Off-PAC for prediction and control problems in the RoboCup 3D simulation league and, more recently, in the RoboCup SPL (RoboCanes), especially for role assignment within formations. I wanted to use RLPark, but I could not use it directly because our systems are based on C/C++.

Therefore, inspired by RLPark, I have written a library called RLLib (https://github.com/samindaa/RLLib), which I moved out of our robotics code into a general-purpose library. It follows a structure similar to RLPark's, but is templated. My original goal was to provide a set of header files, like the Boost library, that one can use directly. With a lot of optimization, I believe RLLib is well suited for learning tasks implemented in C++. I have verified the implementation against the Off-PAC paper results, and RLLib's results are quite similar. Like RLPark, RLLib contains:

1. Off-policy prediction algorithms: GTD(lambda) and GQ(lambda).
2. Off-policy control algorithms: Greedy-GQ(lambda), Softmax-GQ(lambda), and Off-PAC.
3. On-policy algorithms: TD(lambda) and SARSA(lambda).
4. An efficient dot-product implementation for tile-coding-based feature representations (with trace culling).
5. Mountain Car, Mountain Car 3D, Swinging Pendulum, and Continuous Grid World (Off-PAC paper) environments.
6. Optimization for very fast duty cycles (e.g., with trace culling: at most 2-4 ms in the RoboCup 3D simulation, and at most 7 ms in the cognition thread on the Nao v4).
7. Usage that closely mirrors RLPark's, and therefore a quick learning curve.
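To illustrate point 4 above, here is a minimal, self-contained sketch of the general technique: tile coding yields a sparse binary feature vector, so the dot product only touches the active tile indices, and eligibility traces are kept sparse by culling entries once they decay below a threshold. This is not RLLib's actual API; the class and function names below are hypothetical, written just to show the idea.

```cpp
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Tile coding produces a sparse binary feature vector, so only the indices
// of the active tiles are stored. The dot product with a dense weight vector
// then costs O(active tiles), not O(feature dimension).
double sparseDot(const std::vector<double>& w, const std::vector<int>& activeTiles) {
  double sum = 0.0;
  for (int i : activeTiles) sum += w[i];
  return sum;
}

// Eligibility traces held in a sparse map; entries whose magnitude decays
// below a threshold are culled so the per-step update stays cheap.
class CulledTraces {
 public:
  explicit CulledTraces(double threshold) : threshold_(threshold) {}

  // Multiply every trace by gamma * lambda, dropping tiny entries.
  void decay(double gammaLambda) {
    for (auto it = traces_.begin(); it != traces_.end();) {
      it->second *= gammaLambda;
      if (std::fabs(it->second) < threshold_)
        it = traces_.erase(it);
      else
        ++it;
    }
  }

  // Replacing traces: set the trace of each active tile to 1.
  void setActive(const std::vector<int>& activeTiles) {
    for (int i : activeTiles) traces_[i] = 1.0;
  }

  // TD(lambda)-style weight update: w += alpha * delta * e.
  void updateWeights(std::vector<double>& w, double alpha, double delta) const {
    for (const auto& kv : traces_) w[kv.first] += alpha * delta * kv.second;
  }

  std::size_t size() const { return traces_.size(); }

 private:
  double threshold_;
  std::unordered_map<int, double> traces_;
};
```

Because the trace map only ever holds recently active tiles, the cost of each decay and weight update is bounded, which is what keeps the per-cycle time small on a robot.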

This will allow practitioners to use either RLPark or RLLib directly in their learning tasks within the same structure. I haven't implemented the demons architecture or the OpenCL/CUDA extensions. Q(lambda) is not implemented because of its convergence problems. I am in the process of setting up a web page to show statistics similar to RLPark's. The code is available at https://github.com/samindaa/RLLib.

Thank you!

Sam
