offline matched RL?


knowledgem...@gmail.com

Oct 13, 2016, 7:46:15 AM10/13/16
to BURLAP Discussion
Hi,

Does BURLAP support offline matched RL learning?
LSPI supports learning from saved episodes.

What about the other learning algorithms, such as TD(lambda)?
From what I have read, the other learning algorithms interact with an environment to get a state, execute an action, and receive the reward and next state.
Do they support offline training?
How would I modify them to train offline while still using value function approximation (VFA)?

Thanks in advance!

knowledgem...@gmail.com

Oct 13, 2016, 7:47:41 AM10/13/16
to BURLAP Discussion
Sorry, I meant batch RL, not matched. :)

James MacGlashan

Nov 8, 2016, 1:18:40 PM11/8/16
to BURLAP Discussion
Hi,

While there is no single pre-defined method that lets you pass in a list of episodes, you can update a TD algorithm from saved data: iterate through each step of an episode, construct an EnvironmentOutcome for that transition, pass it to the critiqueAndUpdate method, and then call endEpisode at the end of each episode you feed it.
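To make the replay loop concrete, here is a minimal, self-contained sketch of that idea. The class and method names (Episode, EnvironmentOutcome, critiqueAndUpdate, endEpisode) mirror BURLAP's, but they are simplified stand-ins so the example compiles on its own; in real code you would use BURLAP's own Episode and EnvironmentOutcome types and the critic from your TD learning agent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchTDReplay {

    /** Stand-in for BURLAP's EnvironmentOutcome: one (s, a, r, s') transition. */
    static class EnvironmentOutcome {
        final String o, a, op;   // observed state, action, next state
        final double r;          // reward received on entering op
        final boolean terminated;
        EnvironmentOutcome(String o, String a, String op, double r, boolean terminated) {
            this.o = o; this.a = a; this.op = op; this.r = r; this.terminated = terminated;
        }
    }

    /**
     * Stand-in for a recorded episode. Convention (as in BURLAP): states has n
     * entries, actions has n-1, and rewards.get(t+1) is the reward received on
     * entering states.get(t+1); rewards.get(0) is a placeholder (0.0).
     */
    static class Episode {
        final List<String> states = new ArrayList<>();
        final List<String> actions = new ArrayList<>();
        final List<Double> rewards = new ArrayList<>();
        int numTimeSteps() { return states.size(); }
    }

    /** A toy tabular TD(0) critic exposing a critiqueAndUpdate/endEpisode interface. */
    static class TabularTDCritic {
        final Map<String, Double> v = new HashMap<>();
        final double alpha, gamma;
        TabularTDCritic(double alpha, double gamma) { this.alpha = alpha; this.gamma = gamma; }

        void critiqueAndUpdate(EnvironmentOutcome eo) {
            double vs  = v.getOrDefault(eo.o, 0.0);
            double vsp = eo.terminated ? 0.0 : v.getOrDefault(eo.op, 0.0);
            double tdError = eo.r + gamma * vsp - vs;   // TD(0) error
            v.put(eo.o, vs + alpha * tdError);
        }

        void endEpisode() { /* a TD(lambda) critic would reset its traces here */ }
    }

    /** Replay saved episodes through the critic, one transition at a time. */
    static void learnFromEpisodes(List<Episode> episodes, TabularTDCritic critic) {
        for (Episode e : episodes) {
            for (int t = 0; t < e.numTimeSteps() - 1; t++) {
                boolean last = (t == e.numTimeSteps() - 2);
                EnvironmentOutcome eo = new EnvironmentOutcome(
                        e.states.get(t), e.actions.get(t),
                        e.states.get(t + 1), e.rewards.get(t + 1), last);
                critic.critiqueAndUpdate(eo);
            }
            critic.endEpisode();  // called once per replayed episode
        }
    }
}
```

The same driving loop works unchanged with a value-function-approximation critic, since the critic only ever sees one EnvironmentOutcome at a time.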

James

Vishma Dias

May 31, 2017, 4:07:45 AM5/31/17
to BURLAP Discussion
Hi,

Can you do Q-learning offline using batch processing? Since runLearningEpisode in the QLearning class selects actions based on the current Q-values, is there any way to feed it an episode of my own for the agent to learn from?

Best Regards,
Vishma.