Forward and backward views of eligibility traces

Davi Carnaúba

Jan 13, 2016, 10:58:23 PM
to rl-...@googlegroups.com
Assuming that a state is never revisited, can I say that the forward and backward views are equivalent even in the online case?

Best regards,

Davi

Richard Sutton

Jan 13, 2016, 11:47:33 PM
to rl-...@googlegroups.com
Davi, 

Yes, I think so. I am thinking that if a state is never revisited (in the tabular case, without function approximation), then the online and offline cases are the same by the end of an episode. Thus the forward and backward views will be equivalent in this case, just as they are in the offline case.
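Rich's argument can be checked numerically. The following is a minimal sketch under stated assumptions: tabular values kept in a dict, one episodic trajectory, a terminal value of 0, and hypothetical function names (`lambda_returns`, `forward_view_offline`, `backward_view_online`) and episode data chosen for illustration. It runs TD(lambda) with accumulating traces online (the backward view) over an episode that never revisits a state, and compares the final values with the offline lambda-return update (the forward view) computed from the episode's initial values. The reason they agree: a state that has not been visited yet has a zero trace, so neither S_t nor S_{t+1} has been changed when the TD error at step t is computed, and every online TD error equals its offline counterpart.

```python
def lambda_returns(states, rewards, V, gamma, lam):
    """Offline lambda-returns G^lambda_t for one episode, all bootstrapping on the fixed table V.

    states[t] is S_t and rewards[t] is R_{t+1}; the state after the last step is
    terminal and has value 0.
    """
    G = [0.0] * len(rewards)
    g_next, v_next = 0.0, 0.0  # G^lambda_T = V(S_T) = 0 at the terminal state
    for t in reversed(range(len(rewards))):
        G[t] = rewards[t] + gamma * ((1 - lam) * v_next + lam * g_next)
        g_next, v_next = G[t], V[states[t]]
    return G


def forward_view_offline(states, rewards, V0, alpha, gamma, lam):
    """Forward view: move each visited state toward its lambda-return, applied at episode end."""
    G = lambda_returns(states, rewards, V0, gamma, lam)
    V = dict(V0)
    for t, s in enumerate(states):
        V[s] += alpha * (G[t] - V0[s])
    return V


def backward_view_online(states, rewards, V0, alpha, gamma, lam):
    """Backward view: tabular TD(lambda) with accumulating traces, values changed every step."""
    V = dict(V0)
    e = {s: 0.0 for s in V}
    T = len(rewards)
    for t in range(T):
        s = states[t]
        v_next = V[states[t + 1]] if t + 1 < T else 0.0  # terminal value is 0
        delta = rewards[t] + gamma * v_next - V[s]
        for k in e:
            e[k] *= gamma * lam
        e[s] += 1.0
        for k in V:
            V[k] += alpha * delta * e[k]
    return V


# Hypothetical episode that visits four distinct states (no revisits), then terminates.
states = ["A", "B", "C", "D"]
rewards = [1.0, -0.5, 2.0, 0.3]
V0 = {"A": 0.2, "B": -0.1, "C": 0.5, "D": 0.0, "E": 0.7}  # "E" is never visited
fwd = forward_view_offline(states, rewards, V0, alpha=0.3, gamma=0.9, lam=0.8)
bwd = backward_view_online(states, rewards, V0, alpha=0.3, gamma=0.9, lam=0.8)
print("largest difference:", max(abs(fwd[s] - bwd[s]) for s in V0))
```

If `states` repeats a state (for example `["A", "B", "A"]`), the two results generally differ, since the second visit computes its TD error from a value that was already changed online.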

But note that we now have true online TD(lambda), which gives the desired exact equivalence in the general linear case.
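For reference, here is the true online TD(lambda) update for linear values v(s) = w·x(s), restated in the notation of Sutton and Barto's second edition: weight vector w, feature vector x_t of S_t, dutch trace z with z_{-1} = 0, step size alpha, and a zero feature vector for the terminal state. This is a restatement for illustration, not a quotation of either paper.

```latex
\begin{aligned}
\delta_t &= R_{t+1} + \gamma\,\mathbf{w}_t^\top \mathbf{x}_{t+1} - \mathbf{w}_t^\top \mathbf{x}_t,\\
\mathbf{z}_t &= \gamma\lambda\,\mathbf{z}_{t-1} + \left(1 - \alpha\gamma\lambda\,\mathbf{z}_{t-1}^\top \mathbf{x}_t\right)\mathbf{x}_t,\\
\mathbf{w}_{t+1} &= \mathbf{w}_t + \alpha\,\delta_t\,\mathbf{z}_t
  + \alpha\left(\mathbf{w}_t^\top \mathbf{x}_t - \mathbf{w}_{t-1}^\top \mathbf{x}_t\right)\left(\mathbf{z}_t - \mathbf{x}_t\right).
\end{aligned}
```

Conventional TD(lambda) instead uses z_t = gamma·lambda·z_{t-1} + x_t and w_{t+1} = w_t + alpha·delta_t·z_t. In the tabular case with no revisits, z_{t-1}·x_t = 0 and w_t·x_t = w_{t-1}·x_t, so the extra terms vanish and the two updates coincide; with overlapping features they are generally nonzero.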

-rich


--
You received this message because you are subscribed to the "Reinforcement Learning Mailing List" group.
To post to this group, send email to rl-...@googlegroups.com
To unsubscribe from this group, send email to
rl-list-u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/rl-list?hl=en

Davi Vieira

Jan 14, 2016, 10:13:11 AM
to Reinforcement Learning Mailing List, rsu...@ualberta.ca


Thanks!


Best,

Davi

Ashique Rupam Mahmood

Jan 16, 2016, 7:38:25 AM
to rl-...@googlegroups.com

Hi Davi,

I believe you are referring to tabular TD(lambda). In that case, the claim is correct, as Rich mentioned. However, when we go beyond the tabular setting, for example to linear function approximation, the online forward view and the backward view of TD(lambda) are not equivalent even when no state is revisited. The algorithm that achieves this equivalence is called true online TD(lambda), and its equivalence holds whether or not a state is revisited.
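A small linear example makes this concrete. The sketch below is illustrative only, with hypothetical function names and plain Python lists as vectors. It assumes linear values w·x, a zero feature vector for the terminal state, and n-step returns inside the truncated lambda-return that bootstrap from the weight vector w_{t+n-1}, which is how we understand the online lambda-return algorithm to be defined in the true online TD(lambda) work. It compares three weight sequences over one episode that visits each state once but whose feature vectors overlap: conventional online TD(lambda) with accumulating traces (the backward view), the online lambda-return algorithm (the online forward view, which at each horizon h redoes every update since the start of the episode toward truncated lambda-returns), and true online TD(lambda) with dutch traces.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conventional_td_lambda(xs, rs, w0, alpha, gamma, lam):
    """Backward view: online linear TD(lambda) with accumulating traces."""
    w, z = list(w0), [0.0] * len(w0)
    ws = [list(w)]
    for t in range(len(rs)):
        delta = rs[t] + gamma * dot(w, xs[t + 1]) - dot(w, xs[t])
        z = [gamma * lam * zi + xi for zi, xi in zip(z, xs[t])]
        w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
        ws.append(list(w))
    return ws


def true_online_td_lambda(xs, rs, w0, alpha, gamma, lam):
    """True online TD(lambda) with dutch traces."""
    w, z, v_old = list(w0), [0.0] * len(w0), 0.0
    ws = [list(w)]
    for t in range(len(rs)):
        x, x_next = xs[t], xs[t + 1]
        v, v_next = dot(w, x), dot(w, x_next)
        delta = rs[t] + gamma * v_next - v
        zx = dot(z, x)
        z = [gamma * lam * zi + (1 - alpha * gamma * lam * zx) * xi for zi, xi in zip(z, x)]
        w = [wi + alpha * (delta + v - v_old) * zi - alpha * (v - v_old) * xi
             for wi, zi, xi in zip(w, z, x)]
        v_old = v_next
        ws.append(list(w))
    return ws


def online_lambda_return(xs, rs, w0, alpha, gamma, lam):
    """Online forward view: for each horizon h, redo updates 0..h-1 toward truncated lambda-returns."""
    ws = [list(w0)]  # ws[t] is the weight vector after t steps

    def n_step(t, n):
        # G_{t:t+n}: n discounted rewards, then bootstrap with w_{t+n-1}
        g = sum(gamma ** i * rs[t + i] for i in range(n))
        return g + gamma ** n * dot(ws[t + n - 1], xs[t + n])

    for h in range(1, len(rs) + 1):
        w = list(w0)  # every horizon restarts from the episode's initial weights
        for t in range(h):
            g = lam ** (h - t - 1) * n_step(t, h - t)
            g += (1 - lam) * sum(lam ** (n - 1) * n_step(t, n) for n in range(1, h - t))
            err = g - dot(w, xs[t])
            w = [wi + alpha * err * xi for wi, xi in zip(w, xs[t])]
        ws.append(w)
    return ws


def max_gap(ws_a, ws_b):
    """Largest absolute difference between two weight-vector sequences."""
    return max(abs(a - b) for wa, wb in zip(ws_a, ws_b) for a, b in zip(wa, wb))


# Hypothetical episode: each state visited once, but the feature vectors overlap.
features = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0], [0.0, 0.0]]  # last entry: terminal
rewards = [1.0, 0.0, 2.0]
params = dict(w0=[0.0, 0.0], alpha=0.1, gamma=0.9, lam=0.8)

forward = online_lambda_return(features, rewards, **params)
print("true online vs forward:", max_gap(true_online_td_lambda(features, rewards, **params), forward))
print("conventional vs forward:", max_gap(conventional_td_lambda(features, rewards, **params), forward))
```

With these overlapping features, the conventional backward view already departs from the online forward view at the second update, while true online TD(lambda) matches it up to rounding error. Swapping in one-hot feature vectors (the tabular, no-revisit case) makes all three sequences agree.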

Cheers,
Rupam


