Implementing policy gradient using Keras (cross_entropy trick)


Onurcan Bektaş

Jun 17, 2020, 10:28:02 AM
to Machine Learning for Physicists
Hi everyone,

I am having trouble understanding how to implement policy gradient using the cross_entropy trick.

In the slides, it is mentioned that

[attachment: Screenshot 2020-06-17 at 15.22.52.png]

but it is still not clear to me what to pass as input to the train function.

For example, suppose we use the policy gradient method to train the simple walker (the first RL example, where the reward is the distance from the origin at time T). If the policy network has a single input (the position) and two outputs (the up and down actions), what should the "observed_inputs" variable passed to the train function be?
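To make the question concrete, here is a minimal sketch of how the cross-entropy trick is commonly set up in Keras for a walker like this. The names (`model`, `R`, the layer sizes) and the batching scheme are my own assumptions, not the course code; the idea is that "observed_inputs" would be the stack of all positions visited during the episodes, paired with the one-hot encoded actions actually taken and the episode return as the sample weight:

```python
import numpy as np
from tensorflow import keras

# Policy network (illustrative sizes): position -> action probabilities.
model = keras.Sequential([
    keras.layers.Input(shape=(1,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),
])
# Categorical cross-entropy with the return as sample weight turns
# gradient ascent on sum_t R * log pi(a_t | x_t) into a supervised fit.
model.compile(optimizer="adam", loss="categorical_crossentropy")

T = 20       # timesteps per episode (assumed)
batch = 64   # episodes collected per update (assumed)
positions = np.zeros((batch, 1))
obs, acts = [], []
for t in range(T):
    probs = model.predict(positions, verbose=0)            # pi(a | x)
    a = (np.random.rand(batch) < probs[:, 1]).astype(int)  # sample actions
    obs.append(positions.copy())
    acts.append(a)
    positions[:, 0] += 2 * a - 1                           # up (+1) or down (-1)

# Reward: distance from the origin at time T, shared by all t of an episode.
R = np.abs(positions[:, 0])

# These would be the training inputs: every visited position, the action
# taken there, and the episode's return broadcast over its timesteps.
observed_inputs = np.concatenate(obs)                       # shape (batch*T, 1)
actions_onehot = keras.utils.to_categorical(np.concatenate(acts), 2)
sample_weights = np.tile(R, T)                              # shape (batch*T,)

model.fit(observed_inputs, actions_onehot,
          sample_weight=sample_weights, verbose=0)
```

So "observed_inputs" is not the current position alone but the whole record of positions from the sampled trajectories, flattened over time and batch.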
