Hi everyone,
I am having trouble understanding how to implement policy gradient using the cross-entropy trick.
In the slides, it is mentioned that

but it is still not clear to me what to pass as input to the train function.
For example, suppose we use the policy gradient method to train the simple walker example (the first RL example, where the reward is the distance from the origin at time T). If the policy is a neural network with a single input (the position) and two outputs (the up and down actions), what should the "observered_inputs" variable passed to the train function be?
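In case it helps frame the question, here is a minimal NumPy sketch of how I understand the cross-entropy trick: collect (state, sampled action, reward) tuples from rollouts, then take a gradient step on a reward-weighted cross-entropy loss where each sampled action is treated as the "label". All names here (the tiny policy, the batch variables) are my own assumptions, not the course code, and I'm not sure this matches what the slides intend:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Tiny linear policy: single input (position) -> 2 action logits (up/down).
# This is a hypothetical stand-in for the course's neural network.
W = rng.normal(size=(1, 2)) * 0.1
b = np.zeros(2)

def policy(x):  # x: (N, 1) batch of positions
    return softmax(x @ W + b)

# One rollout batch: positions we visited, actions we sampled from the
# policy at those positions, and the rewards we received (e.g. the
# distance from the origin at time T, copied back to each step).
observed_inputs = rng.normal(size=(8, 1))
probs = policy(observed_inputs)
sampled_actions = np.array([rng.choice(2, p=p) for p in probs])
rewards = rng.normal(size=8)

# Reward-weighted cross-entropy gradient w.r.t. the logits:
#   d/dtheta [ -R * log pi(a|s) ]  ->  R * (pi - onehot(a))
onehot = np.eye(2)[sampled_actions]
dlogits = rewards[:, None] * (probs - onehot) / len(rewards)
dW = observed_inputs.T @ dlogits   # backprop through the linear layer
db = dlogits.sum(axis=0)

# One gradient step on the policy parameters.
lr = 0.1
W -= lr * dW
b -= lr * db
```

Under this reading, "observered_inputs" would be the batch of states (positions) visited during the rollouts, the labels would be the actions actually sampled, and the per-sample weight would be the reward. Is that right?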