Hi,
I have been able to follow the tutorials and set up the environment. I copied the OffPACPuddleWorld into my own project, and was able to run it, export it as a runnable jar file, and open it successfully in Zephyr.
For my current problem, I want to create two almost-independent agents running in parallel, each with its own environment. I only want them to communicate at the end of each step to exchange their parameters.
I am only concerned with prediction, so I do not need any actor: just the two critics learning off-policy and exchanging their estimated parameters "w_{k,i}" (where "k" denotes the agent and "i" the time step).
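To be explicit about the kind of update I have in mind, here is a minimal plain-Java sketch of the textbook GTD(0)/TDC rule with an importance-sampling ratio rho. This is my own code, not the RLPark classes, and all the names are mine:

```java
// Minimal sketch of a linear off-policy prediction critic using the
// textbook GTD(0)/TDC update. Not RLPark code; names are hypothetical.
public class GtdCritic {
  public final double alpha, beta, gamma;
  public final double[] theta; // value-function weights (the parameters to exchange)
  public final double[] w;     // auxiliary weights of the gradient correction

  public GtdCritic(int n, double alpha, double beta, double gamma) {
    this.alpha = alpha;
    this.beta = beta;
    this.gamma = gamma;
    theta = new double[n];
    w = new double[n];
  }

  static double dot(double[] a, double[] b) {
    double s = 0.0;
    for (int j = 0; j < a.length; j++)
      s += a[j] * b[j];
    return s;
  }

  // One update from features phi, next features phiNext, reward r, and
  // importance-sampling ratio rho = pi(a|s) / b(a|s). Returns the TD error.
  public double update(double[] phi, double[] phiNext, double r, double rho) {
    double delta = r + gamma * dot(theta, phiNext) - dot(theta, phi);
    double phiW = dot(phi, w); // correction term, computed before updating w
    for (int j = 0; j < phi.length; j++) {
      theta[j] += alpha * rho * (delta * phi[j] - gamma * phiW * phiNext[j]);
      w[j] += beta * rho * (delta - phiW) * phi[j];
    }
    return delta;
  }

  public static void main(String[] args) {
    GtdCritic critic = new GtdCritic(2, 0.1, 0.1, 0.9);
    double[] phi = { 1.0, 0.0 };
    double[] phiNext = { 0.0, 1.0 };
    double delta = critic.update(phi, phiNext, 1.0, 1.0);
    System.out.println("delta=" + delta + " theta[0]=" + critic.theta[0]);
  }
}
```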
First of all, there are two Runners in the example (learning and evaluation). I understand that one takes actions following the behavior policy and estimates the prediction parameters using GTD, while the other computes the optimal policy for those estimated parameters. Is that right?
How does this exactly map to the code (which variables do what)?
When I inspect the variables in Zephyr, both the critic and the actor are inside the Runners, but in the declaration they seem to be different variables from "criticAdapter". Could you please briefly explain how these variables interact?
For my problem, should I duplicate both runners for each agent and have the agents exchange the runner.critic.offPolicyTD.w vector?
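In case it helps, this is a sketch of the per-step loop I am imagining, in plain Java. Agent, step, and exchange are placeholders I made up, standing in for the duplicated runners and whatever field actually holds the weight vector:

```java
// Hypothetical per-step loop: two independent agents, each with its own
// environment and weight vector, exchanging parameters after every step.
public class ParallelPrediction {
  // Placeholder for one agent/environment pair; in the real setup this
  // would be a duplicated runner whose critic holds the weight vector.
  public static class Agent {
    public final double[] w;

    public Agent(int n) {
      w = new double[n];
    }

    // Stand-in for one environment/learning step that updates w.
    public void step(double delta) {
      for (int j = 0; j < w.length; j++)
        w[j] += delta;
    }
  }

  // One possible exchange rule: average the two weight vectors in place.
  public static void exchange(Agent a, Agent b) {
    for (int j = 0; j < a.w.length; j++) {
      double mean = 0.5 * (a.w[j] + b.w[j]);
      a.w[j] = mean;
      b.w[j] = mean;
    }
  }

  public static void main(String[] args) {
    Agent a1 = new Agent(2);
    Agent a2 = new Agent(2);
    for (int i = 0; i < 3; i++) {
      a1.step(0.1);     // each agent learns in its own environment
      a2.step(0.3);
      exchange(a1, a2); // end of step i: exchange w_{1,i} and w_{2,i}
    }
    System.out.println("shared w[0]=" + a1.w[0]);
  }
}
```

Averaging is just one possible exchange rule; the agents could equally copy or combine the vectors in some other way.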
Any help will be very much appreciated!
Thanks!!!
Sergio
--
You received this message because you are subscribed to the Google Groups "RLPark" group.