I've also made a DQN implementation in TensorFlow. My scores are on par with simple_dqn/deep_q_rl - I regularly get scores in the 350s (average of 30 games) on Breakout. I also support double Q-learning, gradient clipping, and different optimizers (including DeepMind's version of RMSProp), and am still adding new things.
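For reference, a minimal NumPy sketch of the double Q-learning target (just the general idea, not my actual TensorFlow code): the online network selects the next action and the target network evaluates it.

import numpy as np

def double_q_targets(q_online_next, q_target_next, rewards, terminals, gamma=0.99):
    # q_online_next, q_target_next: (batch, n_actions) Q-values for the next states
    # rewards, terminals: (batch,) arrays; terminals is 1.0 where the episode ended
    best_actions = np.argmax(q_online_next, axis=1)  # action selection by the online net
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]  # evaluation by the target net
    return rewards + gamma * (1.0 - terminals) * next_values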
As far as speed goes, I made a parallel version that interacts with the emulator and learns simultaneously. Adjusted for hardware, it currently runs a little faster than simple_dqn with the default RMSProp (something in my implementation of DeepMind's RMSProp is not optimized, so with that optimizer it is slightly slower than simple_dqn). Tambet also mentioned somewhere that he measured speed at the very start of training, when the exploration rate is still high, whereas the speedup from parallelization is more noticeable once the exploration rate settles.
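Roughly, it's an actor/learner split like this toy sketch - env, agent and replay are just hypothetical placeholders here, not the real classes from any of our implementations:

import threading, queue

transitions = queue.Queue(maxsize=10000)
stop = threading.Event()

def actor(env, agent):
    # steps the emulator and pushes transitions into a shared queue
    obs = env.reset()
    while not stop.is_set():
        action = agent.act(obs)  # epsilon-greedy on the online network
        next_obs, reward, done = env.step(action)
        transitions.put((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

def learner(agent, replay):
    # drains new transitions into replay memory and takes gradient steps
    while not stop.is_set():
        while not transitions.empty():
            replay.add(*transitions.get())
        if len(replay) >= 50000:          # wait until replay memory is warm
            agent.train(replay.sample(32))  # one minibatch update per iteration

# threading.Thread(target=actor, args=(env, agent), daemon=True).start()
# threading.Thread(target=learner, args=(agent, replay), daemon=True).start()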
What scores did you get on Seaquest? I'm actually training on that game right now.
Tambet, what kind of scores are you getting on Space Invaders? Do you know how deep_q_rl does? My scores are lower than DeepMind's for that game too. Why do you think DeepMind's code still does a little better than deep_q_rl?
I know that OpenAI is also working on standardized evaluation environment for RL tasks. It should be published in coming days (or weeks). Maybe it will clarify the situation a bit.
Looks great, Jonathon! Very nice and compact implementation. Expressing networks in TensorFlow (or Theano) is a bit verbose for my taste, but it's clearly very flexible.
Yes, I wanted to measure training speed and prediction speed separately, and in the beginning, when actions are mostly random, the training time dominates. But simple_dqn still has a lot of room for improvement; it achieves only ~30% utilization on a Titan X. The time is mostly spent moving data between RAM and GPU, and it doesn't help that Neon uses the HWCN data layout (its convolutions are really fast, though).
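To measure the two separately, I mean something along these lines (a rough sketch - net here is just a placeholder with predict/train methods, not the actual simple_dqn code):

import time

def benchmark(net, states, batch, iterations=100):
    # time the forward (prediction) pass on its own
    t0 = time.time()
    for _ in range(iterations):
        net.predict(states)
    predict_ms = 1000.0 * (time.time() - t0) / iterations

    # time a full training step (forward + backward + update) on its own
    t0 = time.time()
    for _ in range(iterations):
        net.train(batch)
    train_ms = 1000.0 * (time.time() - t0) / iterations
    return predict_ms, train_ms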
I haven't done much testing with simple_dqn beyond Pong and Breakout. Actually, I had some spare GPUs and just set Seaquest and Space Invaders training right now. Also, I haven't tested deep_q_rl myself; I just asked this list for numbers, and Alejandro was kind enough to provide his testing results, though he wasn't sure if they were achieved with the latest version. The numbers I'm referring to are here:
Regarding your other question about resource usage: yes, Neon seems to use resources much more efficiently. For example, a network snapshot in Neon takes 13 MB, while in Lua/Torch it takes 500 MB.