I've also made a DQN implementation in TensorFlow. My scores are on par with simple_dqn/deep_q_rl - I regularly get scores in the 350s (average of 30 games) on Breakout. I also support double Q-learning, gradient clipping, and different optimizers (including DeepMind's version of RMSProp), and am still adding new things.
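For reference, a minimal NumPy sketch of the double Q-learning target (just the general idea, not my actual TensorFlow code): the online network selects the next action and the target network evaluates it.

import numpy as np

def double_q_targets(q_online_next, q_target_next, rewards, terminals, gamma=0.99):
    # q_online_next, q_target_next: (batch, n_actions) Q-values for the next states
    # rewards, terminals: (batch,) arrays; terminals is 1.0 where the episode ended
    best_actions = np.argmax(q_online_next, axis=1)  # action selection by the online net
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]  # evaluation by the target net
    return rewards + gamma * (1.0 - terminals) * next_values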
As far as speed goes, I made a parallel version that interacts with the emulator and learns simultaneously. Adjusted for hardware, it currently runs a little faster than simple_dqn with the default RMSProp (something in my implementation of DeepMind's RMSProp is not optimized, so with that optimizer it is slightly slower than simple_dqn). Tambet also mentioned somewhere that he measured speed at the very start of training, when the exploration rate is still high, whereas the speedup from parallelization is more noticeable once the exploration rate settles.
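Roughly, it's an actor/learner split like this toy sketch - env, agent and replay are just hypothetical placeholders here, not the real classes from any of our implementations:

import threading, queue

transitions = queue.Queue(maxsize=10000)
stop = threading.Event()

def actor(env, agent):
    # steps the emulator and pushes transitions into a shared queue
    obs = env.reset()
    while not stop.is_set():
        action = agent.act(obs)  # epsilon-greedy on the online network
        next_obs, reward, done = env.step(action)
        transitions.put((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

def learner(agent, replay):
    # drains new transitions into replay memory and takes gradient steps
    while not stop.is_set():
        while not transitions.empty():
            replay.add(*transitions.get())
        if len(replay) >= 50000:          # wait until replay memory is warm
            agent.train(replay.sample(32))  # one minibatch update per iteration

# threading.Thread(target=actor, args=(env, agent), daemon=True).start()
# threading.Thread(target=learner, args=(agent, replay), daemon=True).start()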
What scores did you get on Seaquest? I'm actually training on that game right now.
Tambet, what kind of scores are you getting on Space Invaders? Do you know how deep_q_rl does? My scores are lower than DeepMind's for that game too. Why do you think DeepMind's code still does a little better than deep_q_rl?
I know that OpenAI is also working on standardized evaluation environment for RL tasks. It should be published in coming days (or weeks). Maybe it will clarify the situation a bit.
Looks great, Jonathon! Very nice and compact implementation. Expressing networks in TensorFlow (or Theano) is a bit verbose for my taste, but it's clearly very flexible.
Yes, I wanted to measure training speed and prediction speed separately, and in the beginning, when actions are mostly random, the training time dominates. But simple_dqn still has a lot of room for improvement; it achieves only ~30% utilization on a Titan X. The time is mostly spent moving data between RAM and GPU, and it doesn't help that Neon uses the HWCN data layout (its convolutions are really fast, though).
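To measure the two separately, I mean something along these lines (a rough sketch - net here is just a placeholder with predict/train methods, not the actual simple_dqn code):

import time

def benchmark(net, states, batch, iterations=100):
    # time the forward (prediction) pass on its own
    t0 = time.time()
    for _ in range(iterations):
        net.predict(states)
    predict_ms = 1000.0 * (time.time() - t0) / iterations

    # time a full training step (forward + backward + update) on its own
    t0 = time.time()
    for _ in range(iterations):
        net.train(batch)
    train_ms = 1000.0 * (time.time() - t0) / iterations
    return predict_ms, train_ms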
I haven't done much testing with simple_dqn beyond Pong and Breakout. Actually, I had some spare GPUs and just set Seaquest and Space Invaders training right now. Also, I haven't tested deep_q_rl myself; I just asked this list for numbers, and Alejandro was kind enough to provide his testing results, though he wasn't sure if they were achieved with the latest version. The numbers I'm referring to are here:
Regarding your other question about resource usage: yes, Neon seems to use resources much more efficiently. For example, a network snapshot in Neon takes 13 MB, while in Lua/Torch it takes 500 MB.