Re: Keras implementation closely following deep_q_rl implementation doesn't learn


Tambet Matiisen

Mar 9, 2018, 8:08:38 AM
to deep-q-...@googlegroups.com

There are several working Keras DQN implementations on the internet; search for "keras dqn". For example, this one seems pretty straightforward: https://keon.io/deep-q-learning/. Although it only supports the simple CartPole environment, it should be easy to modify it for Atari.
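Off the top of my head, the main changes are swapping the dense input layers for the convolutional stack from the DQN paper and adding frame preprocessing. A rough sketch of the preprocessing (the function names are mine, and I'm assuming OpenCV for the resize):

import numpy as np
import cv2  # opencv-python; any resize routine would do

def preprocess(frame):
    # Grayscale and shrink a raw 210x160x3 RGB Atari frame to 84x84.
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA).astype(np.uint8)

# The network's input is then a stack of the last 4 preprocessed frames,
# shape (84, 84, 4), instead of CartPole's 4-number observation vector.
def make_state(last_four_frames):
    return np.stack(last_four_frames, axis=-1)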

  Tambet


On 08.03.2018 19:57, Dibya Chakravorty wrote:
I have been trying to replicate the NIPS (2013) results using a Keras implementation. I use the same hyperparameters as in the deep_q_rl NIPS implementation. But so far, the agent doesn't seem to learn at all. The Q values and rewards do not increase in Pong and Breakout, which are the two games I have tried so far.

After more than a month of debugging, I am now starting to question whether Keras itself is at the root of the problem, as the other parts of the code seem to be in line with deep_q_rl and the NIPS paper. In particular, I suspect that the problem might be with the way Keras implements RMSProp, but I am not sure whether that is the actual reason.
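For reference, this is how I am mapping the deep_q_rl hyperparameters onto Keras's RMSprop (the exact values are the NIPS defaults as I remember them from deep_q_rl, so treat them as assumptions). One difference I am aware of is that deep_q_rl also ships DeepMind's own RMSProp variant, which additionally tracks a running mean of the gradient, whereas keras.optimizers.RMSprop only tracks the mean of the squared gradient:

from keras.optimizers import RMSprop

# NIPS-era deep_q_rl settings as I remember them -- double-check against run_nips.py
rmsprop = RMSprop(lr=0.0002,     # LEARNING_RATE
                  rho=0.99,      # RMS_DECAY (decay of the squared-gradient average)
                  epsilon=1e-6)  # RMS_EPSILON
# model.compile(optimizer=rmsprop, loss="mse")  # 'model' is the Q-network built elsewhere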

Has anyone here tried to implement DQN using Keras? If yes, was the implementation successful at learning?


Neal Schlatter Jr

Mar 14, 2018, 12:01:57 PM
to Deep Q-Learning
I am also having difficulty training my network to learn from ALE using Keras. I am trying to model my network after the NIPS implementation, but it may not be the same; it currently uses Adam. I was able to train it with good results on a simple game of catch, where a ball drops from a random position and a paddle "catches" it. Now that I am using ALE as the learning environment, though, the Q values are always the same no matter what the input is when I test the network.
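For what it's worth, here is roughly how I am checking that (just a sketch; 'model' is my Q-network and (80, 80, 4) is my input shape):

import numpy as np

# Five distinct random input stacks should normally give distinct Q-values.
# If every row printed below is identical, the inputs are probably being
# collapsed somewhere (scaling/dtype issues) or the units have saturated.
states = np.random.rand(5, 80, 80, 4).astype(np.float32)
print(model.predict(states))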

--Neal

Neal Schlatter Jr

Mar 16, 2018, 2:55:43 PM
to Deep Q-Learning
Hi Dibya,

Sorry for the late reply.  I am using 3 convolutional layers and 2 fully connected layers to model the Q-function.  The input shape is (80, 80, 4).  The output shape is (4), corresponding to the actions I found possible for Breakout.  Does that sound like what you are using for the output shape?    

All layers except the last use the ReLU activation function.

I am using mean squared error as the loss function and the Adam optimizer with a low learning rate (lr=1e-6):

from keras.optimizers import Adam

model.compile(optimizer=Adam(lr=1e-6), loss="mse")

Here is my experience replay queue:

import collections

MEMORY_SIZE = 750000
experience = collections.deque(maxlen=MEMORY_SIZE)
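And this is roughly how I sample from it (a sketch; the (state, action, reward, next_state, done) tuple layout is just the convention I've been using):

import random
import numpy as np

# Transitions are appended as (state, action, reward, next_state, done) tuples:
#   experience.append((s, a, r, s_next, done))
def sample_batch(batch_size=32):
    batch = random.sample(experience, batch_size)
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
    return states, actions, rewards, next_states, dones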


--Neal


On Wednesday, March 14, 2018 at 1:04:55 PM UTC-4, Dibya Chakravorty wrote:
Hi Neal. Perhaps we can help each other and figure out what's going wrong in our code. I have been at this for more than a month now - still no luck. We can discuss further over Skype if you want.

The NIPS architecture has 2 convolutional layers with 16 and 32 filters, followed by a fully connected layer with 256 units, followed by a fully connected layer for the Q values. It uses experience replay but no target networks. Is this your architecture as well?
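In Keras terms, something like this (the kernel sizes and strides are my reading of the NIPS paper, so double-check them against deep_q_rl):

from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten

NUM_ACTIONS = 4  # e.g. Breakout's minimal action set; depends on the game

model = Sequential([
    Conv2D(16, (8, 8), strides=4, activation="relu", input_shape=(84, 84, 4)),
    Conv2D(32, (4, 4), strides=2, activation="relu"),
    Flatten(),
    Dense(256, activation="relu"),
    Dense(NUM_ACTIONS)  # linear output: one Q-value per action
])
model.compile(optimizer="rmsprop", loss="mse")  # or the RMSprop settings discussed above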


Migdalin HasNoLastName

Jan 4, 2019, 9:41:17 AM
to Deep Q-Learning
The keon code appears to be a dead end. It includes a "ddqn" file which isn't even a true DQN, never mind a DDQN. There aren't any convolutional layers in the model, nor do I see separate online and target networks, or a value vs. advantage dueling head, which makes it especially strange to call it a DDQN. That example also performs the batch update inside a Python for loop, one sample at a time. I haven't tried that code against any of the Atari games. If it does well, it deserves a Nature paper of its own, and everybody else has been making things way more complicated than they need to be.
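For comparison, the vectorized version of that update is only a few lines; something like the sketch below, where the names are mine, 'model' is the Q-network, and the arguments are NumPy arrays sampled from the replay memory (actions as integer indices, dones as 0/1 flags):

import numpy as np

GAMMA = 0.99  # discount factor (assumed)

def train_step(model, states, actions, rewards, next_states, dones):
    # One DQN update on a whole minibatch -- no per-sample for loop.
    q_values = model.predict(states)                  # shape (batch, num_actions)
    next_q = model.predict(next_states).max(axis=1)   # max over a' of Q(s', a')
    targets = rewards + GAMMA * next_q * (1.0 - dones)
    # Only the taken action's entry gets the new target; the others stay
    # unchanged, so their error (and gradient) is zero.
    q_values[np.arange(len(actions)), actions] = targets
    model.train_on_batch(states, q_values)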

OpenAI has posted a set of benchmark algorithms that include DQN and some variants.  Unfortunately, their stuff only supports Ubuntu and appears to require MuJoCo, so the best I can do is admire the intricately written code, which is way, way beyond me.

Here are a couple of links that I've found helpful. There are a lot of implementations out there, though so far I've only found one that has the combination of (a) a Keras implementation, (b) a minimum of extra framework, and (c) a complete example available via GitHub.
