Re: Keras implementation closely following deep_q_rl implementation doesn't learn


Tambet Matiisen

Mar 9, 2018, 8:08:38 AM
to deep-q-...@googlegroups.com

There are several working Keras DQN implementations on the internet; search for "keras dqn". For example, this one seems pretty straightforward: https://keon.io/deep-q-learning/. Although it only supports the simple CartPole environment, it should be easy to modify it for Atari.
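Off the top of my head, the main changes are swapping the dense input layers for the convolutional stack from the DQN paper and adding frame preprocessing. A rough sketch of the preprocessing (the function names are mine, and I'm assuming OpenCV for the resize):

import numpy as np
import cv2  # opencv-python; any resize routine would do

def preprocess(frame):
    # Grayscale and shrink a raw 210x160x3 RGB Atari frame to 84x84.
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA).astype(np.uint8)

# The network's input is then a stack of the last 4 preprocessed frames,
# shape (84, 84, 4), instead of CartPole's 4-number observation vector.
def make_state(last_four_frames):
    return np.stack(last_four_frames, axis=-1)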

  Tambet


On 08.03.2018 19:57, Dibya Chakravorty wrote:
I have been trying to replicate the NIPS (2013) results using a Keras implementation. I use the same hyperparameters as in the deep_q_rl NIPS implementation. But so far, the agent doesn't seem to learn at all. The Q values and rewards do not increase in Pong and Breakout, which are the two games I have tried so far.

After more than a month of debugging, I am now starting to question whether Keras itself is at the root of the problem, as the other parts of the code seem to be in line with deep_q_rl and the NIPS paper. In particular, I suspect that the problem might be with the way Keras implements RMSProp, but I am not sure whether that is the actual reason.
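For reference, this is how I am mapping the deep_q_rl hyperparameters onto Keras's RMSprop (the exact values are the NIPS defaults as I remember them from deep_q_rl, so treat them as assumptions). One difference I am aware of is that deep_q_rl also ships DeepMind's own RMSProp variant, which additionally tracks a running mean of the gradient, whereas keras.optimizers.RMSprop only tracks the mean of the squared gradient:

from keras.optimizers import RMSprop

# NIPS-era deep_q_rl settings as I remember them -- double-check against run_nips.py
rmsprop = RMSprop(lr=0.0002,     # LEARNING_RATE
                  rho=0.99,      # RMS_DECAY (decay of the squared-gradient average)
                  epsilon=1e-6)  # RMS_EPSILON
# model.compile(optimizer=rmsprop, loss="mse")  # 'model' is the Q-network built elsewhere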

Has anyone here tried to implement DQN using Keras? If yes, was the implementation successful at learning?


Neal Schlatter Jr

Mar 14, 2018, 12:01:57 PM
to Deep Q-Learning
I am also having difficulty training my network to learn from ALE using Keras. I am trying to model my network after the NIPS implementation, but it may not be the same; it currently uses Adam. I was able to train it with good results on a simple game of catch, where a ball drops from a random position and a paddle "catches" it. Now that I am using ALE as the learning environment, though, the Q values are always the same no matter what the input is when I test the network.
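For what it's worth, here is roughly how I am checking that (just a sketch; 'model' is my Q-network and (80, 80, 4) is my input shape):

import numpy as np

# Five distinct random input stacks should normally give distinct Q-values.
# If every row printed below is identical, the inputs are probably being
# collapsed somewhere (scaling/dtype issues) or the units have saturated.
states = np.random.rand(5, 80, 80, 4).astype(np.float32)
print(model.predict(states))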

--Neal

Neal Schlatter Jr

Mar 16, 2018, 2:55:43 PM
to Deep Q-Learning
Hi Dibya,

Sorry for the late reply.  I am using 3 convolutional layers and 2 fully connected layers to model the Q-function.  The input shape is (80, 80, 4).  The output shape is (4), corresponding to the actions I found possible for Breakout.  Does that sound like what you are using for the output shape?    

All layers except the last use the ReLU activation function.

I am using mean squared error as the loss function and the Adam optimizer with a low learning rate (lr=1e-6):

from keras.optimizers import Adam

model.compile(optimizer=Adam(lr=1e-6), loss="mse")

Here is my experience replay queue:

import collections

MEMORY_SIZE = 750000
experience = collections.deque(maxlen=MEMORY_SIZE)
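And this is roughly how I sample from it (a sketch; the (state, action, reward, next_state, done) tuple layout is just the convention I've been using):

import random
import numpy as np

# Transitions are appended as (state, action, reward, next_state, done) tuples:
#   experience.append((s, a, r, s_next, done))
def sample_batch(batch_size=32):
    batch = random.sample(experience, batch_size)
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
    return states, actions, rewards, next_states, dones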


--Neal


On Wednesday, March 14, 2018 at 1:04:55 PM UTC-4, Dibya Chakravorty wrote:
Hi Neal. Perhaps we can help each other and figure out what's going wrong in our code. I have been at this for more than a month now - still no luck. We can discuss further over Skype if you want.

The NIPS architecture has 2 convolutional layers with 16 and 32 filters, followed by a fully connected layer with 256 units, followed by a fully connected layer for the Q values. It uses experience replay but no target networks. Is this your architecture as well?
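In Keras terms, something like this (the kernel sizes and strides are my reading of the NIPS paper, so double-check them against deep_q_rl):

from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten

NUM_ACTIONS = 4  # e.g. Breakout's minimal action set; depends on the game

model = Sequential([
    Conv2D(16, (8, 8), strides=4, activation="relu", input_shape=(84, 84, 4)),
    Conv2D(32, (4, 4), strides=2, activation="relu"),
    Flatten(),
    Dense(256, activation="relu"),
    Dense(NUM_ACTIONS)  # linear output: one Q-value per action
])
model.compile(optimizer="rmsprop", loss="mse")  # or the RMSprop settings discussed above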


Migdalin HasNoLastName

Jan 4, 2019, 9:41:17 AM
to Deep Q-Learning
The keon code appears to be a dead end. It includes a "ddqn" file which isn't even a true DQN, never mind a DDQN. There aren't any convolutional layers in the model, nor do I see separate online and target networks, or a value vs. advantage dueling head, which makes it especially strange to call it a DDQN. That example also performs the batch update inside a Python for loop, one sample at a time. I haven't tried that code against any of the Atari games. If it does well, it deserves a Nature paper of its own, and everybody else has been making things way more complicated than they need to be.
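For comparison, the vectorized version of that update is only a few lines; something like the sketch below, where the names are mine, 'model' is the Q-network, and the arguments are NumPy arrays sampled from the replay memory (actions as integer indices, dones as 0/1 flags):

import numpy as np

GAMMA = 0.99  # discount factor (assumed)

def train_step(model, states, actions, rewards, next_states, dones):
    # One DQN update on a whole minibatch -- no per-sample for loop.
    q_values = model.predict(states)                  # shape (batch, num_actions)
    next_q = model.predict(next_states).max(axis=1)   # max over a' of Q(s', a')
    targets = rewards + GAMMA * next_q * (1.0 - dones)
    # Only the taken action's entry gets the new target; the others stay
    # unchanged, so their error (and gradient) is zero.
    q_values[np.arange(len(actions)), actions] = targets
    model.train_on_batch(states, q_values)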

OpenAI has posted a set of benchmark algorithms that include DQN and some variants.  Unfortunately, their stuff only supports Ubuntu and appears to require MuJoCo, so the best I can do is admire the intricately written code, which is way, way beyond me.

Here are a couple of links that I've found helpful. There are a lot of implementations out there, though so far I've only found one that has the combination of (a) a Keras implementation, (b) a minimum of extra framework, and (c) a complete example available via GitHub.
