Question about Space Invaders' different action repeat/history length in DQN

Jonathon Byrd

Feb 8, 2016, 12:11:26 PM
to Deep Q-Learning
From the NIPS paper:


More precisely, the agent sees and selects actions on every kth frame instead of every frame, and its last action is repeated on skipped frames. ... We use k = 4 for all games except Space Invaders where we noticed that using k = 4 makes the lasers invisible because of the period at which they blink. We used k = 3 to make the lasers visible and this change was the only difference in hyperparameter values between any of the games.
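The blinking problem described in the quote is a form of temporal aliasing. The sketch below assumes a made-up blink pattern (the laser is drawn only on phases 0 and 1 of a 4-frame cycle; the real Space Invaders timing is not stated in the thread) together with action repeat and a max over the last two frames of each repeat, and checks whether the laser ever lands on a frame the agent observes. All function names are hypothetical.

```python
# HYPOTHETICAL blink model: not the real Space Invaders timing, just an
# illustration of how a fixed frame skip can alias with a blink period.


def observed_frames(k, num_steps):
    """Emulator frame indices the agent sees when it acts every k frames
    and takes a pixel-wise max over the last two frames of each repeat."""
    assert k >= 2, "the two-frame max needs at least two frames per repeat"
    seen = set()
    for step in range(num_steps):
        last = step * k + k - 1        # final frame of this action repeat
        seen.update((last - 1, last))  # the two frames that get max-pooled
    return seen


def laser_drawn(frame, period=4, on_phases=(0, 1)):
    """Assumed blink pattern: the laser is drawn only on these phases."""
    return frame % period in on_phases


def laser_ever_seen(k, num_steps=100, period=4, on_phases=(0, 1)):
    """True if any observed frame contains the (hypothetical) laser."""
    return any(laser_drawn(f, period, on_phases)
               for f in observed_frames(k, num_steps))


if __name__ == "__main__":
    for k in (3, 4):
        print(f"k={k}: laser ever visible -> {laser_ever_seen(k)}")
    # k=3: laser ever visible -> True
    # k=4: laser ever visible -> False
```

With k = 4 the agent only ever observes frames at phases 2 and 3 of the cycle, so a sprite drawn on phases 0 and 1 never appears; with k = 3 the observed phases drift through the whole cycle.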

I'm confused about how shortening the history length makes the lasers visible.  For the other games, the last 4 frames generated by the emulator are used as the input to the network, right?  How could reducing the input make previously invisible objects visible?  I can understand it giving the agent finer control over its actions to dodge the lasers, but not to see them.  If the lasers blinked every 4 frames, then using 3 as the history length would prevent two lasers from appearing in one history (because we max over the last two frames to create each input image).  But they should still be visible either way, right?  What am I missing?

Thanks for the help!

Islandman93

Feb 11, 2016, 7:08:30 PM
to Deep Q-Learning
Jonathon, 

You fell into the same trap I did when reading the NIPS paper (tbh it's not very well described). I too believed that frame skip corresponded to the history shown to the learner, but this is incorrect. Frame skip actually means that the skipped frames are lost entirely (except for the max over the last two). Now I read it as: the agent only sees every nth frame. In this case I believe the preprocessing function is part of the agent, so it also only sees one frame every n frames. The preprocessing function therefore builds a 4-channel image spanning the last 4*n frames, taking one frame at each skip. This also applies to the frames stored in experience replay.
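This reading can be sketched by tracking which emulator frames end up in the 4-slot input stack. It is a minimal sketch assuming k = 4, a history of 4 and the two-frame max; it works with frame indices rather than images, and the function names are hypothetical.

```python
from collections import deque


def stacked_frame_indices(num_decisions, k=4, history=4):
    """Emulator frame indices behind each slot of the agent's input stack.

    Assumes the agent (including its preprocessing) receives only one
    max-pooled frame per action repeat, so consecutive slots are k
    emulator frames apart.
    """
    stack = deque(maxlen=history)  # the oldest slot falls off automatically
    for step in range(num_decisions):
        last = step * k + k - 1
        # one preprocessed frame per decision: max over frames last-1 and last
        stack.append((last - 1, last))
    return list(stack)


def naive_stack(last_frame, history=4):
    """The misreading: simply the last `history` raw emulator frames."""
    return list(range(last_frame - history + 1, last_frame + 1))


print(stacked_frame_indices(4))  # [(2, 3), (6, 7), (10, 11), (14, 15)]
print(naive_stack(15))           # [12, 13, 14, 15]
```

Under this reading the stack covers roughly 4*k emulator frames, whereas the naive version only covers the last 4 frames before the decision.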

I actually created the original version of learningALE in the way you described, seeing every frame but only selecting an action every k frames. It wasn't until I was carefully looking through spragunr's code that I realized my mistake. 

Jonathon Byrd

Feb 11, 2016, 8:14:40 PM
to Deep Q-Learning
Thanks Islandman!

That's what spragunr's deep_q_rl looked like it was doing. The Nature paper says something like the agent "takes as input the last m frames", which seemed to imply the opposite, so I wasn't quite sure. I've made my own version of DQN as well, and this may be why it learns so differently. On Breakout it reached an average testing score of 130 at epoch 10, but then dropped to averages between 5 and 30 by epoch 17 or so. It still works well on Pong, though.

Do you know why DeepMind decided to do it this way?  I guess they were just looking for the highest performance?  Why do you think this method works better?

Islandman93

Feb 12, 2016, 8:34:27 AM
to Deep Q-Learning
Jonathon,

It'd be interesting to see whether you get better, or at least more consistent, performance after updating your code. Let me know how it goes.

1) My guess is that it gives better performance because the network can see further back in time. To me at least, it seems like a good solution: give the network as much information as possible without overloading it. By implementing frame skip so that the network can see up to 12 frames back, I think you maximize the information available to the network.
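The "12 frames" figure follows from the stack spacing: with a history of m frames and an action repeat of k, the newest and oldest slots are (m - 1)k emulator frames apart (ignoring the extra frame pulled in by the two-frame max). A small sketch, assuming the Atari 2600's 60 frames per second to convert to seconds; the names are illustrative.

```python
ATARI_FPS = 60  # assumed: the Atari 2600 (NTSC) renders 60 frames per second


def lookback(k, history=4):
    """Emulator frames (and seconds) between the newest and oldest slot
    of a `history`-frame stack when the agent acts every k frames."""
    frames = (history - 1) * k
    return frames, frames / ATARI_FPS


for k in (1, 3, 4):
    frames, seconds = lookback(k)
    print(f"k={k}: oldest slot is {frames} frames ({seconds:.2f} s) back")
# k=1: oldest slot is 3 frames (0.05 s) back
# k=3: oldest slot is 9 frames (0.15 s) back
# k=4: oldest slot is 12 frames (0.20 s) back
```

So the Space Invaders setting (k = 3) trades a little look-back (9 frames instead of 12) for not aliasing with the laser blink.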

You mention performance: frame skip can have a HUGE impact on it. Check out this paper http://nn.cs.utexas.edu/downloads/papers/braylan.aaai15.pdf where they examine (to the extreme) the positive and negative effects of frame skip in the same 6 NIPS games. The paper uses evolutionary algorithms instead of DQN, but they beat the NIPS performance in a couple of games with what would seem like a crippling amount of frame skip. Notice that Figure 1 shows Seaquest getting a large performance gain with a frame skip of 180. That's 3 seconds of game time!

Islandman

Jonathon Byrd

Feb 13, 2016, 8:23:53 PM
to Deep Q-Learning
The idea from that paper of having the agent choose its own frame skip parameter sounds very interesting.
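One hypothetical way to let a DQN-style agent pick its own frame skip (an assumption for illustration, not necessarily what the paper does) is to enlarge the action space so that each network output is an (action, repeat) pair. The action names and repeat values below are made up.

```python
from itertools import product

# HYPOTHETICAL example values: a small action set and a menu of repeat lengths.
ACTIONS = ["NOOP", "FIRE", "LEFT", "RIGHT"]
REPEATS = [1, 4, 16, 64]

# Each Q-network output corresponds to one (action, repeat) pair, so a greedy
# argmax over len(ACTIONS) * len(REPEATS) values chooses both at once.
COMPOSITE = list(product(ACTIONS, REPEATS))


def decode(index):
    """Map a Q-network output index to (action, frames to repeat it)."""
    return COMPOSITE[index]


def greedy(q_values):
    """Pick the composite action with the highest Q-value."""
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return decode(best)


q = [0.0] * len(COMPOSITE)
q[14] = 1.0
print(greedy(q))  # ('RIGHT', 16)
```

A complete agent would also have to accumulate reward over the chosen repeat and discount accordingly; that part is omitted here.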