Changes on my fork


Alejandro Dubrovsky

Jan 15, 2015, 9:29:02 PM
to deep-q-...@googlegroups.com
I've got quite a few changes on my fork of the code at https://github.com/alito/deep_q_rl .  They are probably too intrusive to be pulled wholesale, but I could make selective diffs of some parts if they are wanted.

1) Changed the patch to ALE to emit the full RGB screen, instead of the downsampled, greyscaled version. rl_glue_ale_agent.py then does the downsampling and greyscaling. I do this for the sake of purity and flexibility: I like the concept of the emulator side doing the least amount of learning-related work. It's a patch that should probably go upstream to ALE, since by default they emit the palette numbers, which makes no sense to me, but it'd break backwards compatibility for them so they probably wouldn't accept it. This patch probably makes things slightly slower, but the effect is negligible since we are still GPU-bound.

2) The RGB-to-greyscale conversion uses the human-perception-adjusted formula instead of the average of R, G and B. This, again, is not important for learning, but adds a bit of purity.

3) Downsample to 110x84 and then crop the bottom 84x84 square, instead of downsampling to 105x80 and then squeezing to 80x80. This is so that it matches what they did in the paper. (There's a rough sketch of the whole preprocessing chain, items 1 to 3, at the end of this message.)

4) Lots of stylistic changes in the code. These are the parts that are too intrusive and opinionated (e.g. expanding shortened variable names, getting rid of leading underscores in methods, splitting big methods into several smaller ones), but some of them are definitely helpful (e.g. more command-line parameters for ale_run.py, input parameters for plot_results.py).

5) Limiting the number of neurons in the output layer to the number of outputs available on the hardware used for the game. E.g. Breakout uses a paddle, so limit the available actions to left, neutral, right, left-button, neutral-button and right-button. You cannot go up with a paddle, so the up actions are removed from the network. This seems like a "fair" change, since a human cannot even try to go up with a paddle. (There's a sketch of the concrete Breakout action list at the end of this message.)

6) Changed the default discount rate to 0.95

7) Save the frames from the best game played in each testing epoch. Handy to make videos with afterwards. Also save the best scores attained in each epoch to results.csv

If you are after changes that improve the learning in Breakout, grab (or reimplement) changes 5 and 6. With those I've got the average scores hovering near 80, with a couple of epochs averaging over 100 and quite a few 200+ scores.

A discount rate of 0.97 learns even faster but "crashes" even faster too. By crashing I mean that the scores collapse to zero and the Q-values shoot up to stupid levels. I think this is just the Q-learning algorithm diverging.
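
In case it's useful, here's a minimal sketch of what items 1 to 3 amount to on the agent side. It assumes the observation coming through RL-Glue is 128 RAM bytes followed by a flattened 210x160 RGB screen; the names are illustrative rather than the exact ones in rl_glue_ale_agent.py:

    import numpy as np
    import cv2

    RAM_SIZE = 128
    SCREEN_HEIGHT, SCREEN_WIDTH = 210, 160
    RESIZED_HEIGHT, RESIZED_WIDTH = 110, 84
    CROPPED_SIZE = 84

    def preprocess_observation(int_array):
        # 1) Unpack the full RGB screen emitted by the patched ALE
        screen = np.asarray(int_array[RAM_SIZE:], dtype=np.uint8)
        rgb = screen.reshape(SCREEN_HEIGHT, SCREEN_WIDTH, 3)

        # 2) Luminance-weighted greyscale (ITU-R BT.601 weights)
        #    instead of a plain mean over R, G and B
        weights = np.float32([0.299, 0.587, 0.114])
        grey = np.dot(rgb.astype(np.float32), weights)

        # 3) Downsample to 110x84, then keep an 84x84 square from the
        #    lower part of the frame (here simply the bottom 84 rows;
        #    the fork picks the exact row offset)
        resized = cv2.resize(grey, (RESIZED_WIDTH, RESIZED_HEIGHT),
                             interpolation=cv2.INTER_LINEAR)
        return resized[-CROPPED_SIZE:, :]

Item 5 boils down to hard-coding the paddle actions for Breakout and sizing the output layer to match. Again, treat this as a sketch rather than the fork's exact code:

    # Paddle actions for Breakout: left, neutral, right, each with and
    # without the button.  Indices are the standard ALE action codes
    # (double-check against ALE's constants if you reuse this).
    BREAKOUT_ACTIONS = [0, 1, 3, 4, 11, 12]  # NOOP, FIRE, RIGHT, LEFT, RIGHTFIRE, LEFTFIRE
    NUM_ACTIONS = len(BREAKOUT_ACTIONS)      # 6 output units instead of 18

    def to_ale_action(q_index):
        # map an index over the network's outputs back to an ALE action code
        return BREAKOUT_ACTIONS[q_index]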


Ajay Talati

Jan 16, 2015, 5:52:47 AM
to deep-q-...@googlegroups.com
Cool! Thanks a lot for sharing!

I'm running steps 5) and 6) now.

# We select the bottom part since that's where the action happens
# This makes a mockery of the bit above that is meant to be all general'n'shit

Nice comment :)

Nathan Sprague

Jan 16, 2015, 1:56:28 PM
to deep-q-...@googlegroups.com
Thanks for sharing your work! I would definitely like to get some of this into the main repository.  See below for my specific thoughts.

I'm pretty busy right now, but if I can grab a couple of free hours this weekend I may try to bring in some of these improvements.  If I don't get to it, I'll take you up on the offer to create some focused diffs of the most important bits. 

On Thursday, January 15, 2015 at 9:29:02 PM UTC-5, Alejandro Dubrovsky wrote:
I've got quite a few changes on my fork of the code at https://github.com/alito/deep_q_rl .  They are probably too intrusive to be pulled wholesale, but I could make selective diffs of some parts if they are wanted.

1) Changed the patch to ALE to emit a full RGB screen, instead of the downsampled greyscaled version. rl_glue_ale_agent.py then does the downsampling and greyscaling. I do this for the sake of purity and flexibility. I like the concept of the emulator side doing the least amount of learning-related work. It's a patch that should probably go upstream to ALE, since by default they emit the palette numbers, which makes no sense to me, but it'd break backwards compatibility for them so they wouldn't accept it. This patch probably makes it slightly slower but the effect is negligible since we are still GPU-bound

I agree that this approach makes more sense, and also that the changes should really go upstream to ALE.  They probably wouldn't accept something that changes the default behavior, but they might be open to enabling this as an option... Of course, that's more work.  Doing it right would require making changes to all of the ALE interfaces, not just RLGlue.  For our purposes, I'm willing to pull this in if it can be done easily, but it isn't something I feel strongly about.
 

2) RGB to greyscale conversion uses the human-perception adjusted formula instead of the average of RGB. This, again, not important for learning, but adds a bit of purity.

I don't have strong feelings about this either way.
 
3) Downsample to 110x84 and then crop the bottom 84x84 square instead of downsampling to 105x80 and then squeeze to 80x80. This is so that it matches what they did in the paper.

I agree that this should be the default. 
 
4) Lots of stylistic changes in the code. These are the parts that are too intrusive and opinionated (eg expanding shortened variable names, getting rid of leading underscores in methods, splitting up big methods into multiple methods), but some of these are definitely helpful (eg adding more command-line parameters to run_ale, input parameters for plot_results.py)

Sounds good.  (I'm open to style suggestions, but of course more interested in the functional improvements.)
 
5) Limiting the number of neurons in the output layer to the number of available outputs for the hardware used for the game. eg for breakout, a paddle is used, so limit the available actions to left, neutral, right, left-button, neutral-button and right-button. ie you cannot go up with a paddle, so remove them from the network. This seems like a "fair" change since a human cannot even try to go up with a paddle.

 
I think that this is something that should be an option on the ALE side.  ALE already has functionality that does something like this.  Each ALE game stores a "minimal action set" that can be accessed within the ALE code.  I don't think the RLGlue interface exposes that, but it would probably be straightforward to modify the RLGlue code so that it uses that information.  (Paddle vs. Joystick isn't exactly the same as minimal vs. full, but the impulse is the same.)  I don't see this as a priority, in part because it's a step away from reproducing the results in the original paper.
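
For reference, ALE's Python bindings (ale_python_interface, in recent enough ALE versions) already expose the minimal action set; deep_q_rl goes through RL-Glue, which doesn't pass it along, but the query itself looks like this:

    from ale_python_interface import ALEInterface

    ale = ALEInterface()
    ale.loadROM('breakout.bin')
    print(ale.getLegalActionSet())    # always the full set of 18 actions
    print(ale.getMinimalActionSet())  # only the actions the game responds to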
 
6) Changed the default discount rate to 0.95

I'll definitely make this change.   I've asked a student to run a systematic parameter sweep across learning rate and discount.  Hopefully that will allow for better default parameter settings.
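
(If anyone wants to run a rough sweep of their own in the meantime, something along these lines would do, just relaunching ale_run.py over a small grid. The flag names here are illustrative, so check ale_run.py's own argument handling for the real ones, and the grid values are arbitrary.)

    import subprocess

    # hypothetical sweep driver: relaunch ale_run.py over a small grid
    for learning_rate in [0.0001, 0.0002, 0.0004]:
        for discount in [0.90, 0.95, 0.97]:
            subprocess.check_call(['python', 'ale_run.py',
                                   '--learning-rate', str(learning_rate),
                                   '-d', str(discount)])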
 

7) Save the frames from the best game played in each testing epoch. Handy to make videos with afterwards. Also save the best scores attained in each epoch to results.csv

Sounds good! As long as saving the frames is optional.
 

Alejandro Dubrovsky

Jan 17, 2015, 11:07:39 PM
to deep-q-...@googlegroups.com
On 17/01/15 05:56, Nathan Sprague wrote:
>
> I'm pretty busy right now, but if I can grab a couple of free hours this
> weekend I may try to bring in some of these improvements. If I don't
> get to it, I'll take you up on the offer to create some focused diffs of
> the most important bits.

Let me know how you go. I can create diffs for whichever parts you are
interested in. More thoughts below.

>
> On Thursday, January 15, 2015 at 9:29:02 PM UTC-5, Alejandro Dubrovsky
> wrote:
>
[Changing to full RGB]
>
>
> I agree that this approach makes more sense, and also that the changes
> should really go upstream to ALE. They probably wouldn't accept
> something that changes the default behavior, but they might be open to
> enabling this as an option... Of course, that's more work. Doing it
> right would require making changes to all of the ALE interfaces, not
> just RLGlue. For our purposes, I'm willing to pull this in if it can be
> done easily, but it isn't something I feel strongly about.
>
This change is trivial as is. Getting it accepted upstream would make it
easier to set deep_q_rl up. I'll try to write a proper patch for ALE and
see if they like it.

>
> 5) Limiting the number of neurons in the output layer to the number
> of available outputs for the hardware used for the game. eg for
> breakout, a paddle is used, so limit the available actions to left,
> neutral, right, left-button, neutral-button and right-button. ie you
> cannot go up with a paddle, so remove them from the network. This
> seems like a "fair" change since a human cannot even try to go up
> with a paddle.
>
> I think that this is something that should be an option on the ALE
> side. ALE already has functionality that does something like this.
> Each ALE game stores a "minimal action set" that can be accessed within
> the ALE code. I don't think the RLGlue interface exposes that, but it
> would probably be straightforward to modify the RLGlue code so that it
> uses that information. (Paddle vs. Joystick isn't exactly the same as
> minimal vs. full, but the impulse is the same.) I don't see this as a
> priority, in part because it's a step away from reproducing the results
> in the original paper.

I agree that this should be an ALE change.
From my reading of the original paper, I think the DeepMind people did
make this change. At the end of section 4.1, Preprocessing and Model
Architecture, it says:

"The output layer is a fully-connected linear layer with a single output
for each valid action. The number of valid actions varied
between 4 and 18 on the games we considered."

Since ALE always reports 18 actions as far as I can tell, they must have
done it manually.

>
> 7) Save the frames from the best game played in each testing epoch.
> Handy to make videos with afterwards. Also save the best scores
> attained in each epoch to results.csv
>
> Sounds good! As long as saving the frames is optional.

It isn't at the moment. I'll make it optional.
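
Roughly, the optional version would look something like the sketch below.
The names are made up and it isn't the actual code in the fork (and the
real results.csv also carries the per-epoch stats that were already there):

    import os
    import numpy as np

    class BestEpisodeRecorder(object):
        """Keep the frames of the best test episode per epoch (optionally)."""

        def __init__(self, out_dir, save_frames=False):
            self.out_dir = out_dir
            self.save_frames = save_frames
            self.best_score = float('-inf')
            self.best_frames = []
            self.current_frames = []

        def record_frame(self, frame):
            if self.save_frames:
                self.current_frames.append(frame)

        def end_episode(self, score):
            if score > self.best_score:
                self.best_score = score
                self.best_frames = self.current_frames
            self.current_frames = []

        def end_epoch(self, epoch):
            # append the epoch's best score to results.csv
            with open(os.path.join(self.out_dir, 'results.csv'), 'a') as out:
                out.write('%d,%f\n' % (epoch, self.best_score))
            # dump the best episode's frames only if saving is enabled
            if self.save_frames and self.best_frames:
                frame_dir = os.path.join(self.out_dir, 'best_epoch_%03d' % epoch)
                os.mkdir(frame_dir)
                for i, frame in enumerate(self.best_frames):
                    np.save(os.path.join(frame_dir, 'frame_%05d.npy' % i), frame)
            self.best_score = float('-inf')
            self.best_frames = []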

Also, thank you very much for your code. It's the project I'm most
excited about at the moment and the one I wake up in the morning with
ideas about. (Apologies to my day job).

alejandro

Ajay Talati

Jan 18, 2015, 4:10:44 AM
to deep-q-...@googlegroups.com
Maybe there's already an ALE method for minimal actions? I think the last two functions of this code might be useful?

`https://github.com/fidlej/alewrap/blob/master/alewrap/alewrap.cpp`

The author works at DeepMind?

Alejandro Dubrovsky

Jan 18, 2015, 4:55:46 AM
to deep-q-...@googlegroups.com
On 18/01/15 20:10, Ajay Talati wrote:
> Maybe there's already an ALE method for minimal actions? I think the
> last two functions of this code might be useful?

Yes, the code is there in ALE; it just doesn't export that information
through the RL-Glue interface.
Interesting. It seems like they use Torch.


Ajay Talati

Jan 18, 2015, 5:42:37 AM
to deep-q-...@googlegroups.com
Torch tensors are really interesting. Say, for example, you had to process motor movements, 3D images, multi-source sound, touch and text, all streaming at the same time. A Torch tensor could handle that.

Alejandro Dubrovsky

Jan 19, 2015, 5:09:43 AM
to deep-q-...@googlegroups.com
On 17/01/15 05:56, Nathan Sprague wrote:
> 6) Changed the default discount rate to 0.95
>
>
> I'll definitely make this change. I've asked a student to run a
> systematic parameter sweep across learning rate and discount. Hopefully
> that will allow for better default parameter settings.

BTW, it looks like I never actually made this change. I've just been
running my main ale_run.py with -d 0.95. It passes the value to the agent,
but the default stayed at 0.9. Sorry for the confusion.
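
The change itself is just the default value of the -d option in ale_run.py,
along these lines, assuming an argparse-style parser (the long option name
is made up here, and the actual option handling may differ):

    import argparse

    parser = argparse.ArgumentParser()
    # bump the default discount passed through to the agent from 0.9 to 0.95
    parser.add_argument('-d', '--discount', type=float, default=0.95,
                        help='discount rate passed through to the agent')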



Ajay Talati

Jan 19, 2015, 10:05:56 AM
to deep-q-...@googlegroups.com
"Each ALE game stores a "minimal action set" that can be accessed within the ALE code."

Working with the C++ interface to ALE, for some games a call to the method

`ale.getLegalActionSet()`

causes two of my three ROMs to crash. I can only get Breakout to work with this. Both Freeway and Pong give the error

`Unsupported ROM file`

Both ROMs work fine with Nathan's code/RL-Glue. It would be nice to get these working (and find their minimal action sets), as Breakout is the only game I can use to experiment with and learn the C++ interface.
Freeway.bin
Pong.bin
breakout.bin

Ajay Talati

Jan 19, 2015, 11:07:39 AM
to deep-q-...@googlegroups.com
Got to be the most annoying thing about ALE!

If you want to use ROMs with the C++ interface, i.e. without RL-Glue, the file names have to be all lowercase. Maybe there's some conversion to lower case done by RL-Glue/Stella. The attached ROMs are fine if you rename them, i.e. to freeway.bin and pong.bin. What a waste of a few hours!
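
A quick way to do the renaming in one go (the ROM directory below is just my own path, adjust as needed):

    import os

    rom_dir = '/home/ajay/bin/roms'  # adjust to wherever your ROMs live
    for name in os.listdir(rom_dir):
        if name.endswith('.bin') and name != name.lower():
            os.rename(os.path.join(rom_dir, name),
                      os.path.join(rom_dir, name.lower()))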

Nathan, maybe you should post something on your Git repo about this, i.e. if your ROMs are not working it's worth experimenting with the filenames, e.g. trying all lower case. It could save other folks some time.

Anyhow - my mistake - there's no problem with the

`ale.getLegalActionSet()`

method
...

Nathan Sprague

Jan 19, 2015, 2:43:16 PM
to deep-q-...@googlegroups.com
I'm glad the code has been useful for you!  In terms of the changes we've been discussing, I need to put it all on the back burner for a couple weeks.  I still want to merge these improvements, but I have a deadline looming.  I've done something to address the minimal actions issue.  I'll post that to a new thread since this one is very cluttered. 

Ajay Talati

Feb 6, 2015, 9:17:32 AM
to deep-q-...@googlegroups.com
Hi Alejandro,

I was just wondering if you could give me some advice on how to get your fork working? I've pulled it fresh and made a separate build of ALE using your rlglue_controller.cpp file.

Just wondered if you have any ideas?

P.S. I like all the additions you've made to the code :)))

/usr/bin/python2.7 /home/ajay/PythonProjects/deep_q_rl-master_alito/deep_q_rl/ale_run.py
RL-Glue Version 3.04, Build 909
A.L.E: Arcade Learning Environment (version 0.4)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./stellarc
Game console created:
  ROM file:  /home/ajay/bin/roms/breakout.bin
  Cart Name: Breakout - Breakaway IV (1978) (Atari)
  Cart MD5:  f34f08e5eb96e500e851a80be3277a56
  Display Format:  AUTO-DETECT ==> NTSC
  ROM Size:        2048
  Bankswitch Type: AUTO-DETECT ==> 2K

Running ROM file...
Random Seed: Time
Game will be controlled through RL-Glue.
RL-Glue Python Experiment Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
Initializing ALE RL-Glue ...
Using gpu device 0: GeForce GTX 570
INFO:root:Experiment directory: breakout_2015-02-06-13-46_0p0002_0p95
INFO:root:Task spec: VERSION RL-Glue-3.0 PROBLEMTYPE episodic DISCOUNTFACTOR 1 OBSERVATIONS INTS (100800 0 255) ACTIONS INTS (0 5) REWARDS (UNSPEC UNSPEC) EXTRA Name: Arcade Learning Environment 
RL-Glue Python Agent Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
 Agent Codec Connected
INFO:root:Layer 1: (32, 4, 84, 84)
INFO:root:Layer 2: (4, 84, 84, 32)
INFO:root:Layer 3: (16, 20.0, 20.0, 32)
INFO:root:Layer 4: (32, 9.0, 9.0, 32)
INFO:root:Layer 5: (32, 32, 9.0, 9.0)
INFO:root:Layer 6: (32, 256)
INFO:root:Layer 7: (32, 6)
/home/ajay/bin/Theano-master/theano/gof/cmodule.py:289: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  rval = __import__(module_name, {}, {}, [module_name])
INFO:root:OPENING breakout_2015-02-06-13-46_0p0002_0p95/results.csv
INFO:root:Cropping at 19
INFO:root:Received start_epoch 1
INFO:root:training epoch: 1 steps_left: 50000
Traceback (most recent call last):
  File "./rl_glue_ale_agent.py", line 670, in <module>
    sys.exit(main(sys.argv[1:]))
  File "./rl_glue_ale_agent.py", line 666, in main
    max_history=parameters.max_history))
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/AgentLoader.py", line 58, in loadAgent
    client.runAgentEventLoop()
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 144, in runAgentEventLoop
    switch[agentState](self)
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 138, in <lambda>
    Network.kAgentStart: lambda self: self.onAgentStart(),
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 51, in onAgentStart
    action = self.agent.agent_start(observation)
  File "./rl_glue_ale_agent.py", line 328, in agent_start
    self.last_image, raw_image = self.preprocess_observation(observation.intArray)
  File "./rl_glue_ale_agent.py", line 341, in _preprocess_observation_cropped_by_cv
    image = observation[128:].reshape(IMAGE_HEIGHT, IMAGE_WIDTH, 3)
ValueError: total size of new array must be unchanged
INFO:root:training epoch: 1 steps_left: 49995
INFO:root:training epoch: 1 steps_left: 49993
INFO:root:training epoch: 1 steps_left: 49991

Alejandro Dubrovsky

Feb 6, 2015, 9:44:10 AM
to deep-q-...@googlegroups.com
On 07/02/15 01:17, Ajay Talati wrote:
> Hi Alejandro,
>
> I was just wondering if you could give me some advice how to get your
> fork working? I've pulled it fresh, and made a separate build of ALE
> using your rlglue_controller.cpp file.
>
> Just wondered if you have any ideas?
>
Ah yes, sorry, I should have changed the README. You don't need to patch
ALE with my fork. Just use the latest ALE from github. I'll remove the
rl_glue_controller.cpp.

> P.S. I like all the additions you've made to the code :)))

Thanks!

Ajay Talati

Feb 6, 2015, 2:14:28 PM
to deep-q-...@googlegroups.com
Thanks :)

It's working, and I've set DefaultTestingEpsilon = 0.05

INFO:root:OPENING breakout_2015-02-06-19-03_0p0002_0p95/results.csv
INFO:root:Cropping at 19
INFO:root:Received start_epoch 1
INFO:root:training epoch: 1 steps_left: 50000
INFO:root:Simulated at a rate of 45.9058687269 frames/s (39.0845147393 batches/s) 
 Average loss: 0.0984483216247
INFO:root:training epoch: 1 steps_left: 49751
INFO:root:Simulated at a rate of 41.1518466318 frames/s (40.8863508471 batches/s

ALL's GOOD :) 

Should have some results in about 24 hrs :)