First Dump of the Secret Code

Brian Tanner

Jul 14, 2008, 11:04:33 PM

to rl-compet...@googlegroups.com

I've just dumped all of our secret private code from the secret 2008
repositories to the public Google Code Project:

http://rl-competition.googlecode.com/

I've put the distributed code (the stuff you guys downloaded) in
/trunk/public and all of the secret stuff in /trunk/private

The private code includes all of the scripts we used to generate the
proving and testing MDPs, all of our generalizations, all of our
parameter values, etc, etc.

It includes source code for some domains that we didn't use.

It also includes source code for the leaderboards, the "phone home
server" ruby application that handled all of the results recording,
the proving application, etc, etc.

BE WARNED: The code is a total mess right now. We intend to clean it
up and document it to some degree. But, I wanted to have a snapshot
available of EXACTLY what we were working with in case it is important
for future reference.

Unfortunately, because I moved the code around into different relative
directories, some of the build scripts won't put things in the right
place. We should work on fixing that soon.

Some of the projects build with ant from the command line. Others are
NetBeans projects and will require NetBeans to build. Sorry. Have
patience.
--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com

Sam Sarjant

Jul 15, 2008, 11:07:29 PM

to RL Competition Code

Uh, I can't find it anywhere. I don't see any files in this group and
no files in the googlecode link. I'm not terribly familiar with these
google tools, so perhaps it's something I'm missing or not doing
right?

Brian Tanner

Jul 15, 2008, 11:39:30 PM

to rl-compet...@googlegroups.com

Yeah. Are you familiar with subversion (svn)? It's a command-line tool
for interacting with source code repositories. If you are familiar
with it, the best thing to do is to "check out" the whole project:
svn checkout http://rl-competition.googlecode.com/svn/trunk/ rl-competition-read-only

(Instructions are under the "source" tab of the google code project)

I will post a downloadable link at some point, but for now with the
code in such a messy state I don't want to make it easy for people to
get the whole codebase unless they are savvy with the tools.

If you want to just poke around, you can explore the code here:
http://code.google.com/p/rl-competition/source/browse

If I remember, you're looking for the Tetris event (formally called
tetrlais to avoid copyright issues).

The relevant places to look would be:

The environment:
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/Tetrlais/Tetrlais.java?r=248

The state structure:
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/Tetrlais/GameState.java?r=248

The Proving and Testing MDPs:
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/TPMDP0/TPMDP0.java
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/TPMDP1/TPMDP1.java
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/TPMDP2/TPMDP2.java
...
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/TPMDP99/TPMDP99.java

MDPs 0...14 were used for proving, and 50...64 were used for testing.

The training MDPs are all here:
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/GeneralizedTetris/GenTetrisParamData.java

Unfortunately I don't have lots of time to talk you through the finer
points right now... but if you keep asking questions via this list,
we'll be happy to answer them, and then we'll have a formal,
searchable record of the issues that come up.

--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com

Sam Sarjant

Jul 16, 2008, 5:58:10 PM

to RL Competition Code

Excellent. Thanks. I'll have to check it out soon. If I have any
questions I'll be sure to contact you, but I think I have a fairly
good grip of the process.

Sam Sarjant

Jul 25, 2008, 9:20:11 PM

to RL Competition Code

Had a good poke around and things are coming together. One thing I'm
wondering about is how the random seed works. The random seed can be
set via env_set_random_seed(Random_seed key), but I'm not sure how the
Random_seed_key class works. Even looking at its brief code, I can't
tell how it sets the seed. Can it take a long like the java Random
class or does it somehow combine the 2 int parameters into a long?

Brian Tanner

Jul 25, 2008, 10:32:57 PM

to rl-compet...@googlegroups.com

I'm not sure if your question is about env_set_random_seed in general,
or about Tetris or another domain in particular.

In general, the environment designer decides how to encode, store, and
otherwise handle setting and getting of random seeds. We used a
generic structure so that people could do whatever made sense for
them. The idea of a "seed" is supposed to be abstract here: it's
"some piece of encoded information" that can be shared between the
environment and the experiment program. In some cases you could make
up the random seed; in others you probably want to have matching
env_get_random_seed and env_set_random_seed calls (i.e., you can set
the random generator back to an earlier point).

Now that I've said that aloud, it seems wrong: one major reason we
have the seed was so that agents could do repeated sampling... but in
the current system the agent can't call these methods. Maybe I'm just
tired, but I think you may have deliberately or inadvertently pointed
out a pretty big problem. Ahh well, moving along.

In Tetris, env_get_random_seed does the following:

public Random_seed_key env_get_random_seed() {
    if (allowSaveLoadSeed) {
        Random_seed_key k = new Random_seed_key(2, 0);
        long newSeed = gameState.getRandom().nextLong();
        gameState.getRandom().setSeed(newSeed);
        k.intArray[0] = UtilityShop.LongHighBitsToInt(newSeed);
        k.intArray[1] = UtilityShop.LongLowBitsToInt(newSeed);
        return k;
    }
    System.err.println("env_get_random_seed() called in: "
            + getClass() + " but it is disabled");
    return null;
}

I'll do some explaining:

- Create a new structure of type Random_seed_key that will pass some
encoding of the seed back to the experiment program. We're in Java,
so the key to the random number generator is given to us as a long.
We'll encode it into 2 ints, because we only have doubles and ints to
choose from.

- Generate a new random seed randomly (I don't know a way to get the
*current* state of the Java random number generator, so we will
actually set it to something new)

- Set the random seed of the Java random number generator that Tetris
is using to the new one we just created.

- Pack the new seed (long) into 2 ints.

- Return the data structure that holds the 2 ints.
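
The packing step can be sketched roughly like this. This is only an illustration of splitting a long into two ints and putting it back together; the class and method names (SeedPacking, highBits, lowBits, combine) are made up for the example and are not the UtilityShop code from the repository, which may differ in detail.

```java
// Hypothetical sketch: split a 64-bit seed into two 32-bit ints and restore it.
// Names here are illustrative only, not the repository's UtilityShop methods.
final class SeedPacking {
    // Upper 32 bits of the long, as an int.
    static int highBits(long value) {
        return (int) (value >>> 32);
    }

    // Lower 32 bits of the long, as an int (may come out negative).
    static int lowBits(long value) {
        return (int) value;
    }

    // Rebuild the long; the mask stops sign extension of the low half.
    static long combine(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long seed = 0x0123456789ABCDEFL;
        int[] key = { highBits(seed), lowBits(seed) };
        long restored = combine(key[0], key[1]);
        System.out.println(restored == seed); // prints true
    }
}
```

The mask on the low half matters: without it, a negative low int would overwrite the high half when the two are OR-ed together.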

So the way to think about this is you can call RL_get_random_seed from
the experiment program, which returns you an object that you can use
to go back to the same random generator state again later. You could
do this:

(The exact function names might not be right, but you should get the
idea...)
(All this happens in the experiment program)
    currentSeed = RL_get_random_seed();
    currentState = RL_get_state();

    .... agent follows sequence of actions a0 a1 .... an, and along the
    way observes o0 r0, o1 r1, ... on, rn ....

    RL_set_random_seed(currentSeed);
    RL_set_state(currentState);

If the agent were now to follow the same sequence of actions, it
should hopefully (not thoroughly tested in Tetris) see the same
observations and rewards as before.

This can be used for experiment programs to test several agents with
the *exact* same starting conditions.
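
Here is a minimal, standalone sketch of the reseed-and-replay idea using only java.util.Random. It is not the Tetris code: the ReseedDemo class, its checkpoint and draw methods, and the choice of 7 outcomes per draw are all assumptions made up for the example. It only covers the random generator; in a real environment the game state has to be restored as well.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of "get a seed you can return to later".
final class ReseedDemo {
    // Draw a fresh seed from the generator, reseed with it, and hand it back.
    // The generator's state from this point on is fully determined by the result.
    static long checkpoint(Random rng) {
        long newSeed = rng.nextLong();
        rng.setSeed(newSeed);
        return newSeed;
    }

    // Draw a short sequence of values (7 outcomes per draw is an arbitrary choice).
    static int[] draw(Random rng, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = rng.nextInt(7);
        }
        return out;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        long saved = checkpoint(rng);
        int[] first = draw(rng, 10);

        rng.setSeed(saved); // go back to the checkpoint
        int[] second = draw(rng, 10);

        System.out.println(Arrays.equals(first, second)); // prints true
    }
}
```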

Please let me know if that's clear (I'm afraid it might not be) and
ask more questions.

--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com

Sam Sarjant

Jul 26, 2008, 12:25:26 AM

to RL Competition Code

Ah. I see it now. I forgot to look at the code in Tetrlais and it was
right in my face. That should solve that problem.

Now, I was wondering if the environment stores the agent performance
anywhere. I've found the EpisodeLogger object, which seems to be an
outputStream (?). I'm not totally sure. Does this just store the state
and actions taken by the agent?

I'm looking for something that stores the reward at each step so I can
produce graphs of performance.

Brian Tanner

Jul 26, 2008, 12:30:50 AM

to rl-compet...@googlegroups.com

Hi again. The EpisodeLogger is some fanciness that we added for the
competition so we could playback episodes. It's probably not what
you're looking for.

The way RL-Glue works, what you really want is inside the experiment program.

If you look at the console trainers (experiment programs) in the
public part of the competition directory, you should see what I mean.
They allow you to either run one step at a time and accumulate the
reward yourself, or you can use the RL_episode methods to run whole
episodes and then get the rewards at the end.

I think the javaConsoleTrainer should help make this clear. But if
not, keep asking!
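
As a rough sketch of the "accumulate the reward yourself" option: the RewardTracker class below is made up for illustration and does not call any RL-Glue function. The hard-coded rewards array stands in for whatever reward the experiment program gets back from each step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that keeps a running total of reward, one entry per step,
// so the cumulative curve can be written out and graphed afterwards.
final class RewardTracker {
    private final List<Double> cumulative = new ArrayList<>();
    private double total = 0.0;

    // Call once per step with the reward observed on that step.
    void record(double reward) {
        total += reward;
        cumulative.add(total);
    }

    double total() {
        return total;
    }

    // Cumulative reward after each step, in step order.
    List<Double> curve() {
        return cumulative;
    }

    public static void main(String[] args) {
        // Stand-in data; in a real experiment program these come from each step.
        double[] rewards = {0.0, 1.0, 0.0, 2.0};
        RewardTracker tracker = new RewardTracker();
        for (double r : rewards) {
            tracker.record(r);
        }
        System.out.println(tracker.curve()); // prints [0.0, 1.0, 1.0, 3.0]
    }
}
```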

--
Brian Tanner
Ph.D Student
University of Alberta
br...@tannerpages.com

Sam Sarjant

Aug 1, 2008, 12:52:32 AM

to RL Competition Code

This is unrelated to the code and more to the competition: Did the
time that the agent took to calculate the next action matter in the
competition? My agent that I submitted was fast (and fairly effective)
but before the competition had ended I had created another agent that
would theoretically do better (uses 1-step look ahead), but was about
119x slower.

It's too late now, but would this slow agent (takes about 0.2 second
to compute a goal piece location) have been accepted without penalty?
If so, I might have got a better placing. Maybe it's better I don't
know, otherwise I'll regret not proving it beforehand.

I'm using this thread because I don't expect the RL-Competition forum
to be as regularly watched.

- Sam

Brian Tanner

Aug 1, 2008, 3:44:17 PM

to rl-compet...@googlegroups.com

We had no computational or time restrictions on agents this year. So,
you could use a supercomputer or a laptop, and you could write a super
fast agent, or a slow one.

The only restriction would be that you could actually complete a
proving or testing run in a reasonable amount of time. For some
events, the number of steps was very high, and an approach that takes
a long time (as you describe) could take weeks or months to finish an
official run.
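
To put a rough number on that, here is a back-of-the-envelope sketch. It assumes 0.2 seconds for every single step (the figure quoted above was per goal piece location, so this is probably pessimistic) and a run of 5 million steps on each of 10 MDPs, which is the Tetris run length cited elsewhere in this thread. The class and method names are made up for the example.

```java
import java.time.Duration;

// Hypothetical estimate of wall-clock time for a run, given a per-step cost.
final class RunTimeEstimate {
    static Duration estimate(Duration perStep, long steps) {
        return perStep.multipliedBy(steps);
    }

    public static void main(String[] args) {
        Duration perMdp = estimate(Duration.ofMillis(200), 5_000_000L);
        Duration allMdps = perMdp.multipliedBy(10);
        System.out.println(perMdp.toDays());  // prints 11 (whole days for one MDP)
        System.out.println(allMdps.toDays()); // prints 115 (whole days for ten MDPs)
    }
}
```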

Try it out. The proving server is still online ;)

--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com

Sam Sarjant

Aug 1, 2008, 9:47:06 PM

to RL Competition Code

Sadly, my side of proving server access is no longer available so I
won't be able to try it without contacting the Tech Support guys
again. From my own experiments, it does seem to be better, but only
slightly. Although, this is only coming from a single run consisting
of about half a million steps. I'll be able to derive better
statistical results in the end anyway.

Sam Sarjant

Sep 1, 2008, 7:08:09 PM

to RL Competition Code

Hi again. I've gained full control of the code and things are going
swimmingly. The following question concerns the Tetris domain.

I have a question about how the proving and testing was done. The
details on the website state that it was done over 10 MDPs (I assume
taken randomly from the 15 you stated previously) and done with 5
million steps per MDP.
I assume the testing run was done in much the same manner.

However, there is some overall ambiguity towards the scores and output
graph. Are the leaderboard scores an average of the 10 MDPs or a
total? And is the final performance graph an average (over 5 million
steps) or a continual concatenation of the MDP scores (x axis = 50
million steps)? It looks to be made up of 10 'chunks' and would
account for the fluctuations of the lines.

I'm just curious about this because I wish to emulate the results
myself.

Brian Tanner

Sep 2, 2008, 11:23:16 AM

to rl-compet...@googlegroups.com

Hi Sam.

I'm glad things are going well. Proving and testing were both done
on the proving MDPs... and I see that the jars aren't checked into
subversion. I'll do that now; they're here:
http://code.google.com/p/rl-competition/source/browse/#svn/trunk/private/environments/Tetrlais/provingJars

Their source is in:
http://code.google.com/p/rl-competition/source/browse/#svn/trunk/private/environments/Tetrlais/src/

Each proving MDP is in its own package (yech! We have solved the
technical limitation which required this for last year) :)

Anyways, we used a cheap hack for proving and testing.

For proving, we used TPMDP[0...9] .

For testing, we just set the offset and used the unspoiled proving
MDPs... TPMDP[50...59].
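
The offset trick amounts to something like the sketch below. This is a guess at the shape of it, not the actual proving code: the MdpSelector class is made up, the offset value of 50 is inferred from the [50...59] range, and the "package and class share a name" pattern is read off the repository paths (src/TPMDP0/TPMDP0.java and so on).

```java
// Hypothetical sketch: pick a block of MDP class names by index offset.
final class MdpSelector {
    // Returns fully qualified names such as "TPMDP0.TPMDP0".
    static String[] classNames(int offset, int count) {
        String[] names = new String[count];
        for (int i = 0; i < count; i++) {
            String name = "TPMDP" + (offset + i);
            names[i] = name + "." + name; // package name equals class name
        }
        return names;
    }

    public static void main(String[] args) {
        String[] proving = classNames(0, 10);  // assumed offset 0 for proving
        String[] testing = classNames(50, 10); // assumed offset 50 for testing
        System.out.println(proving[0] + " ... " + proving[9]); // prints TPMDP0.TPMDP0 ... TPMDP9.TPMDP9
        System.out.println(testing[0] + " ... " + testing[9]); // prints TPMDP50.TPMDP50 ... TPMDP59.TPMDP59
    }
}
```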

We used the same trick for Mountain Car, and Polyathlon technically.

The leaderboard scores are always a function of the cumulative reward
in the proving run. For Tetris, it's just the cumulative reward. The
graphs that are on the website and that I showed at the workshop were
made by a program that Matt wrote... it does some fancy interpolation,
but basically yeah, it is just the cumulative results of all 10 MDPs.
I posted our official *final* results table to this list just a second
ago:
http://groups.google.com/group/rl-competition-code/browse_thread/thread/dbaefdebb326fdab

This should be a little more obvious to compare to.

I hope this helps. Keep the questions coming!

--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com
