http://rl-competition.googlecode.com/
I've put the distributed code (the stuff you guys downloaded) in
/trunk/public and all of the secret stuff in /trunk/private.
The private code includes all of the scripts we used to generate the
proving and testing MDPs, all of our generalizations, all of our
parameter values, etc, etc.
It includes source code for some domains that we didn't use.
It also includes source code for the leaderboards, the "phone home server"
ruby application that handled all of the results recording, the
proving application, etc, etc.
BE WARNED: The code is a total mess right now. We intend to clean it
up and document it to some degree. But, I wanted to have a snapshot
available of EXACTLY what we were working with in case it is important
for future reference.
Unfortunately, by moving the code around into different relative
directories, some of the build scripts won't be putting things in the
right place. We should work on fixing that soon.
Some of the projects build with ant from the command line. Others are
NetBeans projects and will require NetBeans to build. Sorry. Have
patience.
--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com
(Instructions are under the "source" tab of the google code project)
I will post a downloadable link at some point, but for now with the
code in such a messy state I don't want to make it easy for people to
get the whole codebase unless they are savvy with the tools.
If you want to just poke around, you can explore the code here:
http://code.google.com/p/rl-competition/source/browse
If I remember correctly, you're looking for the Tetris event (formally
called Tetrlais to avoid copyright issues).
The relevant places to look would be:
The environment:
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/Tetrlais/Tetrlais.java?r=248
The state structure:
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/Tetrlais/GameState.java?r=248
The Proving and Testing MDPs:
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/TPMDP0/TPMDP0.java
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/TPMDP1/TPMDP1.java
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/TPMDP2/TPMDP2.java
...
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/TPMDP99/TPMDP99.java
MDPs 0...14 were used for proving, and 50...64 were used for testing.
The training MDPs are all here:
http://code.google.com/p/rl-competition/source/browse/trunk/private/environments/Tetrlais/src/GeneralizedTetris/GenTetrisParamData.java
Unfortunately I don't have lots of time to talk you through the finer
points right now... but if you keep asking questions via this list,
we'll be happy to answer them, and then we'll have a formal,
searchable record of the issues that come up.
--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com
In general, the environment designer decides how to encode, store, and
otherwise handle setting and getting of random seeds. We used a
generic structure so that people could do whatever made sense for
them. The idea of a "seed" is supposed to be abstract here: it's
"some piece of encoded information" that can be shared between the
environment and the experiment program. In some cases you could make
up the random seed; in others you probably want to have matching
env_get_random_seed and env_set_random_seed calls (i.e., you can set the
random generator back to an earlier point).
Now that I've said that aloud it seems wrong. One major reason we have
the seed is so that agents could do repeated sampling... but in the
current system the agent can't call these methods. Maybe I'm just tired,
but I think you may have deliberately or inadvertently pointed out a
pretty big problem. Ahh well, moving along.
In Tetris, env_get_random_seed does the following:
    public Random_seed_key env_get_random_seed() {
        if (allowSaveLoadSeed) {
            Random_seed_key k = new Random_seed_key(2, 0);
            long newSeed = gameState.getRandom().nextLong();
            gameState.getRandom().setSeed(newSeed);
            k.intArray[0] = UtilityShop.LongHighBitsToInt(newSeed);
            k.intArray[1] = UtilityShop.LongLowBitsToInt(newSeed);
            return k;
        }
        System.err.println("env_get_random_seed() called in: " +
                getClass() + " but it is disabled");
        return null;
    }
I'll do some explaining:
- Create a new structure of type Random_seed_key that will pass some
encoding of the seed back to the experiment program. We're in Java,
so the key to the random number generator is given to us as a long.
We'll encode it into 2 ints, because we only have doubles and ints to
choose from.
- Generate a new random seed randomly (I don't know a way to get the
*current* state of the Java random number generator, so we will
actually set it to something new)
- Set the random seed of the Java random number generator that Tetris
is using to the new one we just created.
- Pack the new seed (long) into 2 ints.
- Return the data structure that holds the 2 ints.
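
The "pack a long into 2 ints" step can be sketched as below. This is only
an illustration: SeedPacking and its method names are hypothetical, and I
am assuming a plain high-half / low-half split, which may or may not match
what UtilityShop.LongHighBitsToInt and LongLowBitsToInt actually do.

```java
// Hypothetical helper, NOT the competition's UtilityShop: one plausible way
// to split a 64-bit seed into two 32-bit ints and join them again.
class SeedPacking {
    // Upper 32 bits of the long.
    static int highBits(long value) {
        return (int) (value >>> 32);
    }

    // Lower 32 bits of the long (the cast drops the upper half).
    static int lowBits(long value) {
        return (int) value;
    }

    // Rebuild the long. The mask stops sign extension of the low int
    // from overwriting the high half.
    static long join(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long seed = -1234567890123456789L;
        int[] key = { highBits(seed), lowBits(seed) };
        // Round trip: the two ints carry the full 64-bit seed.
        System.out.println(join(key[0], key[1]) == seed); // prints true
    }
}
```

The matching set-seed call would do the join and hand the result to
setSeed on the same generator.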
So the way to think about this is you can call RL_get_random_seed from
the experiment program, which returns you an object that you can use
to go back to the same random generator state again later. You could
do this:
(The exact function names might not be right, but you should get the
idea...)
(All this happens in the experiment program)
    currentSeed = RL_get_random_seed();
    currentState = RL_get_state();

    // ... agent follows a sequence of actions a0, a1, ..., an, and along
    // the way observes (o0, r0), (o1, r1), ..., (on, rn) ...

    RL_set_random_seed(currentSeed);
    RL_set_state(currentState);
If the agent were now to follow the same sequence of actions, it
should hopefully (not thoroughly tested in Tetris) see the same
observations and rewards as before.
This can be used for experiment programs to test several agents with
the *exact* same starting conditions.
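
To make the replay idea concrete without any RL-Glue pieces, here is a
small sketch using only java.util.Random. SeedReplay, checkpoint, and draw
are made-up names for illustration; draw just stands in for whatever
randomness the environment consumes. It only shows that reseeding with the
saved value reproduces the same draws, not that Tetris as a whole replays
identically.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration, not competition code.
class SeedReplay {
    // Mirrors the get-seed trick from the thread: draw a fresh seed,
    // reseed the generator with it, and hand the seed back to the caller.
    static long checkpoint(Random rng) {
        long newSeed = rng.nextLong();
        rng.setSeed(newSeed);
        return newSeed;
    }

    // Draws n values in [0, 7); a stand-in for the environment's randomness.
    static int[] draw(Random rng, int n) {
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextInt(7);
        }
        return out;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        long saved = checkpoint(rng);   // like the get-seed call
        int[] first = draw(rng, 10);    // first run
        rng.setSeed(saved);             // like the set-seed call
        int[] second = draw(rng, 10);   // replayed run
        System.out.println(Arrays.equals(first, second)); // prints true
    }
}
```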
Please let me know if that's clear (I'm afraid it might not be) and
ask more questions.
--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com
The way RL-Glue works, what you really want is inside the experiment program.
If you look at the console trainers (experiment programs) in the
public part of the competition directory, you should see what I mean.
They allow you to either run one step at a time and accumulate the
reward yourself, or you can use the RL_episode methods to run whole
episodes and then get the rewards at the end.
I think the javaConsoleTrainer should help make this clear. But if
not, keep asking!
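
In case it helps, here is a rough sketch of the two styles. ToyGlue is a
made-up stand-in so the example runs on its own; it is NOT the RL-Glue API,
and the real function names and signatures are the ones in the console
trainers. The toy environment gives a reward of -1 per step and ends after
a fixed number of steps.

```java
// Toy stand-in for the glue layer; NOT the RL-Glue API.
class ToyGlue {
    private final int episodeLength;
    private int stepsTaken;
    private double episodeReturn;

    ToyGlue(int episodeLength) {
        this.episodeLength = episodeLength;
    }

    void start() {
        stepsTaken = 0;
        episodeReturn = 0.0;
    }

    boolean terminal() {
        return stepsTaken >= episodeLength;
    }

    // Advance one step and return that step's reward.
    double step() {
        stepsTaken++;
        episodeReturn += -1.0;
        return -1.0;
    }

    // Run a whole episode (capped at maxSteps) and return the total reward.
    double episode(int maxSteps) {
        start();
        while (!terminal() && stepsTaken < maxSteps) {
            step();
        }
        return episodeReturn;
    }
}

class TrainerSketch {
    // Style 1: step manually and sum the rewards in the experiment program.
    static double runBySteps(ToyGlue glue) {
        glue.start();
        double total = 0.0;
        while (!glue.terminal()) {
            total += glue.step();
        }
        return total;
    }

    // Style 2: let the glue run the episode and read the return afterwards.
    static double runByEpisode(ToyGlue glue, int maxSteps) {
        return glue.episode(maxSteps);
    }

    public static void main(String[] args) {
        ToyGlue glue = new ToyGlue(5);
        // Both styles see the same total reward for the same episode.
        System.out.println(runBySteps(glue) == runByEpisode(glue, 100)); // prints true
    }
}
```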
--
Brian Tanner
Ph.D Student
University of Alberta
br...@tannerpages.com
The only restriction is that you need to be able to complete a
proving or testing run in a reasonable amount of time. For some
events, the number of steps was very high, and an approach that takes
a long time (as you describe) could take weeks or months to finish an
official run.
Try it out. The proving server is still online ;)
--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com