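I've just dumped all of our secret private code from the secret 2008 repositories to the public Google Code Project.
I've put the distributed code (the stuff you guys downloaded) in /trunk/public and all of the secret stuff in /trunk/private.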
The private code includes all of the scripts we used to generate the proving and testing MDPs, all of our generalizations, all of our parameter values, etc, etc.
It includes source code for some domains that we didn't use.
It also includes source code for the leaderboards, the "phone home server" Ruby application that handled all of the results recording, the proving application, etc, etc.
BE WARNED: The code is a total mess right now. We intend to clean it up and document it to some degree. But, I wanted to have a snapshot available of EXACTLY what we were working with in case it is important for future reference.
Unfortunately, by moving the code around into different relative directories, some of the build scripts won't be putting things in the right place. We should work on fixing that soon.
Some of the projects build with ant from the command line. Others are NetBeans projects and will require NetBeans to build. Sorry. Have patience.
--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com
Uh, I can't find it anywhere. I don't see any files in this group and
no files in the googlecode link. I'm not terribly familiar with these
google tools, so perhaps it's something I'm missing or not doing
right?
On Jul 15, 3:04 pm, Brian Tanner <br...@tannerpages.com> wrote:
> I've put the distributed code (the stuff you guys downloaded) in /
> trunk/public and all of the secret stuff in /trunk/private
Yeah. Are you familiar with Subversion (svn)? It's a command-line tool for interacting with source code repositories. If you are familiar with it, the best thing to do is to "check out" the whole project:
svn checkout http://rl-competition.googlecode.com/svn/trunk/ rl-competition-read-only
(Instructions are under the "source" tab of the google code project)
I will post a downloadable link at some point, but for now with the code in such a messy state I don't want to make it easy for people to get the whole codebase unless they are savvy with the tools.
Unfortunately I don't have lots of time to talk you through the finer points right now... but if you keep asking questions via this list, we'll be happy to answer them, and then we'll have a formal, searchable record of the issues that come up.
--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com
> Uh, I can't find it anywhere. I don't see any files in this group and
> no files in the googlecode link. I'm not terribly familiar with these
> google tools, so perhaps it's something I'm missing or not doing
> right?
> On Jul 15, 3:04 pm, Brian Tanner <br...@tannerpages.com> wrote:
>> I've just dumped all of our secret private code from the secret 2008
>> repositories to the public Google Code Project:
Excellent. Thanks. I'll have to check it out soon. If I have any
questions I'll be sure to contact you, but I think I have a fairly
good grip of the process.
On Jul 16, 3:39 pm, Brian Tanner <br...@tannerpages.com> wrote:
> On 15-Jul-08, at 10:07 PM, Sam Sarjant wrote:
> > Uh, I can't find it anywhere. I don't see any files in this group and
> > no files in the googlecode link. I'm not terribly familiar with these
> > google tools, so perhaps it's something I'm missing or not doing
> > right?
Had a good poke around and things are coming together. One thing I'm
wondering about is how the random seed works. The random seed can be
set via env_set_random_seed(Random_seed_key key), but I'm not sure how the
Random_seed_key class works. Even looking at its brief code, I can't
tell how it sets the seed. Can it take a long like the Java Random
class or does it somehow combine the 2 int parameters into a long?
I'm not sure if your question is about env_set_random_seed in general, or about Tetris or another domain in particular.
In general, the environment designer decides how to encode, store, and otherwise handle setting and getting of random seeds. We used a generic structure so that people could do whatever made sense for them. The idea of a "seed" is supposed to be abstract here: it's "some piece of encoded information" that can be shared between the environment and the experiment program. In some cases you could make up the random seed; in others you probably want to have matching env_get_random_seed and env_set_random_seed calls (i.e., you can set the random generator back to an earlier point).
Now that I've said that aloud, it seems wrong: one major reason for having the seed was so that agents could do repeated sampling... but in the current system the agent can't call these methods. Maybe I'm just tired, but I think you may have deliberately or inadvertently pointed out a pretty big problem. Ahh well, moving along.
In Tetris, env_get_random_seed does the following:
public Random_seed_key env_get_random_seed() {
    if (allowSaveLoadSeed) {
        Random_seed_key k = new Random_seed_key(2, 0);
        long newSeed = gameState.getRandom().nextLong();
        gameState.getRandom().setSeed(newSeed);
        k.intArray[0] = UtilityShop.LongHighBitsToInt(newSeed);
        k.intArray[1] = UtilityShop.LongLowBitsToInt(newSeed);
        return k;
    }
    System.err.println("env_get_random_seed() called in: " + getClass() + " but it is disabled");
    return null;
}
I'll do some explaining:
- Create a new structure of type Random_seed_key that will pass some encoding of the seed back to the experiment program. We're in Java, so the key to the random number generator is given to us as a long. We'll encode it into 2 ints, because we only have doubles and ints to choose from.
- Generate a new random seed randomly (I don't know a way to get the *current* state of the Java random number generator, so we will actually set it to something new)
- Set the random seed of the Java random number generator that Tetris is using to the new one we just created.
- Pack the new seed (long) into 2 ints.
- Return the data structure that holds the 2 ints.
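To make the packing step concrete, here is a minimal sketch of splitting a long into 2 ints and putting it back together, which is what a matching env_set_random_seed would need to do. SeedPacking and its pack/unpack methods are hypothetical names made up for this example; the real code goes through UtilityShop, whose source isn't shown in this thread, so treat this as an assumption about how it might work rather than what it actually does.

```java
import java.util.Random;

/** Hypothetical helper for illustration; not the UtilityShop class from the competition code. */
final class SeedPacking {
    /** Split a 64-bit seed into {high 32 bits, low 32 bits}. */
    static int[] pack(long seed) {
        return new int[] { (int) (seed >>> 32), (int) seed };
    }

    /** Rebuild the 64-bit seed; the mask keeps the low int from sign-extending into the high half. */
    static long unpack(int[] parts) {
        return ((long) parts[0] << 32) | (parts[1] & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        Random random = new Random();
        long newSeed = random.nextLong();   // same trick as above: invent a fresh seed
        random.setSeed(newSeed);            // and move the generator onto it
        int[] key = pack(newSeed);          // what would travel back in the seed key
        long restored = unpack(key);        // what a matching "set" call would rebuild
        System.out.println(restored == newSeed); // prints true
    }
}
```

The mask on the low int is the part that is easy to get wrong: without it, a negative low int would overwrite the high 32 bits when the two halves are ORed together.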
So the way to think about this is you can call RL_get_random_seed from the experiment program, which returns you an object that you can use to go back to the same random generator state again later. You could do this:
(The exact function names might not be right, but you should get the idea...)
(All this happens in the experiment program)
currentSeed = RL_get_random_seed();
currentState = RL_get_state();
.... agent follows sequence of actions a0 a1 .... an, and along the way observes o0 r0, o1 r1, ... on, rn ....
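If you then restore both (with something like RL_set_random_seed(currentSeed) and RL_set_state(currentState); again, the names might not be exact) and the agent follows the same sequence of actions, it should hopefully (not thoroughly tested in Tetris) see the same observations and rewards as before.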
This can be used for experiment programs to test several agents with the *exact* same starting conditions.
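Here is a small sketch of why this works, using nothing but java.util.Random. ReplayDemo, drawPieces and replayMatches are made-up names for the example, and drawing ints in [0, 7) is only a stand-in for picking Tetris pieces; the real environment would also need its state restored, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical demo: re-seeding a java.util.Random replays the same sequence. */
final class ReplayDemo {
    /** Draw n stand-in "pieces" (ints in [0, 7)) from the generator. */
    static int[] drawPieces(Random random, int n) {
        int[] pieces = new int[n];
        for (int i = 0; i < n; i++) {
            pieces[i] = random.nextInt(7);
        }
        return pieces;
    }

    /** Returns true if setting the same seed twice yields the same n draws. */
    static boolean replayMatches(long seed, int n) {
        Random random = new Random();
        random.setSeed(seed);                 // the "get" step: pin the generator to a known seed
        int[] first = drawPieces(random, n);  // first run
        random.setSeed(seed);                 // the "set" step: go back to the saved seed
        int[] second = drawPieces(random, n); // second run
        return Arrays.equals(first, second);
    }

    public static void main(String[] args) {
        System.out.println(replayMatches(2008L, 20)); // prints true
    }
}
```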
Please let me know if that's clear (I'm afraid it might not be) and ask more questions.
--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com
Ah. I see it now. I forgot to look at the code in Tetrlias and it was
right in my face. That should solve that problem.
Now, I was wondering if the environment stores the agent performance
anywhere. I've found the EpisodeLogger object, which seems to be an
outputStream (?). I'm not totally sure. Does this just store the state
and actions taken by the agent?
I'm looking for something that stores the reward at each step so I can
produce graphs of performance.
On Jul 26, 2:32 pm, Brian Tanner <br...@tannerpages.com> wrote:
> On 25-Jul-08, at 7:20 PM, Sam Sarjant wrote:
> > Had a good poke around and things are coming together. One thing I'm
> > wondering about is how the random seed works. The random seed can be
> > set via env_set_random_seed(Random_seed key), but I'm not sure how the
> > Random_seed_key class works. Even looking at its brief code, I can't
> > tell how it sets the seed. Can it take a long like the java Random
> > class or does it somehow combine the 2 int parameters into a long?
Hi again. The EpisodeLogger is some fanciness that we added for the competition so we could play back episodes. It's probably not what you're looking for.
The way RL-Glue works, what you really want is inside the experiment program.
If you look at the console trainers (experiment programs) in the public part of the competition directory, you should see what I mean. They allow you to either run one step at a time and accumulate the reward yourself, or you can use the RL_episode methods to run whole episodes and then get the rewards at the end.
I think the javaConsoleTrainer should help make this clear. But if not, keep asking!
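If it helps, here is a rough sketch of the bookkeeping side of "accumulate the reward yourself". RewardLog and recordStep are hypothetical names made up for this example and are not part of RL-Glue or the console trainers; the idea is just that your experiment program calls recordStep with whatever reward and terminal flag it gets back from each step, then dumps the per-episode totals for graphing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reward recorder for an experiment program; not an RL-Glue class. */
final class RewardLog {
    private final List<Double> stepRewards = new ArrayList<>();
    private final List<Double> episodeReturns = new ArrayList<>();
    private double currentReturn = 0.0;

    /** Call once per step with the reward and whether the episode just ended. */
    void recordStep(double reward, boolean terminal) {
        stepRewards.add(reward);
        currentReturn += reward;
        if (terminal) {
            episodeReturns.add(currentReturn);
            currentReturn = 0.0;
        }
    }

    List<Double> stepRewards() {
        return stepRewards;
    }

    List<Double> episodeReturns() {
        return episodeReturns;
    }

    /** One "episode,return" line per finished episode, ready for a plotting tool. */
    String toCsv() {
        StringBuilder csv = new StringBuilder("episode,return\n");
        for (int i = 0; i < episodeReturns.size(); i++) {
            csv.append(i).append(',').append(episodeReturns.get(i)).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        RewardLog log = new RewardLog();
        // Made-up rewards standing in for what the step calls would return.
        log.recordStep(1.0, false);
        log.recordStep(2.0, true);
        log.recordStep(5.0, true);
        System.out.print(log.toCsv());
    }
}
```

Keeping the per-step list as well as the per-episode totals means you can graph either one without re-running the experiment.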
On 7/25/08, Sam Sarjant <effer...@gmail.com> wrote:
> Ah. I see it now. I forgot to look at the code in Tetrlias and it was
> right in my face. That should solve that problem.
> Now, I was wondering if the environment stores the agent performance
> anywhere. I've found the EpisodeLogger object, which seems to be an
> outputStream (?). I'm not totally sure. Does this just store the state
> and actions taken by the agent?
> I'm looking for something that stores the reward at each step so I can
> produce graphs of performance.
-- Brian Tanner Ph.D Student University of Alberta br...@tannerpages.com
This is unrelated to the code and more to the competition: Did the
time that the agent took to calculate the next action matter in the
competition? My agent that I submitted was fast (and fairly effective)
but before the competition had ended I had created another agent that
would theoretically do better (uses 1-step look ahead), but was about
119x slower.
It's too late now, but would this slow agent (takes about 0.2 seconds
to compute a goal piece location) have been accepted without penalty?
If so, I might have got a better placing. Maybe it's better I don't
know, otherwise I'll regret not proving it beforehand.
I'm using this thread because I don't expect the RL-Competition forum
to be as regularly watched.
- Sam
On Jul 26, 4:30 pm, "Brian Tanner" <br...@tannerpages.com> wrote:
> Hi again. The EpisodeLogger is some fanciness that we added for the
> competition so we could playback episodes. It's probably not what
> you're looking for.
> The way RL-Glue works, what you really want is inside the experiment program.
> If you look at the console trainers (experiment programs) in the
> public part of the competition directory, you should see what I mean.
> They allow you to either run one step at a time and accumulate the
> reward yourself, or you can use the RL_episode methods to run whole
> episodes and then get the rewards at the end.
> I think the javaConsoleTrainer should help make this clear. But if
> not, keep asking!
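The step-at-a-time approach described in the quoted reply above can be sketched roughly as follows. This is only an illustration: the Glue interface, StepResult and ToyGlue are hypothetical stand-ins so the example runs on its own, not the real RL-Glue classes. In an actual experiment program the start and step calls would be the RL-Glue ones used in the console trainers.

```java
import java.util.ArrayList;
import java.util.List;

public class RewardTrace {
    /** Hypothetical stand-in for one step result: a reward plus a terminal flag. */
    static final class StepResult {
        final double reward;
        final boolean terminal;

        StepResult(double reward, boolean terminal) {
            this.reward = reward;
            this.terminal = terminal;
        }
    }

    /** Hypothetical stand-in for the calls an experiment program makes. */
    interface Glue {
        void start();
        StepResult step();
    }

    /** Toy environment: pays 1.0 per step and ends after a fixed number of steps. */
    static final class ToyGlue implements Glue {
        private final int episodeLength;
        private int steps;

        ToyGlue(int episodeLength) {
            this.episodeLength = episodeLength;
        }

        public void start() {
            steps = 0;
        }

        public StepResult step() {
            steps++;
            return new StepResult(1.0, steps >= episodeLength);
        }
    }

    /** Runs one episode step by step and records the cumulative reward after each step. */
    static List<Double> runEpisode(Glue glue, int maxSteps) {
        List<Double> cumulative = new ArrayList<>();
        double total = 0.0;
        glue.start();
        for (int i = 0; i < maxSteps; i++) {
            StepResult result = glue.step();
            total += result.reward;
            cumulative.add(total);   // one data point per step, ready for plotting
            if (result.terminal) {
                break;
            }
        }
        return cumulative;
    }

    public static void main(String[] args) {
        System.out.println(runEpisode(new ToyGlue(3), 100));
    }
}
```

The returned list holds one cumulative-reward value per step, which is the kind of series a performance graph needs.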
> On 7/25/08, Sam Sarjant <effer...@gmail.com> wrote:
> > Ah. I see it now. I forgot to look at the code in Tetrlias and it was
> > right in my face. That should solve that problem.
> > Now, I was wondering if the environment stores the agent performance
> > anywhere. I've found the EpisodeLogger object, which seems to be an
> > outputStream (?). I'm not totally sure. Does this just store the state
> > and actions taken by the agent?
> > I'm looking for something that stores the reward at each step so I can
> > produce graphs of performance.
> > On Jul 26, 2:32 pm, Brian Tanner <br...@tannerpages.com> wrote:
> >> I'm not sure if your question is in general about env_set_random_seed,
> >> or about Tetris or another domain in particular.
> >> In general, the environment designer decides how to encode, store, and
> >> otherwise handle setting and getting of random seeds. We used a
> >> generic structure so that people could do whatever made sense for
> >> them. The idea of a "seed" is supposed to be abstract here, it's
> >> "some piece of encoded information" that can be shared between the
> >> environment and the experiment program. In some cases you could make
> >> up the random seed, in others you probably want to have matching
> >> env_get_random_seed and env_set_random_seed calls (ie: you can set the
> >> random generator back to an earlier point).
> >> Now that I've said that aloud it seems wrong, one major reason that we
> >> have the seed was supposed to be so that agents could do repeated
> >> sampling... but in the current system the agent can't call these
> >> methods. Maybe I'm just tired, but I think you may have deliberately
> >> or inadvertently pointed out a pretty big problem. Ahh well, moving
> >> along.
> >> In Tetris, env_get_random_seed does the following:
> >> public Random_seed_key env_get_random_seed() {
> >>     if (allowSaveLoadSeed) {
> >>         Random_seed_key k = new Random_seed_key(2, 0);
> >>         long newSeed = gameState.getRandom().nextLong();
> >>         gameState.getRandom().setSeed(newSeed);
> >>         k.intArray[0] = UtilityShop.LongHighBitsToInt(newSeed);
> >>         k.intArray[1] = UtilityShop.LongLowBitsToInt(newSeed);
> >>         return k;
> >>     }
> >>     System.err.println("env_get_random_seed() called in: " +
> >>             getClass() + " but it is disabled");
> >>     return null;
> >> }
> >> I'll do some explaining:
> >> - Create a new structure of type Random_seed_key that will pass some
> >> encoding of the seed back to the experiment program. We're in Java,
> >> so the key to the random number generator is given to us as a long.
> >> We'll encode it into 2 ints, because we only have doubles and ints to
> >> choose from.
> >> - Generate a new random seed randomly (I don't know a way to get the
> >> *current* state of the Java random number generator, so we will
> >> actually set it to something new)
> >> - Set the random seed of the Java random number generator that Tetris
> >> is using to the new one we just created.
> >> - Pack the new seed (long) into 2 ints.
> >> - Return the data structure that holds the 2 ints.
> >> So the way to think about this is you can call RL_get_random_seed from
> >> the experiment program, which returns you an object that you can use
> >> to go back to the same random generator state again later. You could
> >> do this:
> >> (The exact function names might not be right, but you should get the
> >> idea...)
> >> (All this happens in the experiment program)
> >> currentSeed = RL_get_random_seed();
> >> currentState = RL_get_state();
> >> .... agent follows sequence of actions a0 a1 .... an, and along the
> >> way observes o0 r0, o1 r1, ... on, rn ....
> >> If the experiment program now restores the saved seed and state with
> >> the matching set calls and the agent follows the same sequence of
> >> actions, it should hopefully (not thoroughly tested in Tetris) see the
> >> same observations and rewards as before.
> >> This can be used for experiment programs to test several agents with
> >> the *exact* same starting conditions.
> >> Please let me know if that's clear (I'm afraid it might not be) and
> >> ask more questions.
> >> --
> >> Brian Tanner
> >> Ph.D Student, University of Alberta
> >> br...@tannerpages.com
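The seed handling described in the quoted message above (draw a new long, reseed the generator with it, pack it into 2 ints, and later unpack it to rewind) can be sketched in plain Java as below. This is a minimal sketch under assumptions: the UtilityShop helpers are not shown in this thread, so highBits, lowBits and join are hypothetical stand-ins, and a bare java.util.Random takes the place of the Tetris game state.

```java
import java.util.Random;

public class SeedKeyDemo {
    // Hypothetical stand-ins for the UtilityShop helpers named in the thread.
    static int highBits(long value) {
        return (int) (value >>> 32);
    }

    static int lowBits(long value) {
        return (int) value;
    }

    /** Rebuilds the long from its two halves; the mask stops sign extension of the low half. */
    static long join(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        Random random = new Random(42L);

        // "get seed": draw a fresh seed, reseed the generator with it, pack it into 2 ints.
        long newSeed = random.nextLong();
        random.setSeed(newSeed);
        int[] key = { highBits(newSeed), lowBits(newSeed) };

        // Some draws made after the save point (think: the next pieces in Tetris).
        int first = random.nextInt(7);
        int second = random.nextInt(7);

        // "set seed": unpack the 2 ints and reseed, which rewinds to the save point.
        random.setSeed(join(key[0], key[1]));
        System.out.println(first == random.nextInt(7));   // same draw as before
        System.out.println(second == random.nextInt(7));  // same draw as before
    }
}
```

Because java.util.Random is deterministic for a given seed, reseeding with the unpacked value replays the same draws, which is what lets an experiment program give several agents the same starting conditions.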
> >> On 25-Jul-08, at 7:20 PM, Sam Sarjant wrote:
> >> > Had a good poke around and things are coming together. One thing I'm
> >> > wondering about is how the random seed works. The random seed can be
> >> > set via env_set_random_seed(Random_seed_key key), but I'm not sure how the
> >> > Random_seed_key class works. Even looking at its brief code, I can't
> >> > tell how it sets the seed. Can it take a long like the Java Random
> >> > class or does it somehow combine the 2 int parameters into a long?
> >> > On Jul 16, 3:39 pm, Brian Tanner <br...@tannerpages.com> wrote:
> >> >> Yeah. Are you familiar with subversion (svn)? It's a command-line tool
> >> >> for interacting with source code repositories. If you are familiar
> >> >> with it, the best thing to do is to "check out" the whole project:
> >> >> svn checkout http://rl-competition.googlecode.com/svn/trunk/ rl-competition-read-only
> >> >> (Instructions are under the "source" tab of the google code project)
> >> >> I will post a downloadable link at some point, but for now with the
> >> >> code in such a messy state I don't want to make it easy for people to
> >> >> get the whole codebase unless they are savvy with the tools.
> >> >> Unfortunately I don't have lots of time to talk you through the finer
> >> >> points right now... but if you keep asking questions via this list,
> >> >> we'll be happy to answer them, and then we'll have a formal,
> >> >> searchable record of the issues that come up.
> >> >> --
> >> >> Brian Tanner
> >> >> Ph.D Student, University of Alberta
> >> >> br...@tannerpages.com
> >> >> On 15-Jul-08, at 10:07 PM, Sam Sarjant wrote:
> >> >>> Uh, I can't find it anywhere. I don't see any files in this group
> >> >>> and no files in the googlecode link. I'm not terribly familiar
> >> >>> with these google tools, so perhaps it's something I'm missing or
> >> >>> not doing right?
> >> >>> On Jul 15, 3:04 pm, Brian Tanner <br...@tannerpages.com> wrote:
> >> >>>> I've just dumped all of our secret private code from the secret
> >> >>>> 2008 repositories to the public Google Code Project:
> >> >>>> I've put the distributed code (the stuff you guys downloaded) in
> >> >>>> /trunk/public and all of the secret stuff in /trunk/private
> >> >>>> The private code includes all of the scripts we used to generate
> >> >>>> the proving and testing MDPs, all of our generalizations, all of
> >> >>>> our parameter values, etc, etc.
> >> >>>> It includes source code for some domains that we didn't use.
> >> >>>> This includes source code for the leaderboards, the "phone home
> >> >>>> server" ruby application that handled all of the results
> >> >>>> recording, the proving application, etc, etc.
We had no computational or time restrictions on agents this year. So, you could use a supercomputer or a laptop, and you could write a super fast agent, or a slow one.
The only restriction would be that you could actually complete a proving or testing run in a reasonable amount of time. For some events, the number of steps was very high, and an approach that takes a long time (as you describe) could take weeks or months to finish an official run.
Try it out. The proving server is still online ;)
-- Brian Tanner Ph.D Student, University of Alberta br...@tannerpages.com
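To put "weeks or months" into numbers, here is a rough back-of-the-envelope sketch. It uses the Tetris figures mentioned later in this thread (10 MDPs of 5 million steps each) and assumes, as an upper bound, that the full 0.2 seconds is paid on every single step. The original question only says that 0.2 seconds is the cost of computing a goal piece location, so the real per-step cost is probably lower.

```java
public class RunTimeEstimate {
    private static final double SECONDS_PER_DAY = 86_400.0;

    /** Wall-clock days needed when every step costs secondsPerStep. */
    static double daysFor(long steps, double secondsPerStep) {
        return steps * secondsPerStep / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long stepsPerMdp = 5_000_000L;   // figure quoted in this thread for Tetris
        int mdps = 10;                   // figure quoted in this thread for Tetris
        double secondsPerStep = 0.2;     // assumption: paid on every step (upper bound)

        System.out.println("days per MDP:  " + daysFor(stepsPerMdp, secondsPerStep));
        System.out.println("days in total: " + daysFor(stepsPerMdp * mdps, secondsPerStep));
    }
}
```

Under those assumptions one MDP takes about 11.6 days and a full 10-MDP run about 116 days of agent computation alone, which matches the "weeks or months" estimate above.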
Sadly, my side of proving server access is no longer available so I
won't be able to try it without contacting the Tech Support guys
again. From my own experiments, it does seem to be better, but only
slightly. Although, this is only coming from a single run consisting
of about half a million steps. I'll be able to derive better
statistical results in the end anyway.
Hi again. I've gained full control of the code and things are going
swimmingly. The following question concerns the Tetris domain.
I have a question about how the proving and testing was done. The
details on the website state that it was done over 10 MDPs (I assume
taken randomly from the 15 you stated previously) and done with 5
million steps per MDP.
I assume the testing run was done in much the same manner.
However, the scores and the output graph are ambiguous to me. Are the
leaderboard scores an average over the 10 MDPs or a total? And is the
final performance graph an average (over 5 million steps) or a
concatenation of the MDP scores (x axis = 50 million steps)? It looks
to be made up of 10 'chunks', which would account for the fluctuations
of the lines.
I'm just curious about this because I wish to emulate the results
myself.
On Aug 2, 1:47 pm, Sam Sarjant <effer...@gmail.com> wrote:
> Sadly, my side of proving server access is no longer available so I
> won't be able to try it without contacting the Tech Support guys
> again. From my own experiments, it does seem to be better, but only
> slightly. Although, this is only coming from a single run consisting
> of about half a million steps. I'll be able to derive better
> statistical results in the end anyway.
> On Aug 2, 7:44 am, Brian Tanner <br...@tannerpages.com> wrote:
> > We had no computational or time restrictions on agents this year. So,
> > you could use a supercomputer or a laptop, and you could write a super
> > fast agent, or a slow one.
> > The only restriction would be that you could actually complete a
> > proving or testing run in a reasonable amount of time. For some
> > events, the number of steps was very high, and an approach that takes
> > a long time (as you describe) could take weeks or months to finish an
> > official run.
> > Try it out. The proving server is still online ;)
> > --
> > Brian Tanner
> > Ph.D Student, University of Alberta
> > br...@tannerpages.com
> > On 31-Jul-08, at 10:52 PM, Sam Sarjant wrote:
> > > This is unrelated to the code and more to the competition: Did the
> > > time that the agent took to calculate the next action matter in the
> > > competition? My agent that I submitted was fast (and fairly effective)
> > > but before the competition had ended I had created another agent that
> > > would theoretically do better (uses 1-step look ahead), but was about
> > > 119x slower.
> > > It's too late now, but would this slow agent (takes about 0.2 second
> > > to compute a goal piece location) have been accepted without penalty?
> > > If so, I might have got a better placing. Maybe it's better I don't
> > > know, otherwise I'll regret not proving it beforehand.
> > > I'm using this thread because I don't expect the RL-Competition forum
> > > to be as regularly watched.
> > > - Sam
> > > On Jul 26, 4:30 pm, "Brian Tanner" <br...@tannerpages.com> wrote:
> > >> Hi again. The EpisodeLogger is some fanciness that we added for the
> > >> competition so we could playback episodes. It's probably not what
> > >> you're looking for.
> > >> The way RL-Glue works, what you really want is inside the
> > >> experiment program.
> > >> If you look at the console trainers (experiment programs) in the
> > >> public part of the competition directory, you should see what I mean.
> > >> They allow you to either run one step at a time and accumulate the
> > >> reward yourself, or you can use the RL_episode methods to run whole
> > >> episodes and then get the rewards at the end.
> > >> I think the javaConsoleTrainer should help make this clear. But if
> > >> not, keep asking!
> > >> On 7/25/08, Sam Sarjant <effer...@gmail.com> wrote:
> > >>> Ah. I see it now. I forgot to look at the code in Tetrlias and it
> > >>> was
> > >>> right in my face. That should solve that problem.
> > >>> Now, I was wondering if the environment stores the agent performance
> > >>> anywhere. I've found the EpisodeLogger object, which seems to be an
> > >>> outputStream (?). I'm not totally sure. Does this just store the
> > >>> state
> > >>> and actions taken by the agent?
> > >>> I'm looking for something that stores the reward at each step so I
> > >>> can
> > >>> produce graphs of performance.
> > >>> On Jul 26, 2:32 pm, Brian Tanner <br...@tannerpages.com> wrote:
> > >>>> I'm not sure if your question is in general about
> > >>>> env_set_random_key,
> > >>>> or about Tetris or another domain in particular.
> > >>>> In general, the environment designer decides how to encode,
> > >>>> store, and
> > >>>> otherwise handle setting and getting of random seeds. We used a
> > >>>> generic structure so that people could do whatever made sense for
> > >>>> them. The idea of a "seed" is supposed to be abstract here, it's
> > >>>> "some piece of encoded information" that can be shared between the
> > >>>> environment and the experiment program. In some cases you could
> > >>>> make
> > >>>> up the random seed, in others you probably want to have matching
> > >>>> env_get_random_seed and env_set_random_seed calls (ie: you can
> > >>>> set the
> > >>>> random generator back to an earlier point).
> > >>>> Now that I've said that aloud it seems wrong, one major reason
> > >>>> that we
> > >>>> have the seed was supposed to be so that agents could do repeated
> > >>>> sampling... but in the current system the agent can't call these
> > >>>> methods. Maybe I'm just tired, but I think you may have
> > >>>> deliberately
> > >>>> or inadvertently pointed out a pretty big problem. Ahh well,
> > >>>> moving
> > >>>> along.
> > >>>> In Tetris, env_get_random_seed does the following:
> > >>>> public Random_seed_key env_get_random_seed() {
> > >>>> if (allowSaveLoadSeed) {
> > >>>> Random_seed_key k = new Random_seed_key(2, 0);
> > >>>> long newSeed = gameState.getRandom().nextLong();
> > >>>> gameState.getRandom().setSeed(newSeed);
> > >>>> k.intArray[0] =
> > >>>> UtilityShop.LongHighBitsToInt(newSeed);
> > >>>> k.intArray[1] = UtilityShop.LongLowBitsToInt(newSeed);
> > >>>> return k;
> > >>>> }
> > >>>> System.err.println("env_get_random_seed() called in: " +
> > >>>> getClass() + " but it is disabled");
> > >>>> return null;
> > >>>> }
> > >>>> I'll do some explaining:
> > >>>> - Create a new structure of type Random_seed_key that will pass some
> > >>>> encoding of the seed back to the experiment program. We're in Java,
> > >>>> so the key to the random number generator is given to us as a long.
> > >>>> We'll encode it into 2 ints, because we only have doubles and ints to
> > >>>> choose from.
> > >>>> - Generate a new random seed randomly (I don't know a way to get
> > >>>> the
> > >>>> *current* state of the Java random number generator, so we will
> > >>>> actually set it to something new)
> > >>>> - Set the random seed of the Java random number generator that
> > >>>> Tetris
> > >>>> is using to the new one we just created.
> > >>>> - Pack the new seed (long) into 2 ints.
> > >>>> - Return the data structure that holds the 2 ints.
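A minimal sketch of the "pack the long into 2 ints" step listed above, and of the reverse step. The thread does not show how UtilityShop.LongHighBitsToInt and UtilityShop.LongLowBitsToInt are written, so the helper names below (highBits, lowBits, combine) are hypothetical stand-ins that assume plain bit shifts:

```java
public class SeedPacking {
    /** Upper 32 bits of the seed, as an int. */
    static int highBits(long seed) {
        return (int) (seed >>> 32);
    }

    /** Lower 32 bits of the seed, as an int. */
    static int lowBits(long seed) {
        return (int) seed;
    }

    /** Rebuild the long; the mask stops the low word from being sign-extended. */
    static long combine(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long seed = -1234567890123456789L;
        int[] key = { highBits(seed), lowBits(seed) }; // plays the role of k.intArray
        long restored = combine(key[0], key[1]);
        System.out.println(seed == restored);
    }
}
```

Under these assumptions the two ints carry the full 64-bit seed, so a matching set call can rebuild the long and pass it to java.util.Random.setSeed.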
> > >>>> So the way to think about this is you can call RL_get_random_seed
> > >>>> from
> > >>>> the experiment program, which returns you an object that you can
> > >>>> use
> > >>>> to go back to the same random generator state again later. You
> > >>>> could
> > >>>> do this:
> > >>>> (The exact function names might not be right, but you should get
> > >>>> the
> > >>>> idea...)
> > >>>> (All this happens in the experiment program)
> > >>>> currentSeed = RL_get_random_seed();
> > >>>> currentState = RL_get_state();
> > >>>> .... agent follows sequence of actions a0 a1 .... an, and along the
> > >>>> way observes o0 r0, o1 r1, ... on, rn ....
> > >>>> If the agent were now to follow the same sequence of actions, it
> > >>>> should hopefully (not thoroughly tested in Tetris) see the same
> > >>>> observations and rewards as before.
> > >>>> This can be used for experiment programs to test several agents
> > >>>> with
> > >>>> the *exact* same starting conditions.
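The replay idea above can be sketched with java.util.Random alone. This is not the Tetris environment or the RL-Glue API: saveSeed and draw are hypothetical helpers, and the "episode" is just a run of random draws standing in for whatever the environment randomizes.

```java
import java.util.Arrays;
import java.util.Random;

public class SeedReplay {
    /** Mimics the get-seed step: draw a fresh seed and reseed the generator with it. */
    static long saveSeed(Random rng) {
        long newSeed = rng.nextLong();
        rng.setSeed(newSeed);
        return newSeed;
    }

    /** Stand-in for an episode: the next n random draws in the range 0..6. */
    static int[] draw(Random rng, int n) {
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextInt(7);
        }
        return out;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        long seed = saveSeed(rng);    // remember where we are
        int[] first = draw(rng, 10);  // first run
        rng.setSeed(seed);            // go back to the saved point
        int[] second = draw(rng, 10); // second run repeats the first
        System.out.println(Arrays.equals(first, second));
    }
}
```

Reseeding at the moment of saving is what makes this work: the generator's state right after saveSeed is fully determined by the returned seed, so setting that seed again later reproduces the same draws.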
> > >>>> Please let me know if that's clear (I'm afraid it might not be) and
> > >>>> ask more questions.
> > >>>> --
> > >>>> Brian Tanner
> > >>>> Ph.D Student, University of Alberta
> > >>>> br...@tannerpages.com
> > >>>> On 25-Jul-08, at 7:20 PM, Sam Sarjant wrote:
> > >>>>> Had a good poke around and things are coming together. One thing
> > >>>>> I'm
> > >>>>> wondering about is how the random seed works. The random seed
> > >>>>> can be
> > >>>>> set via env_set_random_seed(Random_seed key), but I'm not sure
> > >>>>> how the
> > >>>>> Random_seed_key class works. Even looking at its brief code, I
> > >>>>> can't
> > >>>>> tell how it sets the seed. Can it take a long like the java Random
> > >>>>> class or does it somehow combine the 2 int parameters into a long?
> > >>>>> On Jul 16, 3:39 pm, Brian Tanner <br...@tannerpages.com> wrote:
> > >>>>>> Yeah. Are you familiar with subversion (svn)? It's commandline
> > >>>>>> tool
> > >>>>>> for interacting with source code repositories. If you are
> > >>>>>> familiar
> > >>>>>> with it, the best thing to do is to "check out" the whole
> > >>>>>> project:
> > >>>>>> svn checkout http://rl-competition.googlecode.com/svn/trunk/ rl-competition-read-only
I'm glad things are going well. Proving and testing were both done
on the proving MDPs... and I see that the jars aren't checked into
subversion. I'll do that now, they're here:
http://code.google.com/p/rl-competition/source/browse/#svn/trunk/private/environments/Tetrlais/provingJars
Anyways, we used a cheap hack for proving and testing.
For proving, we used TPMDP[0...9].
For testing, we just set the offset and used the unspoiled proving
MDPs... PMDP[50...59].
We used the same trick for Mountain Car, and Polyathlon technically.
The leaderboard scores are always a function of the cumulative reward
in the proving run. For Tetris, it's just the cumulative reward. The
graphs that are on the website and that I showed at the workshop were
made by a program that Matt wrote... it does some fancy interpolation,
but basically yeah, it is just the cumulative results of all 10 MDPs.
I posted our official *final* results table to this list just a second
ago:
http://groups.google.com/group/rl-competition-code/browse_thread/thre...
This should be a little more obvious to compare to.
I hope this helps. Keep the questions coming!
--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com
> Hi again. I've gained full control of the code and things are going
> swimmingly. The following question concerns the Tetris domain.
> I have a question about how the proving and testing was done. The
> details on the website state that it was done over 10 MDPs (I assume
> taken randomly from the 15 you stated previously) and done with 5
> million steps per MDP.
> I assume the testing run was done in much the same manner.
> However, there is some overall ambiguity towards the scores and output
> graph. Are the leaderboard scores an average of the 10 MDPs or a
> total? And is the final performance graph an average (over 5 million
> steps) or a continual concatenation of the MDP scores (x axis = 50
> million steps)? It looks to be made up of 10 'chunks' and would
> account for the fluctuations of the lines.
> I'm just curious about this because I wish to emulate the results
> myself.
> On Aug 2, 1:47 pm, Sam Sarjant <effer...@gmail.com> wrote:
>> Sadly, my side of proving server access is no longer available so I
>> won't be able to try it without contacting the Tech Support guys
>> again. From my own experiments, it does seem to be better, but only
>> slightly. Although, this is only coming from a single run consisting
>> of about half a million steps. I'll be able to derive better
>> statistical results in the end anyway.
>> On Aug 2, 7:44 am, Brian Tanner <br...@tannerpages.com> wrote:
>>> We had no computational or time restrictions on agents this year. So,
>>> you could use a supercomputer or a laptop, and you could write a super
>>> fast agent, or a slow one.
>>> The only restriction would be that you could actually complete a
>>> proving or testing run in a reasonable amount of time. For some
>>> events, the number of steps was very high, and an approach that
>>> takes
>>> a long time (as you describe) could take weeks or months to finish
>>> an
>>> official run.
>>> Try it out. The proving server is still online ;)
>>> --
>>> Brian Tanner
>>> Ph.D Student, University of Alberta
>>> br...@tannerpages.com
>>> On 31-Jul-08, at 10:52 PM, Sam Sarjant wrote:
>>>> This is unrelated to the code and more to the competition: Did the
>>>> time that the agent took to calculate the next action matter in the
>>>> competition? My agent that I submitted was fast (and fairly
>>>> effective)
>>>> but before the competition had ended I had created another agent
>>>> that
>>>> would theoretically do better (uses 1-step look ahead), but was
>>>> about
>>>> 119x slower.
>>>> It's too late now, but would this slow agent (takes about 0.2
>>>> second
>>>> to compute a goal piece location) have been accepted without
>>>> penalty?
>>>> If so, I might have got a better placing. Maybe it's better I don't
>>>> know, otherwise I'll regret not proving it beforehand.
>>>> I'm using this thread because I don't expect the RL-Competition
>>>> forum
>>>> to be as regularly watched.
>>>> - Sam
>>>> On Jul 26, 4:30 pm, "Brian Tanner" <br...@tannerpages.com> wrote:
>>>>> Hi again. The EpisodeLogger is some fanciness that we added for
>>>>> the
>>>>> competition so we could playback episodes. It's probably not what
>>>>> you're looking for.
>>>>> The way RL-Glue works, what you really want is inside the
>>>>> experiment program.
>>>>> If you look at the console trainers (experiment programs) in the
>>>>> public part of the competition directory, you should see what I
>>>>> mean.
>>>>> They allow you to either run one step at a time and accumulate the
>>>>> reward yourself, or you can use the RL_episode methods to run
>>>>> whole
>>>>> episodes and then get the rewards at the end.
>>>>> I think the javaConsoleTrainer should help make this clear. But
>>>>> if
>>>>> not, keep asking!
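A minimal sketch of the "run one step at a time and accumulate the reward yourself" option, for producing a performance graph. The Stepper interface is a hypothetical stand-in for whatever call the experiment program uses to advance one step and read the reward; it is not the RL-Glue API, and the toy environment simply returns a constant reward.

```java
import java.util.ArrayList;
import java.util.List;

public class RewardTrace {
    /** Hypothetical stand-in for one environment step; returns that step's reward. */
    interface Stepper {
        double step();
    }

    /** Runs n steps and records the running total after each one. */
    static List<Double> cumulativeRewards(Stepper env, int n) {
        List<Double> totals = new ArrayList<>();
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += env.step();
            totals.add(sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Stepper toy = () -> 1.0; // toy environment: reward of 1 every step
        System.out.println(cumulativeRewards(toy, 5));
    }
}
```

The returned list holds one point per step, so it can be written to a file and plotted directly as cumulative reward against step count.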
>>>>> On 7/25/08, Sam Sarjant <effer...@gmail.com> wrote:
>>>>>> Ah. I see it now. I forgot to look at the code in Tetrlais and it
>>>>>> was right in my face. That should solve that problem.
>>>>>> Now, I was wondering if the environment stores the agent
>>>>>> performance
>>>>>> anywhere. I've found the EpisodeLogger object, which seems to
>>>>>> be an
>>>>>> outputStream (?). I'm not totally sure. Does this just store the
>>>>>> state
>>>>>> and actions taken by the agent?
>>>>>> I'm looking for something that stores the reward at each step
>>>>>> so I
>>>>>> can
>>>>>> produce graphs of performance.
>>>>>> On Jul 26, 2:32 pm, Brian Tanner <br...@tannerpages.com> wrote:
>>>>>>> I'm not sure if your question is in general about
>>>>>>> env_set_random_seed,
>>>>>>> or about Tetris or another domain in particular.
>>>>>>> In general, the environment designer decides how to encode,
>>>>>>> store, and
>>>>>>> otherwise handle setting and getting of random seeds. We used a
>>>>>>> generic structure so that people could do whatever made sense
>>>>>>> for
>>>>>>> them. The idea of a "seed" is supposed to be abstract here,
>>>>>>> it's
>>>>>>> "some piece of encoded information" that can be shared between
>>>>>>> the
>>>>>>> environment and the experiment program. In some cases you could
>>>>>>> make
>>>>>>> up the random seed, in others you probably want to have matching
>>>>>>> env_get_random_seed and env_set_random_seed calls (ie: you can
>>>>>>> set the
>>>>>>> random generator back to an earlier point).
>>>>>>> Now that I've said that aloud it seems wrong, one major reason
>>>>>>> that we
>>>>>>> have the seed was supposed to be so that agents could do
>>>>>>> repeated
>>>>>>> sampling... but in the current system the agent can't call these
>>>>>>> methods. Maybe I'm just tired, but I think you may have
>>>>>>> deliberately
>>>>>>> or inadvertently pointed out a pretty big problem. Ahh well,
>>>>>>> moving
>>>>>>> along.
>>>>>>> In Tetris, env_get_random_seed does the following:
>>>>>>> public Random_seed_key env_get_random_seed() {
>>>>>>>     if (allowSaveLoadSeed) {
>>>>>>>         Random_seed_key k = new Random_seed_key(2, 0);
>>>>>>>         long newSeed = gameState.getRandom().nextLong();
>>>>>>>         gameState.getRandom().setSeed(newSeed);
>>>>>>>         k.intArray[0] = UtilityShop.LongHighBitsToInt(newSeed);
>>>>>>>         k.intArray[1] = UtilityShop.LongLowBitsToInt(newSeed);
>>>>>>>         return k;
>>>>>>>     }
>>>>>>>     System.err.println("env_get_random_seed() called in: "
>>>>>>>             + getClass() + " but it is disabled");
>>>>>>>     return null;
>>>>>>> }
>>>>>>> I'll do some explaining:
>>>>>>> - Create a new structure of type Random_seed_key that will pass some
>>>>>>> encoding of the seed back to the experiment program. We're in Java,
>>>>>>> so the key to the random number generator is given to us as a long.
>>>>>>> We'll encode it into 2 ints, because we only have doubles and ints to
>>>>>>> choose from.
>>>>>>> - Generate a new random seed randomly (I don't know a way to get
>>>>>>> the
>>>>>>> *current* state of the Java random number generator, so we will
>>>>>>> actually set it to something new)
>>>>>>> - Set the random seed of the Java random number generator that
>>>>>>> Tetris
>>>>>>> is using to the new one we just created.
>>>>>>> - Pack the new seed (long) into 2 ints.
>>>>>>> - Return the data structure that holds the 2 ints.
>>>>>>> So the way to think about this is you can call
>>>>>>> RL_get_random_seed
>>>>>>> from
>>>>>>> the experiment program, which returns you an object that you can
>>>>>>> use
>>>>>>> to go back to the same random generator state again later. You
>>>>>>> could
>>>>>>> do this:
>>>>>>> (The exact function names might not be right, but you should get
>>>>>>> the
>>>>>>> idea...)
>>>>>>> (All this happens in the experiment program)
>>>>>>> currentSeed = RL_get_random_seed();
>>>>>>> currentState = RL_get_state();
>>>>>>> .... agent follows sequence of actions a0 a1 .... an, and
>>>>>>> along the
>>>>>>> way observes o0 r0, o1 r1, ... on, rn ....