First Dump of the Secret Code
From: Brian Tanner <br...@tannerpages.com>
Date: Mon, 14 Jul 2008 22:04:33 -0500
Local: Mon, Jul 14 2008 11:04 pm
Subject: First Dump of the Secret Code
I've just dumped all of our secret private code from the secret 2008  
repositories to the public Google Code Project:

http://rl-competition.googlecode.com/

I've put the distributed code (the stuff you guys downloaded) in
/trunk/public and all of the secret stuff in /trunk/private.

The private code includes all of the scripts we used to generate the  
proving and testing MDPs, all of our generalizations, all of our  
parameter values, etc, etc.

It includes source code for some domains that we didn't use.

It also includes source code for the leaderboards, the "phone home server"
Ruby application that handled all of the results recording, the
proving application, etc.

BE WARNED: The code is a total mess right now.  We intend to clean it  
up and document it to some degree.  But, I wanted to have a snapshot  
available of EXACTLY what we were working with in case it is important  
for future reference.

Unfortunately, by moving the code around into different relative
directories, some of the build scripts won't be putting things in the  
right place.  We should work on fixing that soon.

Some of the projects build with ant from the command line.  Others are  
NetBeans projects and will require NetBeans to build.  Sorry.  Have  
patience.
--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com


From: Sam Sarjant <effer...@gmail.com>
Date: Tue, 15 Jul 2008 20:07:29 -0700 (PDT)
Local: Tues, Jul 15 2008 11:07 pm
Subject: Re: First Dump of the Secret Code
Uh, I can't find it anywhere. I don't see any files in this group and
no files in the googlecode link. I'm not terribly familiar with these
google tools, so perhaps it's something I'm missing or not doing
right?

On Jul 15, 3:04 pm, Brian Tanner <br...@tannerpages.com> wrote:


From: Brian Tanner <br...@tannerpages.com>
Date: Tue, 15 Jul 2008 22:39:30 -0500
Local: Tues, Jul 15 2008 11:39 pm
Subject: Re: First Dump of the Secret Code
Yeah.  Are you familiar with Subversion (svn)? It's a command-line tool
for interacting with source code repositories.  If you are familiar
with it, the best thing to do is to "check out" the whole project:
svn checkout http://rl-competition.googlecode.com/svn/trunk/ rl-competition-read-only

(Instructions are under the "source" tab of the google code project)

I will post a downloadable link at some point, but for now with the  
code in such a messy state I don't want to make it easy for people to  
get the whole codebase unless they are savvy with the tools.

If you want to just poke around, you can explore the code here:
http://code.google.com/p/rl-competition/source/browse

If I remember correctly, you're looking for the Tetris event (formally called
Tetrlais to avoid copyright issues).

The relevant places to look would be:

The environment:
http://code.google.com/p/rl-competition/source/browse/trunk/private/e...

The state structure:
http://code.google.com/p/rl-competition/source/browse/trunk/private/e...

The Proving and Testing MDPs:
http://code.google.com/p/rl-competition/source/browse/trunk/private/e...
http://code.google.com/p/rl-competition/source/browse/trunk/private/e...
http://code.google.com/p/rl-competition/source/browse/trunk/private/e...
...
http://code.google.com/p/rl-competition/source/browse/trunk/private/e...

MDPs 0...14 were used for proving, and 50...64 were used for testing.

The training MDPs are all here:
http://code.google.com/p/rl-competition/source/browse/trunk/private/e...

Unfortunately I don't have lots of time to talk you through the finer  
points right now... but if you keep asking questions via this list,  
we'll be happy to answer them, and then we'll have a formal,  
searchable record of the issues that come up.

--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com

On 15-Jul-08, at 10:07 PM, Sam Sarjant wrote:


From: Sam Sarjant <effer...@gmail.com>
Date: Wed, 16 Jul 2008 14:58:10 -0700 (PDT)
Local: Wed, Jul 16 2008 5:58 pm
Subject: Re: First Dump of the Secret Code
Excellent. Thanks. I'll have to check it out soon. If I have any
questions I'll be sure to contact you, but I think I have a fairly
good grip of the process.

On Jul 16, 3:39 pm, Brian Tanner <br...@tannerpages.com> wrote:


From: Sam Sarjant <effer...@gmail.com>
Date: Fri, 25 Jul 2008 18:20:11 -0700 (PDT)
Local: Fri, Jul 25 2008 9:20 pm
Subject: Re: First Dump of the Secret Code
Had a good poke around and things are coming together. One thing I'm
wondering about is how the random seed works. The random seed can be
set via env_set_random_seed(Random_seed_key key), but I'm not sure how the
Random_seed_key class works. Even looking at its brief code, I can't
tell how it sets the seed. Can it take a long like the java Random
class or does it somehow combine the 2 int parameters into a long?

On Jul 16, 3:39 pm, Brian Tanner <br...@tannerpages.com> wrote:


From: Brian Tanner <br...@tannerpages.com>
Date: Fri, 25 Jul 2008 20:32:57 -0600
Local: Fri, Jul 25 2008 10:32 pm
Subject: Re: First Dump of the Secret Code
I'm not sure if your question is in general about env_set_random_seed,
or about Tetris or another domain in particular.

In general, the environment designer decides how to encode, store, and  
otherwise handle setting and getting of random seeds.  We used a  
generic structure so that people could do whatever made sense for  
them.  The idea of a "seed" is supposed to be abstract here: it's
"some piece of encoded information" that can be shared between the
environment and the experiment program.  In some cases you could make
up the random seed; in others you probably want to have matching
env_get_random_seed and env_set_random_seed calls (i.e., you can set the
random generator back to an earlier point).

Now that I've said that aloud, it seems wrong: one major reason that we
have the seed was supposed to be so that agents could do repeated
sampling... but in the current system the agent can't call these
methods.  Maybe I'm just tired, but I think you may have deliberately
or inadvertently pointed out a pretty big problem.  Ahh well, moving
along.

In Tetris, env_get_random_seed does the following:

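     public Random_seed_key env_get_random_seed() {
         if (allowSaveLoadSeed) {
             // Key holding 2 ints and no doubles (see the explanation below).
             Random_seed_key k = new Random_seed_key(2, 0);
             // Draw a fresh seed and install it, because the current
             // generator state cannot be read back directly.
             long newSeed = gameState.getRandom().nextLong();
             gameState.getRandom().setSeed(newSeed);
             // Split the 64-bit seed across the two int slots.
             k.intArray[0] = UtilityShop.LongHighBitsToInt(newSeed);
             k.intArray[1] = UtilityShop.LongLowBitsToInt(newSeed);
             return k;
         }
         System.err.println("env_get_random_seed() called in: "
                 + getClass() + " but it is disabled");
         return null;
     }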

I'll do some explaining:

- Create a new structure of type Random_seed_key that will pass some
encoding of the seed back to the experiment program.  We're in Java,
so the key to the random number generator is given to us as a long.
We'll encode it into 2 ints, because we only have doubles and ints to
choose from.

- Generate a new random seed randomly (I don't know a way to get the  
*current* state of the Java random number generator, so we will  
actually set it to something new)

-  Set the random seed of the Java random number generator that Tetris  
is using to the new one we just created.

-  Pack the new seed (long) into 2 ints.

- Return the data structure that holds the 2 ints.
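
The packing step above can be sketched in plain Java. This is only an illustration: the class SeedPacking and its method names are hypothetical and are not the competition's UtilityShop helpers, whose implementation is not shown in this thread. The one detail worth noting is the mask on the low word, which keeps a negative int from sign-extending when the long is rebuilt.

```java
// Sketch only: SeedPacking and its methods are hypothetical stand-ins,
// not the UtilityShop code used in the competition.
final class SeedPacking {
    /** Returns the upper 32 bits of the seed as an int. */
    static int highBits(long seed) {
        return (int) (seed >>> 32);
    }

    /** Returns the lower 32 bits of the seed as an int. */
    static int lowBits(long seed) {
        return (int) seed;
    }

    /** Rebuilds the long; the mask stops a negative low word from sign-extending. */
    static long combine(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long seed = -123456789012345L;
        int[] key = { highBits(seed), lowBits(seed) };
        // Round trip: the two ints carry the full 64-bit seed.
        System.out.println(combine(key[0], key[1]) == seed);
    }
}
```

An environment's env_set_random_seed could use something like combine to turn the two ints back into the long it hands to setSeed.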

So the way to think about this is you can call RL_get_random_seed from  
the experiment program, which returns you an object that you can use  
to go back to the same random generator state again later.  You could  
do this:

(The exact function names might not be right, but you should get the  
idea...)
(All this happens in the experiment program)
currentSeed = RL_get_random_seed();
currentState = RL_get_state();

.... agent follows sequence of actions a0 a1 .... an, and along the
way observes o0 r0, o1 r1, ... on rn ....

RL_set_random_seed(currentSeed);
RL_set_state(currentState);

If the agent were now to follow the same sequence of actions, it  
should hopefully (not thoroughly tested in Tetris) see the same  
observations and rewards as before.

This can be used for experiment programs to test several agents with  
the *exact* same starting conditions.
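
Here is a minimal sketch of why this works, using only java.util.Random and no RL-Glue code at all. The class SeedReplay, saveSeed, and rollout are hypothetical names, and the draw from 7 values is just a stand-in for whatever random events an environment generates. It restores only the generator; a real environment would also need its state restored, as in the RL_set_state call above.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch only: SeedReplay is a hypothetical stand-in for an environment's
// use of java.util.Random, not competition code.
final class SeedReplay {
    /** "Get" step: pick a fresh seed, install it, and return it for later. */
    static long saveSeed(Random rng) {
        long seed = rng.nextLong();
        rng.setSeed(seed);
        return seed;
    }

    /** Draws n values, standing in for the random events of one episode. */
    static int[] rollout(Random rng, int n) {
        int[] draws = new int[n];
        for (int i = 0; i < n; i++) {
            draws[i] = rng.nextInt(7); // arbitrary range chosen for illustration
        }
        return draws;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        long saved = saveSeed(rng);
        int[] first = rollout(rng, 10);
        rng.setSeed(saved); // "set" step: go back to the remembered point
        int[] second = rollout(rng, 10);
        System.out.println(Arrays.equals(first, second));
    }
}
```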

Please let me know if that's clear (I'm afraid it might not be) and  
ask more questions.

--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com

On 25-Jul-08, at 7:20 PM, Sam Sarjant wrote:


From: Sam Sarjant <effer...@gmail.com>
Date: Fri, 25 Jul 2008 21:25:26 -0700 (PDT)
Local: Sat, Jul 26 2008 12:25 am
Subject: Re: First Dump of the Secret Code
Ah. I see it now. I forgot to look at the code in Tetrlais and it was
right in my face. That should solve that problem.

Now, I was wondering if the environment stores the agent performance
anywhere. I've found the EpisodeLogger object, which seems to be an
outputStream (?). I'm not totally sure. Does this just store the state
and actions taken by the agent?

I'm looking for something that stores the reward at each step so I can
produce graphs of performance.

On Jul 26, 2:32 pm, Brian Tanner <br...@tannerpages.com> wrote:


From: "Brian Tanner" <br...@tannerpages.com>
Date: Fri, 25 Jul 2008 22:30:50 -0600
Local: Sat, Jul 26 2008 12:30 am
Subject: Re: First Dump of the Secret Code
Hi again. The EpisodeLogger is some fanciness that we added for the
competition so we could playback episodes.  It's probably not what
you're looking for.

The way RL-Glue works, what you really want is inside the experiment program.

If you look at the console trainers (experiment programs) in the
public part of the competition directory, you should see what I mean.
They allow you to either run one step at a time and accumulate the
reward yourself, or you can use the RL_episode methods to run whole
episodes and then get the rewards at the end.

I think the javaConsoleTrainer should help make this clear.  But if
not, keep asking!
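
As a rough sketch of the "accumulate the reward yourself" option: the class RewardLog below is hypothetical and deliberately contains no RL-Glue calls. The rewards in main are made-up numbers; in a real experiment program each one would be the reward returned by a single step.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: RewardLog is a hypothetical helper for an experiment program.
final class RewardLog {
    private final List<Double> cumulative = new ArrayList<>();
    private double total = 0.0;

    /** Call once per step with that step's reward. */
    void record(double reward) {
        total += reward;
        cumulative.add(total);
    }

    /** Total reward seen so far. */
    double total() {
        return total;
    }

    /** Running total after each step, ready to plot against step number. */
    List<Double> curve() {
        return cumulative;
    }

    public static void main(String[] args) {
        RewardLog log = new RewardLog();
        double[] stepRewards = { 0.0, 0.0, 1.0, 0.0, 4.0 }; // made-up values
        for (double r : stepRewards) {
            log.record(r);
        }
        System.out.println(log.curve());
    }
}
```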

On 7/25/08, Sam Sarjant <effer...@gmail.com> wrote:

--
Brian Tanner
Ph.D Student
University of Alberta
br...@tannerpages.com

From: Sam Sarjant <effer...@gmail.com>
Date: Thu, 31 Jul 2008 21:52:32 -0700 (PDT)
Local: Fri, Aug 1 2008 12:52 am
Subject: Re: First Dump of the Secret Code
This is unrelated to the code and more to the competition: Did the
time that the agent took to calculate the next action matter in the
competition? My agent that I submitted was fast (and fairly effective)
but before the competition had ended I had created another agent that
would theoretically do better (uses 1-step look ahead), but was about
119x slower.

It's too late now, but would this slow agent (takes about 0.2 second
to compute a goal piece location) have been accepted without penalty?
If so, I might have got a better placing. Maybe it's better I don't
know, otherwise I'll regret not proving it beforehand.

I'm using this thread because I don't expect the RL-Competition forum
to be as regularly watched.

- Sam

On Jul 26, 4:30 pm, "Brian Tanner" <br...@tannerpages.com> wrote:

From: Brian Tanner <br...@tannerpages.com>
Date: Fri, 1 Aug 2008 13:44:17 -0600
Local: Fri, Aug 1 2008 3:44 pm
Subject: Re: First Dump of the Secret Code
We had no computational or time restrictions on agents this year.  So,  
you could use a supercomputer or a laptop, and you could write a super  
fast agent, or a slow one.

The only restriction would be that you could actually complete a  
proving or testing run in a reasonable amount of time.  For some  
events, the number of steps was very high, and an approach that takes  
a long time (as you describe) could take weeks or months to finish an  
official run.
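
A back-of-the-envelope check, using figures that appear elsewhere in this thread (about 0.2 seconds per decision, and a Tetris run of 10 MDPs at 5 million steps each), and assuming one decision per step and no other overhead. The class and method names are hypothetical. Under those assumptions a full run comes to roughly 116 days.

```java
// Sketch only: a rough estimate, not a measurement of any real run.
final class RunTimeEstimate {
    /** Wall-clock days for a run, counting only the agent's decision time. */
    static double days(long mdps, long stepsPerMdp, double secondsPerStep) {
        double seconds = mdps * stepsPerMdp * secondsPerStep;
        return seconds / 86_400.0; // seconds in a day
    }

    public static void main(String[] args) {
        System.out.printf("%.1f days%n", days(10, 5_000_000L, 0.2));
    }
}
```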

Try it out.  The proving server is still online ;)

--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com

On 31-Jul-08, at 10:52 PM, Sam Sarjant wrote:

From: Sam Sarjant <effer...@gmail.com>
Date: Fri, 1 Aug 2008 18:47:06 -0700 (PDT)
Local: Fri, Aug 1 2008 9:47 pm
Subject: Re: First Dump of the Secret Code
Sadly, my side of proving server access is no longer available so I
won't be able to try it without contacting the Tech Support guys
again. From my own experiments, it does seem to be better, but only
slightly. Although, this is only coming from a single run consisting
of about half a million steps. I'll be able to derive better
statistical results in the end anyway.

On Aug 2, 7:44 am, Brian Tanner <br...@tannerpages.com> wrote:

From: Sam Sarjant <effer...@gmail.com>
Date: Mon, 1 Sep 2008 16:08:09 -0700 (PDT)
Local: Mon, Sep 1 2008 7:08 pm
Subject: Re: First Dump of the Secret Code
Hi again. I've gained full control of the code and things are going
swimmingly. The following question concerns the Tetris domain.

I have a question about how the proving and testing was done. The
details on the website state that it was done over 10 MDPs (I assume
taken randomly from the 15 you stated previously) and done with 5
million steps per MDP.
I assume the testing run was done in much the same manner.

However, there is some overall ambiguity towards the scores and output
graph. Are the leaderboard scores an average of the 10 MDPs or a
total? And is the final performance graph an average (over 5 million
steps) or a continual concatenation of the MDP scores (x axis = 50
million steps)? It looks to be made up of 10 'chunks' and would
account for the fluctuations of the lines.

I'm just curious about this because I wish to emulate the results
myself.

On Aug 2, 1:47 pm, Sam Sarjant <effer...@gmail.com> wrote:

Discussion subject changed to "Proving/Testing MDPs and Jars and Leaderboard scores" by Brian Tanner
From: Brian Tanner <br...@tannerpages.com>
Date: Tue, 2 Sep 2008 09:23:16 -0600
Local: Tues, Sep 2 2008 11:23 am
Subject: Proving/Testing MDPs and Jars and Leaderboard scores
Hi Sam.

I'm glad things are going well.   Proving and testing were both done  
on the proving MDPs... and I see that the jars aren't checked into  
subversion.  I'll do that now, they're here:
http://code.google.com/p/rl-competition/source/browse/#svn/trunk/private/environments/Tetrlais/provingJars

Their source is in:
http://code.google.com/p/rl-competition/source/browse/#svn/trunk/private/environments/Tetrlais/src/
  <-- each proving MDP is in its own package (yech!  We have solved the
technical limitation which required this for last year) :)

Anyways, we used a cheap hack for proving and testing.

For proving, we used TPMDP[0...9] .

For testing, we just set the offset to 50 and used the unspoiled proving
MDPs... PMDP[50...59].

We used the same trick for Mountain Car, and Polyathlon technically.

The leaderboard scores are always a function of the cumulative reward  
in the proving run.  For Tetris, it's just the cumulative reward.  The  
graphs that are on the website and that I showed at the workshop were  
made by a program that Matt wrote... it does some fancy interpolation,  
but basically yeah, it is just the cumulative results of all 10 MDPs.  
I posted our official *final* results table to this list just a second  
ago:
http://groups.google.com/group/rl-competition-code/browse_thread/thre...

This should be a little more obvious to compare to.
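
For anyone trying to reproduce the numbers, here is a rough sketch of the bookkeeping described above. Everything in it is an assumption made for illustration: the class LeaderboardSketch, the offset of 50 (read off the index range 50...59), and the idea of concatenating the per-MDP step rewards into one running total. It ignores the interpolation that the real graphing program does.

```java
// Sketch only: hypothetical names, not the competition's scoring code.
final class LeaderboardSketch {
    static final int TESTING_OFFSET = 50; // assumption, from the range 50...59

    /** Maps a proving index (0...9) to the matching testing index. */
    static int testingIndex(int provingIndex) {
        return provingIndex + TESTING_OFFSET;
    }

    /** Running cumulative reward over all MDPs, laid end to end. */
    static double[] concatenatedCurve(double[][] rewardsPerMdp) {
        int steps = 0;
        for (double[] run : rewardsPerMdp) {
            steps += run.length;
        }
        double[] curve = new double[steps];
        double total = 0.0;
        int k = 0;
        for (double[] run : rewardsPerMdp) {
            for (double r : run) {
                total += r;
                curve[k++] = total;
            }
        }
        return curve;
    }

    /** Score as plain cumulative reward: the last point of the curve. */
    static double score(double[][] rewardsPerMdp) {
        double[] curve = concatenatedCurve(rewardsPerMdp);
        return curve.length == 0 ? 0.0 : curve[curve.length - 1];
    }

    public static void main(String[] args) {
        double[][] rewards = { { 1.0, 2.0 }, { 3.0 } }; // made-up values
        System.out.println(score(rewards));
    }
}
```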

I hope this helps.  Keep the questions coming!

--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com

On 1-Sep-08, at 5:08 PM, Sam Sarjant wrote:
