BibTeX entry

Sam Sarjant

Oct 12, 2008, 3:31:10 AM

to RL Competition Code

I'm preparing my final report which concerns this competition and I
was wondering if there was a BibTeX entry available for the
competition or perhaps a publication regarding the competition?

Brian Tanner

Oct 13, 2008, 5:58:12 PM

to rl-compet...@googlegroups.com

Hi Sam. We're currently working on a publication, but it won't be ready
for a while.

I guess in the meantime it wouldn't be horrible to just reference the
website? Something like:

@misc{ShiTanWhi08,
  author = {Shimon Whiteson and Brian Tanner and Adam White},
  title  = {2008 Reinforcement Learning Competition},
  year   = {2008},
  note   = {\url{http://rl-competition.org}}
}

Unfortunately, that's the best I've got for you right now.

I'd like to see a copy of your final report if you feel like sharing :)

-Brian

Sam Sarjant

Oct 13, 2008, 11:42:10 PM

to RL Competition Code

That works fine. I was mostly wondering who to put down as the authors
as I wasn't entirely sure.

My report should (unless something goes horribly awry) be ready by the
end of the month. I guess I can share it online but I better check
University regulations.
- Sam

Brian Tanner

Oct 14, 2008, 12:24:16 AM

to rl-compet...@googlegroups.com

Ok. Good luck :)

--
Brian Tanner
Ph.D Student
University of Alberta
br...@tannerpages.com

Sam Sarjant

Oct 30, 2008, 8:34:33 PM

to RL Competition Code

Well, the report is finished. I've uploaded it to my personal site and
you can view it from there. I hope that I got all of the facts right
with the competition and such. Anyway, it's at
http://super-sanity.com/wp-content/uploads/finalreport.pdf.

Brian Tanner

Oct 31, 2008, 1:34:06 PM

to rl-compet...@googlegroups.com

The report looks good. I only took a quick look for now, but I'll be
sure to check it in more detail later. Thanks for posting it. We'll
try to use your feedback as much as possible towards improving things
for next year.

I take a little exception to your complaints about the training software
though -- just in the sense that you clearly didn't get as much out
of the software as was possible.

Just a few points that I guess weren't well enough explained or
demonstrated in the competition software documentation.

Restarting Agent for Every Experiment
--------------------------------------------------------
You don't have to restart your agent for every run. You could do
thousands of experiments on all sorts of different training MDPs
without ever restarting RL-Glue or your agent. It's just a matter of
calling RL_cleanup and RL_init between experiments. We eventually
even made an example to do this explicitly:
http://code.google.com/p/rl-competition/source/browse/trunk/public/trainers/consoleMultiTrainers/src/TetrisMultiTrainer.java
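
For readers following along, here is a minimal sketch of that pattern. It is not the linked TetrisMultiTrainer. The Glue interface and CountingGlue class are hypothetical stand-ins so the example runs without RL-Glue installed; in a real experiment program the three calls would presumably go to RL_init, RL_episode and RL_cleanup instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the RL-Glue experiment calls named in the post.
interface Glue {
    void init();                 // stands in for RL_init
    void episode(int maxSteps);  // stands in for RL_episode
    void cleanup();              // stands in for RL_cleanup
}

// Fake implementation that only records the order of calls.
class CountingGlue implements Glue {
    final List<String> calls = new ArrayList<>();
    public void init() { calls.add("init"); }
    public void episode(int maxSteps) { calls.add("episode"); }
    public void cleanup() { calls.add("cleanup"); }
}

class MultiExperimentSketch {
    // Runs several experiments back to back in one process. Each experiment
    // is bracketed by init and cleanup, so nothing has to be restarted.
    static void runExperiments(Glue glue, int experiments, int episodesEach, int maxSteps) {
        for (int e = 0; e < experiments; e++) {
            glue.init();
            for (int i = 0; i < episodesEach; i++) {
                glue.episode(maxSteps);
            }
            glue.cleanup();
        }
    }

    public static void main(String[] args) {
        CountingGlue glue = new CountingGlue();
        runExperiments(glue, 3, 2, 1000);
        System.out.println(glue.calls);
    }
}
```

The agent side has to cooperate: its init and cleanup methods need to reset any per-experiment state, otherwise learning from one experiment leaks into the next.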

Experiment only gives access to total steps
-------------------------------------------------------------
RL-Glue experiment programs can either be run an episode at a time, or
a step at a time. Our examples used episodes because that mode runs a
little faster and is easier to understand. However, there is another
mode where you call RL_step instead of RL_episode and then the
experiment program gets to see every observation, action, and reward
exchanged between the agent and environment.
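
A rough sketch of what a step-at-a-time experiment loop can look like. StepGlue, StepResult and FiveStepGlue are hypothetical stand-ins, not RL-Glue types; the real RL_step call also reports the observation and action, which are left out here to keep the example short.

```java
// Hypothetical result of one agent/environment exchange.
class StepResult {
    final double reward;
    final boolean terminal;
    StepResult(double reward, boolean terminal) {
        this.reward = reward;
        this.terminal = terminal;
    }
}

// Hypothetical stand-in for the step-level experiment calls.
interface StepGlue {
    void start();       // begins an episode
    StepResult step();  // stands in for RL_step: advances exactly one step
}

// Fake environment: reward 1.0 per step, episode ends on the fifth step.
class FiveStepGlue implements StepGlue {
    private int t;
    public void start() { t = 0; }
    public StepResult step() {
        t++;
        return new StepResult(1.0, t >= 5);
    }
}

class StepModeSketch {
    // Runs one episode step by step, so the experiment program sees every
    // reward as it happens. Returns {steps taken, total reward}.
    static double[] runEpisode(StepGlue glue, int maxSteps) {
        glue.start();
        int steps = 0;
        double total = 0.0;
        boolean done = false;
        while (!done && steps < maxSteps) {
            StepResult r = glue.step();
            steps++;
            total += r.reward;
            done = r.terminal;
        }
        return new double[] { steps, total };
    }

    public static void main(String[] args) {
        double[] out = runEpisode(new FiveStepGlue(), 100);
        System.out.println("steps=" + (int) out[0] + " return=" + out[1]);
    }
}
```

The trade-off is the one mentioned above: the loop body runs once per step, which costs a little speed but lets the experiment log or react to individual rewards.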

Problem parameters not available
-------------------------------------------------
We decided to hide the piece distributions etc. for the competition
because we felt it was necessary to keep the integrity of the
competition. More details on these motivations in an upcoming report
that we are releasing :)

But, about the board size: it is encoded in the observations directly
so both the experiment and the agent can get at it pretty easily.

Anyways, just wanted to clear up these misunderstandings that
immediately jumped out at me. This tells me how we can improve things
for next year also!

--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com

Sam Sarjant

Nov 1, 2008, 9:29:25 PM

to RL Competition Code

I guess I didn't fully take the time to understand the trainer. Don't
get me wrong though, I made use of the console trainer by adding in
code here and there and interacting with the agent via agent_message,
but I felt that a dedicated experimenter was the way to go.

I think, in my earlier versions of the agent, I didn't fill in the
cleanup and init methods, which could have caused problems. The main
issue with the trainer I had in terms of changing agents or
environments was that I had to restart the trainer to change agents.

And finally, I was aware that it was best to hide the problem
parameters for the competition. However, seeing the parameters was
useful for post-competition testing.

The only issue I had with the competition was the unlimited testing
time. An agent could wait hours, perhaps days before choosing a move.
As stated, Loria INRIA - MAIA found this loophole and exploited it.
Which is fair enough, I have no qualms with them. But the time factor
creates the possibility of nearly-fixed policy agents.

Also, judging on a total number of steps rather than episodes was a
little odd, but an acceptable test of performance.

Well, the competition was fun anyway and I hope to enter it again next
year.
- Sam Sarjant

Brian Tanner

Nov 2, 2008, 11:19:41 AM

to rl-compet...@googlegroups.com

I'll respond in-line.

--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com

On 1-Nov-08, at 7:29 PM, Sam Sarjant wrote:

>
> I guess I didn't fully take the time to understand the trainer. Don't
> get me wrong though, I made use of the console trainer by adding in
> code here and there and interacting with the agent via agent_message,
> but I felt that a dedicated experimenter was the way to go.
>

I totally get that. I'd be interested in talking to you off-list a
little bit about that experimenter. Ideally, we hope that RL-Viz is
general enough that it can be used to build that kind of
infrastructure for all sorts of RL domains.

> I think, in my earlier versions of the agent, I didn't fill in the
> cleanup and init methods, which could have caused problems. The main
> issue with the trainer I had in terms of changing agents or
> environments was that I had to restart the trainer to change agents.

Ahh yeah, I see. RL-Viz does support dynamically changing agents in
Java (experimental support for C/C++); we just didn't make it obvious
in the competition software because it was one more complication and
didn't apply to all competitors. Again, I think your feedback can be
helpful for improving that project.

>
>
> And finally, I was aware that it was best to hide the problem
> parameters for the competition. However, seeing the parameters was
> useful for post-competition testing.

Yeah, we really should go back and expose that information and make it
a little easier to get at. We've just been busy.

>
>
> The only issue I had with the competition was the unlimited testing
> time. An agent could wait hours, perhaps days before choosing a move.
> As stated, Loria INRIA - MAIA found this loophole and exploited it.
> Which is fair enough, I have no qualms with them. But the time factor
> creates the possibility of nearly-fixed policy agents.

This wasn't really a loophole; it was something we
anticipated. It was a design decision of the competition to not
measure or worry about how much computation was spent per step. Our
parameterization of Tetris did not force people to use "good old
fashioned RL", for better or worse.

>
>
> Also, judging on a total number of steps rather than episodes was a
> little odd, but an acceptable test of performance.

This is a question that we need to resolve in the empirical evaluation
community, in my opinion. Using a step limit is not perfect, but it
has a whole host of advantages over an episode limit. Episode limits
are really weird. I'm contributing to an extended tech report based
on the competition which will address this.
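
To make the difference concrete, here is a small sketch of evaluation under a step budget. It is an illustration only: the thread does not spell out the competition's exact scoring, and EpisodeRunner is a hypothetical interface. Episodes keep starting until the budget is spent, and the last one is cut short if needed.

```java
// Hypothetical: runs one episode capped at maxSteps and
// returns {steps used, reward collected}.
interface EpisodeRunner {
    double[] runEpisode(int maxSteps);
}

class StepBudgetSketch {
    // Evaluates under a fixed total number of steps rather than a fixed
    // number of episodes. Returns {episodes started, total reward, steps used}.
    static double[] evaluate(EpisodeRunner runner, int stepBudget) {
        int remaining = stepBudget;
        int episodes = 0;
        double total = 0.0;
        while (remaining > 0) {
            double[] r = runner.runEpisode(remaining);
            int used = (int) r[0];
            if (used <= 0) {
                break; // guard against a runner that makes no progress
            }
            remaining -= used;
            total += r[1];
            episodes++;
        }
        return new double[] { episodes, total, stepBudget - remaining };
    }

    public static void main(String[] args) {
        // Fake runner: every episode lasts at most 4 steps, reward 1 per step.
        EpisodeRunner fourStep = maxSteps -> {
            int n = Math.min(4, maxSteps);
            return new double[] { n, n };
        };
        double[] out = evaluate(fourStep, 10);
        System.out.println("episodes=" + (int) out[0] + " reward=" + out[1]
                + " steps=" + (int) out[2]);
    }
}
```

With a step budget every entrant gets the same amount of interaction, whereas with an episode limit an agent that survives longer also gets more steps to learn from.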

>
>
> Well, the competition was fun anyway and I hope to enter it again next
> year.

Great! Thanks for all your comments!

Sam Sarjant

Nov 3, 2008, 3:50:07 PM

to RL Competition Code

Would you like the binary and source code for the experimenter?
Although I preach about its dynamic agent loading, there is a large
problem. It can only load agents called SmartAgent.class... This is
because I couldn't find an easy way of simply loading an 'Agent'
interface - it had to be a solid class. With a few small changes in
the code, it could load other agents, but the code would have to be
changed each time.

So, to use it properly, I'll have to send along one of my agents too.
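
One common way around the hard-coded class name is to load by name through reflection and cast to the interface. The sketch below uses only standard JDK calls; SimpleAgent and the two agent classes are hypothetical placeholders, and the real RL-Glue agent interface has more methods than this.

```java
// Hypothetical minimal agent interface, for illustration only.
interface SimpleAgent {
    int chooseAction(int observation);
}

class AlwaysZeroAgent implements SimpleAgent {
    public int chooseAction(int observation) { return 0; }
}

class EchoAgent implements SimpleAgent {
    public int chooseAction(int observation) { return observation; }
}

class AgentByNameSketch {
    // Loads any class by its fully qualified name, checks that it implements
    // SimpleAgent, and creates an instance through its no-argument constructor.
    static SimpleAgent load(String className) throws ReflectiveOperationException {
        Class<? extends SimpleAgent> cls =
                Class.forName(className).asSubclass(SimpleAgent.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // The class name can come from the command line, so nothing
        // has to be recompiled to switch agents.
        String name = args.length > 0 ? args[0] : EchoAgent.class.getName();
        SimpleAgent agent = load(name);
        System.out.println(agent.getClass().getName() + " chose " + agent.chooseAction(7));
    }
}
```

This assumes the agent class is already on the classpath and has a no-argument constructor; asSubclass throws a ClassCastException if the named class does not implement the interface.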

> Yeah, we really should go back and expose that information and make it
> a little easier to get at. We've just been busy.

I found it easily enough. Under all the MDPs folder or something. I
can't remember where I found it, but I found it.

> This is a question that we need to resolve in the empirical evaluation
> community, in my opinion. Using a step limit is not perfect, but it
> has a whole host of advantages over an episode limit. Episode limits
> are really weird. I'm contributing to an extended tech report based
> on the competition which will address this.

I can see why going by episodes is weird, especially in Tetris, where
a single misplaced piece followed by an unfortunate string of
pieces could cost you the game.

Brian Tanner

Nov 7, 2008, 1:10:52 AM

to rl-compet...@googlegroups.com

Yes I'd very much like this code from you. I'm working on a demo that
we're doing at the NIPS conference in a few weeks, and I'm looking for
easy, flashy things to add to RL-Viz to spice it up a bit.

I could also probably help you with your dynamic loading problems...
RL-Viz does some of this.

I'd have to pick through the code a bit to make a clean example, but
basically you can point RL-Viz at a directory full of jar files, and
say "get me everything that implements X", and it will go through
every jar, find all the classes that match your criteria, and give you
a nice list. Then you can say "gimme *that* one", and it will
dynamically load the class from the jar and give you an instance.
It's how we do all the dynamic env/agent loading in RL-Viz and the
competition.

An example that does this is here:
http://code.google.com/p/rl-viz/source/browse/trunk/projects/rlVizLibJava/src/rlVizLib/dynamicLoading/LocalJarAgentEnvironmentLoader.java

It's a bit fancy, and not commented, but if you're interested I could
help.
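
For reference, the general idea of "get me everything that implements X" can be sketched with plain JDK classes. This is not the RL-Viz loader linked above, just an illustration of the technique; JarScanSketch and its method names are made up for the example.

```java
import java.io.File;
import java.io.IOException;
import java.lang.reflect.Modifier;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class JarScanSketch {
    // Converts a jar entry name such as "pkg/Foo.class" to "pkg.Foo".
    static String toClassName(String entryName) {
        return entryName.substring(0, entryName.length() - ".class".length())
                .replace('/', '.');
    }

    // Returns every concrete class found in the directory's jar files
    // that implements (or extends) the wanted type.
    static <T> List<Class<? extends T>> findImplementations(File dir, Class<T> wanted)
            throws IOException {
        List<Class<? extends T>> found = new ArrayList<>();
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return found; // not a directory, or not readable
        }
        for (File jar : jars) {
            // The loader is left open on purpose: closing it would stop the
            // returned classes from loading further classes out of the jar.
            URLClassLoader loader = new URLClassLoader(
                    new URL[] { jar.toURI().toURL() }, wanted.getClassLoader());
            try (JarFile jf = new JarFile(jar)) {
                Enumeration<JarEntry> entries = jf.entries();
                while (entries.hasMoreElements()) {
                    String name = entries.nextElement().getName();
                    if (!name.endsWith(".class")) {
                        continue;
                    }
                    try {
                        // Load without running static initializers.
                        Class<?> c = Class.forName(toClassName(name), false, loader);
                        if (wanted.isAssignableFrom(c) && !c.isInterface()
                                && !Modifier.isAbstract(c.getModifiers())) {
                            found.add(c.asSubclass(wanted));
                        }
                    } catch (ClassNotFoundException | LinkageError e) {
                        // Skip classes that cannot be loaded on their own.
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (Class<? extends Runnable> c : findImplementations(dir, Runnable.class)) {
            System.out.println(c.getName());
        }
    }
}
```

The "gimme *that* one" step is then a matter of picking a class from the returned list and calling getDeclaredConstructor().newInstance() on it, assuming it has a no-argument constructor.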

--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com
