I guess in the meantime it wouldn't be horrible to just reference the
website? Something like:
@misc{ShiTanWhi08,
author={Shimon Whiteson and Brian Tanner and Adam White},
title={2008 Reinforcement Learning Competition},
year={2008},
note={\url{http://rl-competition.org}}
}
Unfortunately, that's the best I've got for you right now.
I'd like to see a copy of your final report if you feel like sharing :)
-Brian
--
Brian Tanner
Ph.D Student
University of Alberta
br...@tannerpages.com
I take a little exception to your complaints about the training software
though -- just in the sense that you clearly didn't get as much out of
the software as was possible.
Just a few points that I guess weren't well enough explained or
demonstrated in the competition software documentation.
Restarting Agent for Every Experiment
--------------------------------------------------------
You don't have to restart your agent for every run. You could do
thousands of experiments on all sorts of different training MDPs
without ever restarting RL-Glue or your agent. It's just a matter of
calling RL_cleanup and RL_init between experiments. We eventually
even made an example to do this explicitly:
http://code.google.com/p/rl-competition/source/browse/trunk/public/trainers/consoleMultiTrainers/src/TetrisMultiTrainer.java
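For anyone who doesn't want to dig through the trainer, here is a rough sketch of the pattern. The Glue interface below is a hypothetical stand-in I made up so the snippet runs on its own; its method names mirror the RL-Glue experiment calls (RL_init, RL_episode, RL_return, RL_cleanup), but the signatures are simplified and CountingGlue is just a fake for illustration. With the real library you would call the RL-Glue functions directly.

```java
import java.util.ArrayList;
import java.util.List;

class MultiExperimentSketch {
    /** Hypothetical stand-in for the RL-Glue experiment calls (simplified signatures). */
    interface Glue {
        void RL_init();
        void RL_episode(int maxSteps);
        double RL_return();
        void RL_cleanup();
    }

    /** Runs several experiments back to back without restarting anything. */
    static List<Double> runExperiments(Glue glue, int experiments,
                                       int episodesEach, int maxSteps) {
        List<Double> totals = new ArrayList<>();
        for (int e = 0; e < experiments; e++) {
            glue.RL_init();                 // start this experiment from a clean state
            double total = 0.0;
            for (int ep = 0; ep < episodesEach; ep++) {
                glue.RL_episode(maxSteps);  // one full episode, capped at maxSteps
                total += glue.RL_return();  // return of the episode just finished
            }
            glue.RL_cleanup();              // tear down before the next experiment
            totals.add(total);
        }
        return totals;
    }

    /** Fake glue that only counts calls, so the sketch runs without RL-Glue. */
    static class CountingGlue implements Glue {
        int inits = 0;
        int cleanups = 0;
        public void RL_init() { inits++; }
        public void RL_episode(int maxSteps) { /* nothing to simulate here */ }
        public double RL_return() { return 1.0; }
        public void RL_cleanup() { cleanups++; }
    }

    public static void main(String[] args) {
        CountingGlue glue = new CountingGlue();
        List<Double> totals = runExperiments(glue, 3, 5, 1000);
        System.out.println(totals + " inits=" + glue.inits + " cleanups=" + glue.cleanups);
    }
}
```

The only point is the pairing: one RL_init and one RL_cleanup around each experiment, all inside a single process.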
Experiment only gives access to total steps
-------------------------------------------------------------
RL-Glue experiment programs can either be run an episode at a time, or
a step at a time. Our examples used episodes because that mode runs a
little faster and is easier to understand. However, there is another
mode where you call RL_step instead of RL_episode and then the
experiment program gets to see every observation, action, and reward
exchanged between the agent and environment.
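A minimal sketch of what a step-at-a-time experiment loop looks like, under some assumptions: Glue and Step here are hypothetical stand-ins so the snippet is self-contained (the real RL_start and RL_step return RL-Glue's own structures, which also carry the observation), and FakeGlue is a toy that ends every episode after a fixed number of steps.

```java
import java.util.ArrayList;
import java.util.List;

class StepModeSketch {
    /** Simplified record of one exchange; field names are assumptions. */
    static class Step {
        final int action;
        final double reward;
        final boolean terminal;
        Step(int action, double reward, boolean terminal) {
            this.action = action;
            this.reward = reward;
            this.terminal = terminal;
        }
    }

    /** Hypothetical stand-in for the step-mode RL-Glue calls. */
    interface Glue {
        void RL_start();
        Step RL_step();
    }

    /** Runs one episode a step at a time and keeps every exchange. */
    static List<Step> runEpisodeBySteps(Glue glue, int maxSteps) {
        List<Step> trace = new ArrayList<>();
        glue.RL_start();
        for (int t = 0; t < maxSteps; t++) {
            Step s = glue.RL_step();
            trace.add(s);               // the experiment sees each step here
            if (s.terminal) {
                break;                  // episode ended before the step cap
            }
        }
        return trace;
    }

    /** Toy glue: reward 1.0 per step, terminal after episodeLength steps. */
    static class FakeGlue implements Glue {
        private final int episodeLength;
        private int taken = 0;
        FakeGlue(int episodeLength) { this.episodeLength = episodeLength; }
        public void RL_start() { taken = 0; }
        public Step RL_step() {
            taken++;
            return new Step(0, 1.0, taken >= episodeLength);
        }
    }

    public static void main(String[] args) {
        List<Step> trace = runEpisodeBySteps(new FakeGlue(4), 100);
        System.out.println("steps recorded: " + trace.size());
    }
}
```

Because the experiment holds the whole trace, it can compute per-step statistics itself instead of relying on the total step count.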
Problem parameters not available
-------------------------------------------------
We decided to hide the piece distributions etc. for the competition
because we felt it was necessary to keep the integrity of the
competition. More details on these motivations in an upcoming report
that we are releasing :)
But, about the board size: it is encoded in the observations directly
so both the experiment and the agent can get at it pretty easily.
Anyways, just wanted to clear up these misunderstandings that
immediately jumped out at me. This tells me how we can improve things
for next year also!
--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com
On 1-Nov-08, at 7:29 PM, Sam Sarjant wrote:
>
> I guess I didn't fully take the time to understand the trainer. Don't
> get me wrong though, I made use of the console trainer by adding in
> code here and there and interacting with the agent via agent_message,
> but I felt that a dedicated experimenter was the way to go.
>
I totally get that. I'd be interested in talking to you off-list a
little bit about that experimenter. Ideally, we hope that RL-Viz is
general enough that it can be used to build that kind of
infrastructure for all sorts of RL domains.
> I think, in my earlier versions of the agent, I didn't fill in the
> cleanup and init methods, which could have caused problems. The main
> issue with the trainer I had in terms of changing agents or
> environments was that I had to restart the trainer to change agents.
Ahh yeah, I see. RL-Viz does support dynamically changing agents in
Java (with experimental support for C/C++). We just didn't make it
obvious in the competition software because it was one more
complication and didn't apply to all competitors. Again, I think your
feedback can be helpful for improving that project.
>
>
> And finally, I was aware that it was best to hide the problem
> parameters for the competition. However, seeing the parameters was
> useful for post-competition testing.
Yeah, we really should go back and expose that information and make it
a little easier to get at. We've just been busy.
>
>
> The only issue I had with the competition was the unlimited testing
> time. An agent could wait hours, perhaps days before choosing a move.
> As stated, Loria INRIA - MAIA found this loophole and exploited it.
> Which is fair enough, I have no qualms with them. But the time factor
> creates the possibility of nearly-fixed policy agents.
This wasn't really a loophole; it was something we anticipated. It
was a design decision of the competition to not measure or worry about
how much computation was spent per step. Our parameterization of
Tetris did not force people to use "good old fashioned RL", for better
or worse.
>
>
> Also, judging on a total number of steps rather than episodes was a
> little odd, but an acceptable test of performance.
This is a question that we need to resolve in the empirical evaluation
community, in my opinion. Using a step limit is not perfect, but it
has a whole host of advantages over an episode limit. Episode limits
are really weird. I'm contributing to an extended tech report based
on the competition which will address this.
>
>
> Well, the competition was fun anyway and I hope to enter it again next
> year.
Great! Thanks for all your comments!
I could also probably help you with your dynamic loading problems...
RL-Viz does some of this.
I'd have to pick through the code a bit to make a clean example, but
basically you can point RL-Viz at a directory full of jar files, and
say "get me everything that implements X", and it will go through
every jar, find all the classes that match your criteria, and give you
a nice list. Then you can say "gimme *that* one", and it will
dynamically load the class from the jar and give you an instance.
It's how we do all the dynamic env/agent loading in RL-Viz and the
competition.
An example that does this is here:
http://code.google.com/p/rl-viz/source/browse/trunk/projects/rlVizLibJava/src/rlVizLib/dynamicLoading/LocalJarAgentEnvironmentLoader.java
It's a bit fancy, and not commented, but if you're interested I could
help.
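In the meantime, here is a rough sketch of the general idea using only the standard Java library. To be clear, this is not the RL-Viz loader: JarScanSketch, findImplementations and instantiate are names made up for illustration, and the sketch assumes the class you pick has a public no-argument constructor.

```java
import java.io.File;
import java.io.IOException;
import java.lang.reflect.Modifier;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class JarScanSketch {
    /** Maps each concrete class implementing 'wanted' to the jar it was found in. */
    static Map<String, File> findImplementations(File dir, Class<?> wanted) {
        Map<String, File> matches = new LinkedHashMap<>();
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return matches;                       // not a readable directory
        }
        for (File jar : jars) {
            try (JarFile jarFile = new JarFile(jar);
                 URLClassLoader loader = new URLClassLoader(
                         new URL[] { jar.toURI().toURL() }, wanted.getClassLoader())) {
                Enumeration<JarEntry> entries = jarFile.entries();
                while (entries.hasMoreElements()) {
                    String entry = entries.nextElement().getName();
                    if (!entry.endsWith(".class")) {
                        continue;
                    }
                    // "pkg/Foo.class" -> "pkg.Foo"
                    String name = entry.substring(0, entry.length() - 6).replace('/', '.');
                    try {
                        // Load without running static initializers.
                        Class<?> c = Class.forName(name, false, loader);
                        if (wanted.isAssignableFrom(c) && !c.isInterface()
                                && !Modifier.isAbstract(c.getModifiers())) {
                            matches.put(name, jar);
                        }
                    } catch (ClassNotFoundException | LinkageError e) {
                        // Class could not be loaded (missing dependency, etc.): skip it.
                    }
                }
            } catch (IOException e) {
                // Unreadable or corrupt jar: skip it.
            }
        }
        return matches;
    }

    /** Loads the named class from the jar and returns a new instance of it. */
    static <T> T instantiate(File jar, String className, Class<T> wanted) throws Exception {
        // The loader is deliberately left open so the instance can load
        // further classes from the same jar while it is in use.
        URLClassLoader loader = new URLClassLoader(
                new URL[] { jar.toURI().toURL() }, wanted.getClassLoader());
        return Class.forName(className, true, loader)
                .asSubclass(wanted)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(findImplementations(dir, Runnable.class).keySet());
    }
}
```

The scan and the instantiation are split on purpose: the first gives you the "nice list", and the second is the "gimme *that* one" step for whichever entry you choose.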
--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com