Robocode and reinforcement learning


Matthew Gerber

Mar 13, 2021, 12:06:36 PM
to robocode
I've been picking away at RL for robocode, and I thought this might be of general interest in this group. It's a work in progress here. The RL framework is general-purpose and is described here.

Matt

Pavel Šavara

Mar 13, 2021, 3:49:03 PM
to robo...@googlegroups.com
Hi Matt,

thanks for sharing!

Am I seeing it right that it took 3,000 rounds to learn to lock the gun on the enemy?
How much of that is down to your feature selection?
What's coming next?

On Sat, Mar 13, 2021 at 6:06 PM Matthew Gerber <gerber....@gmail.com> wrote:
I've been picking away at RL for robocode, and I thought this might be of general interest in this group. It's a work in progress here. The RL framework is general-purpose and is described here.

Matt

--
You received this message because you are subscribed to the Google Groups "robocode" group.
To unsubscribe from this group and stop receiving emails from it, send an email to robocode+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/robocode/9523d0e6-7a3f-41ee-bc5c-5bde98cbfd97n%40googlegroups.com.

Matthew Gerber

Mar 14, 2021, 7:07:50 AM
to robocode
Hi Pavel - 

I've added answers below.

On Saturday, March 13, 2021 at 3:49:03 PM UTC-5 Pavel Savara wrote:

Am I seeing it right that it took 3,000 rounds to learn to lock the gun on the enemy?

It takes about 5 rounds for the agent to settle into a decent aiming policy. The graph on the right shows time steps ("turns" in Robocode) for a single round. I've updated the webpage with additional details about this. Now you can see separate videos for training and testing. The training rounds tend to be chaotic because the learning agent is mixing its aiming policy with random behavior to maintain exploration and learning. In the testing video, the agent has no random behavior, so it's clear what the policy is:  rotate radar --> obtain bearing --> rotate gun --> fire. It was neat to see this tactic emerge from the specified features.
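The training/testing difference described above is the usual epsilon-greedy mixing. A generic sketch (not rlai's actual code) of how exploration is blended into the learned policy:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Pick the greedy action with probability 1 - epsilon, else a random one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# During training, epsilon > 0 keeps exploration (and learning) alive;
# during testing, epsilon = 0 makes the learned policy fully deterministic,
# which is why the testing video looks so clean.
q = [0.1, 0.9, 0.3]
assert epsilon_greedy(q, 0.0) == 1  # always the greedy action when epsilon is 0
```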

How much of that is down to your feature selection?

The updated webpage explains the features used. It looks like five features are sufficient for radar-driven aiming against a stationary opponent.
 
What's coming next?

I also added a couple of TBD sections: aiming against a mobile robot, and evasive movement.

Are you interested in working on this, or know anyone who might be? This is just a personal hobby project of mine, but I'd welcome collaboration.

Matt

Pavel Šavara

Mar 14, 2021, 7:59:38 AM
to robo...@googlegroups.com
Re feature selection, I would be interested to see whether you could also learn the "Has bearing on opponent" and "Square root of degree deviation from gun to opponent" functions.
I think it would be even more interesting to let it aim at a moving target, for example by extrapolating where the enemy will be by the time the bullet arrives.

Regarding collaboration, I would be willing to collaborate on making a remote robot API a canonical part of Robocode.
Because network latency is a significant issue here, the API must not be chatty: it needs to make only one call per turn.

1) Somewhat similar to OpenAI gym conceptually.

reset() -> observations
step(actions) -> observations, rewards


2) Also figure out how to make it possible for such robots to enter the competition:
- Add a new competition category with different robot turn timeouts.
- Solve how to host such remote robots.
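The gym-style shape in (1) could be sketched as a tiny environment class. Everything below is hypothetical (the class name, observation fields, and turn limit are illustrative); it only demonstrates the reset/step contract.

```python
# Hypothetical sketch of the reset()/step() contract from (1); the class
# name, observation fields, and turn limit are all illustrative.
class RobocodeEnv:
    def reset(self):
        """Start a new round and return the initial observations."""
        self.turn = 0
        return {"x": 400.0, "y": 300.0, "heading": 0.0}

    def step(self, actions):
        """Apply one turn's actions; return (observations, reward, done)."""
        self.turn += 1
        observations = {"x": 400.0, "y": 300.0, "heading": 0.0}
        reward = 0.0                  # e.g., derived from the score change
        done = self.turn >= 5000      # rounds run a bounded number of turns
        return observations, reward, done
```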

I think we would like to enable a wide technological landscape for remote robot authors (Python, for example).
Therefore we cannot expect the robots to be hosted by the community, which runs the competition on compute they donated.
I'm considering inverting the server/client roles: we would let the Robocode engine be the client of the robot's server.
So I imagine the Java "robot" would only configure the Robocode engine with a URL of the robot's API.

The API the robot would have to implement would look like:
reset(observations) -> actions
turn(observations, rewards) -> actions
roundEnd(rewards)
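A minimal robot-side sketch of that inverted API in Python. The method names follow the proposal above; the observation/action payload shapes are assumptions.

```python
class RemoteRobot:
    """Hypothetical robot server that the Robocode engine would call into."""

    def reset(self, observations):
        """Called at round start; return the first turn's actions."""
        return {"gun_rotation": 0.0, "fire": False}

    def turn(self, observations, rewards):
        """Called once per turn with fresh observations and accrued rewards."""
        return {"gun_rotation": 5.0, "fire": True}

    def roundEnd(self, rewards):
        """Called when the round ends; last chance to learn from the rewards."""
        pass
```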

What do you think?
Would it be possible/interesting to create such an API (1) and contribute it to OpenAI Gym as a new environment?
How difficult would it be to run RL with the inverted API shape (2)?

Thank you!



Matthew Gerber

Mar 14, 2021, 11:57:54 AM
to robo...@googlegroups.com
Hi Pavel - 

On Sun, Mar 14, 2021 at 7:59 AM Pavel Šavara <pavel....@gmail.com> wrote:
Re feature selection, I would be interested to see whether you could also learn the "Has bearing on opponent" and "Square root of degree deviation from gun to opponent" functions.

Feature / representation learning is an interesting (and popular) topic these days, particularly in the neural network community. My rlai package is agnostic to the value-function approximation model, and it would be possible to swap the current scikit-learn model (SGD regressor) for a deep neural network package. Perhaps I'll head that direction at some point, but for now I'm enjoying manual feature specification -- it's the "art" side of RL agent design, and it's fun to see extended tactics emerge from a small set of features.
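To make the model-agnostic point concrete, here is the general idea behind a linear SGD value-function update, written in plain Python rather than scikit-learn. The feature vector and step size are illustrative; this is a sketch of the technique, not rlai's implementation.

```python
# Minimal sketch of linear value-function approximation with stochastic
# gradient descent; feature values and step size are hypothetical.
def sgd_update(weights, features, target, step_size=0.1):
    """One stochastic-gradient step toward the bootstrapped target."""
    estimate = sum(w * f for w, f in zip(weights, features))
    error = target - estimate
    return [w + step_size * error * f for w, f in zip(weights, features)]

w = [0.0] * 5                      # one weight per aiming feature
x = [1.0, 0.0, 0.5, 0.2, 1.0]      # hypothetical feature vector for one turn
w = sgd_update(w, x, target=0.8)   # nudge the value estimate toward 0.8
```

Because the agent loop only needs "predict" and "update" operations, a neural-network regressor with an equivalent interface could be swapped in without touching the rest of the agent.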

I think it would be even more interesting to let it aim at a moving target, for example by extrapolating where the enemy will be by the time the bullet arrives.

This is next, and "aiming ahead" will definitely be key!


Regarding collaboration, I would be willing to collaborate on making a remote robot API a canonical part of Robocode.
Because network latency is a significant issue here, the API must not be chatty: it needs to make only one call per turn.

In my first attempt to integrate Robocode with my RL server, I implemented a REST API for the RL server. In this setup, the robot made REST/PUT requests to send game state to the RL server and retrieve actions to execute. It was possible to do this in a single REST/PUT command, but it was too slow (REST overhead). So I ditched the REST approach and stripped it down to a simple TCP socket, which is much faster. At this point I can get about 500 turns per second. This is good enough for learning simple skills (e.g., my aiming example), but it's not very fast. One reason it's slow is that my RL server is sending high-resolution actions to the game (e.g., rotate the gun 5 degrees per turn). If the actions specified larger movements (e.g., rotate the gun 100 degrees), there would be less back-and-forth (chattiness) with the RL server, and turns/second would increase dramatically. Considering smaller versus larger actions leads to the need for continuous action spaces, which I also haven't tackled yet. There are so many interesting questions!

    1) Somewhat similar to OpenAI gym conceptually.

    reset() -> observations
    step(actions) -> observations, rewards

    I previously integrated rlai with OpenAI Gym (also on the website), and I patterned the Robocode integration similarly. Here's how each battle round proceeds:

    1. The RL server opens a TCP listener on 127.0.0.1:54321.
    2. The Robocode robot (client) opens a socket to 127.0.0.1:54321.
    3. The RL server accepts the client connection.
    4. The robot sends the initial game state, and the RL server reads it (this is your "reset" operation above).
    5. Loop until the round ends (this is your "step" operation above):
       a. The robot reads the next action from the RL server.
       b. The robot executes the action and sends the updated game state to the RL server.
    By design, the Robocode robot does not send a reward value to the RL server. This allows the RL server to specify its own reward function. The reward function is a fundamental (and interesting) aspect of RL agent design, and I wanted to provide full flexibility to the RL agent designer.
     


    2) Also figure out how to make it possible for such robots to enter the competition:
    - Add a new competition category with different robot turn timeouts.
    - Solve how to host such remote robots.

    Hosting remote robots is indeed the central problem that I see, too (more on this below).
     

    I think we would like to enable a wide technological landscape for remote robot authors (Python, for example).

    My original goal was to create a native Java wrapper around my Python RL code, so that the RL routines could be invoked directly by the Robocode engine; however, this sort of wrapper seems experimental and problematic. So I went with a TCP connection to achieve interprocess (i.e., Java <--> Python) communication. This is slower, but it has two significant benefits:  (1) TCP-based IPC works for any program/language, and (2) it opens up the possibility of running remote-hosted robot implementations, like we're discussing here.
     
    Therefore we cannot expect the robots to be hosted by the community, which runs the competition on compute they donated.

    I'm not so sure. Perhaps I'm missing something specific to Robocode, but it seems like the community might be able to host their own remote Robot implementations (i.e., servers as I've described above). 
     
    I'm considering inverting the server/client roles: we would let the Robocode engine be the client of the robot's server.

    I think this is essentially how I have architected my integration of Robocode and the RL server. The Java robot is the TCP client, and the RL server is a TCP server.
     
    So I imagine the Java "robot" would only configure the Robocode engine with a URL of the robot's API.

    Exactly. Right now, my Robocode robot (Java) opens a connection to 127.0.0.1:54321; however, the author of the robot would be free to direct the connection to any host:port. Or, if we can control the robot hosting service, then we could specify all of the remote robot hosts/ports.
     

    The API the robot would have to implement would look like:
    reset(observations) -> actions
    turn(observations, rewards) -> actions
    roundEnd(rewards)

    What do you think?

    This is pretty close to what I'm doing now. I think it's a workable approach.
     
    Would it be possible/interesting to create such an API (1) and contribute it to OpenAI Gym as a new environment?

    OpenAI Gym supports many games, but none of them involve multiple RL agents running concurrently. I think a Robocode environment where multiple distributed RL agents could learn and compete would be a really nice addition. I'm not aware of anything like this.
     
    How difficult would it be to run RL with the inverted API shape (2)?

    I think my current architecture is pretty close to what you have in mind. The most obvious difficulty (to me) would be figuring out where to host the remote robot servers, as you point out. Let's think about that.

    Matt

    Pavel Šavara

    Mar 14, 2021, 1:07:01 PM
    to robo...@googlegroups.com
    In my first attempt to integrate Robocode with my RL server, I tried to implement a REST API for the RL server. In this setup, the robot made REST/PUT requests to send game state to the RL server and retrieve actions to execute. It was possible to do this in a single REST/PUT command; but it was too slow (REST overhead).
    Interesting; I would not have expected REST to make that much of a difference.

    We already have an internal/custom binary protocol for that:
    actions == ExecCommands
    observations == ExecResults

    I implemented it years ago when I wanted to run a .NET robot and I did some in-process IPC.

    Is that good enough, or should we prefer something more formal, like gRPC or protocol buffers, in order to make it a public contract/API?
     
    By design, the Robocode robot does not send a reward value to the RL server. This allows the RL server to specify its own reward function. The reward function is a fundamental (and interesting) aspect of RL agent design, and I wanted to provide full flexibility to the RL agent designer.
    Definitely your RL needs to re-model the rewards; I'm just talking about Robocode's natural score, which evolves with each turn.
    Maybe that's philosophically part of the RL `observations`?
     
    I'm not so sure. Perhaps I'm missing something specific to Robocode, but it seems like the community might be able to host their own remote Robot implementations (i.e., servers as I've described above). 
    Unless every remote-robot author is willing to provide a Docker image, sharing my-robot-how-to-host-it instructions across diverse platforms would not fly.
    That's why I'm suggesting that any robot author who wants their robot in the competition keeps their server open and available to incoming battles at all times, on the author's own servers.

    Matthew Gerber

    Mar 15, 2021, 9:08:50 AM
    to robo...@googlegroups.com
    On Sun, Mar 14, 2021 at 1:07 PM Pavel Šavara <pavel....@gmail.com> wrote:


    In my first attempt to integrate Robocode with my RL server, I tried to implement a REST API for the RL server. In this setup, the robot made REST/PUT requests to send game state to the RL server and retrieve actions to execute. It was possible to do this in a single REST/PUT command; but it was too slow (REST overhead).
    Interesting; I would not have expected REST to make that much of a difference.

    Perhaps it was just my (incorrect) use of it. In any case, it was overly complicated (it needed a separate thread, synchronization locks, etc.) for the very simple thing I needed (synchronous read/write over TCP).


    We already have an internal/custom binary protocol for that:
    actions == ExecCommands
    observations == ExecResults

    I implemented it years ago when I wanted to run a .NET robot and I did some in-process IPC.

    Is that good enough, or should we prefer something more formal?

    Personally, I much prefer a serialization format like JSON, but JSON serialization/deserialization is probably slower than binary due to parsing.
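    The size difference is easy to see with Python's standard library; `struct` here stands in for any binary encoding, and the observation fields are hypothetical:

```python
import json
import struct

# The same observation serialized two ways: the binary encoding is both
# smaller and cheaper to parse, at the cost of being an opaque contract.
obs = {"x": 123.5, "y": 456.25, "heading": 90.0}

json_bytes = json.dumps(obs).encode()
bin_bytes = struct.pack("!3d", obs["x"], obs["y"], obs["heading"])

assert len(bin_bytes) == 24              # three 8-byte doubles
assert len(json_bytes) > len(bin_bytes)  # JSON carries field names as text

# Round-trip both formats to confirm they carry the same values.
assert json.loads(json_bytes)["x"] == 123.5
assert struct.unpack("!3d", bin_bytes)[0] == 123.5
```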

     
    By design, the Robocode robot does not send a reward value to the RL server. This allows the RL server to specify its own reward function. The reward function is a fundamental (and interesting) aspect of RL agent design, and I wanted to provide full flexibility to the RL agent designer.
    Definitely your RL needs to re-model the rewards; I'm just talking about Robocode's natural score, which evolves with each turn.
    Maybe that's philosophically part of the RL `observations`?

    Indeed, I think it's reasonable to view the reward as a function of the state (observations), so as a practical matter it should be sufficient to pass only observations (states) back to the agents and let them calculate reward signals as they wish. It's certainly convenient when the environment (OpenAI Gym or Robocode) precomputes a useful reward value. Formally (e.g., per the textbook), the agent's scope is limited to things it can directly control (e.g., taking actions, sensing states, and processing rewards). So in this sense, allowing the agent to control the reward definition could be strange, since the agent might simply set it to infinity. This is certainly getting philosophical about what an agent is. Ultimately, the measure of a Robocode agent's success will be the official ranking provided by the Robocode engine, which is beyond the control of the agents. So there's no harm in allowing RL agents to dictate the internal reward signals they use for the purpose of learning good behavior policies.
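    As a concrete (and entirely hypothetical) example of an agent-side reward computed from observations alone, using feature names like those in the aiming example:

```python
import math

# Hypothetical agent-defined reward derived purely from observations;
# the observation field names are illustrative, not Robocode's API.
def aiming_reward(observations):
    """Reward peaks when the radar has a bearing and the gun deviation is
    zero, and shrinks as the gun's deviation from the opponent grows."""
    deviation = abs(observations["gun_deviation_degrees"])
    has_bearing = observations["has_bearing"]
    return (1.0 if has_bearing else 0.0) - math.sqrt(deviation) / math.sqrt(180.0)

# A perfectly aimed gun with a bearing earns the maximum reward of 1.0.
assert aiming_reward({"gun_deviation_degrees": 0.0, "has_bearing": True}) == 1.0
```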

     
    I'm not so sure. Perhaps I'm missing something specific to Robocode, but it seems like the community might be able to host their own remote Robot implementations (i.e., servers as I've described above). 
    Unless every remote-robot author is willing to provide a Docker image, sharing my-robot-how-to-host-it instructions across diverse platforms would not fly.
    That's why I'm suggesting that any robot author who wants their robot in the competition keeps their server open and available to incoming battles at all times, on the author's own servers.

    Is there a reason why we couldn't just adopt the standard client/server model used in online games, where a player sets up a server, other players join as clients, and off we go?

    Matt