slime volleyball game AI using convnetjs and reinforcement learning


hardmaru

Apr 1, 2015, 6:59:05 PM
to conv...@googlegroups.com, David Ha

I've been playing around with this library for a while, and I found the code and demos incredibly useful for learning how to implement neural networks in interactive web applications.  Thanks a lot for creating such a fantastic library!  My interests are geared more towards reinforcement learning, task-performing agents and control systems, rather than computer vision, which seems to be the more popular topic recently.  I was initially interested in applying Q-learning techniques to agent control problems, but I found training to be difficult for problems with too many continuous states.  After a while, I decided to go for more direct policy search approaches, and ended up writing some simple, conventional neuroevolution trainer add-ins for training convnetjs neural nets.

Anyway, if you have some time or want to take a break from your research, check out the slime volleyball game demo I made using convnetjs.  It is basically an HTML5/JS clone of the classic Slime Volleyball Java applet game that was popular a decade ago.

In the game, I trained a recurrent neural network to control the agent playing slime volleyball, by having a population of these networks play against each other.  The best-performing networks are kept, and the crappy ones are thrown away and replaced by crossover and mutated versions of the surviving networks.  After a few hundred generations, they become quite decent at playing the game, to the point that even I can't beat them anymore.  Let me know if you guys can consistently beat them :)
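
Roughly, each generation of the trainer does something like the following (just a sketch to show the idea, not the actual add-in code; the fitness evaluation, crossover and mutation details are simplified and the constants are made up):

  // one generation of a simple neuroevolution trainer (illustrative sketch only)
  // `population` is an array of weight vectors (flattened net parameters),
  // `playMatch(a, b)` returns +1 if the net with genes `a` wins, -1 if it loses, 0 otherwise.
  function evolve(population, playMatch) {
    // 1. score every network by letting it play a few random opponents
    var scored = population.map(function (genes) {
      var fitness = 0;
      for (var k = 0; k < 4; k++) {
        var opponent = population[Math.floor(Math.random() * population.length)];
        fitness += playMatch(genes, opponent);
      }
      return { genes: genes, fitness: fitness };
    });

    // 2. keep the better half, discard the rest
    scored.sort(function (a, b) { return b.fitness - a.fitness; });
    var survivors = scored.slice(0, Math.floor(scored.length / 2));

    // 3. refill the population with crossover + mutation of the survivors
    var next = survivors.map(function (s) { return s.genes; });
    while (next.length < population.length) {
      var mom = survivors[Math.floor(Math.random() * survivors.length)].genes;
      var dad = survivors[Math.floor(Math.random() * survivors.length)].genes;
      var child = mom.map(function (w, i) {
        var gene = Math.random() < 0.5 ? w : dad[i];                     // uniform crossover
        if (Math.random() < 0.1) gene += 0.3 * (Math.random() * 2 - 1);  // small mutation
        return gene;
      });
      next.push(child);
    }
    return next;
  }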

The networks were all initialised with random weight and bias values, and are fed some game state information (the locations and velocities of the agents and the ball) as inputs.  Three of the outputs control whether the agent moves forward, moves backward, or jumps, activating whenever they exceed a certain threshold value.  In addition, four hidden outputs are fed back into the inputs, which makes the network look like a normal feed-forward network of infinite depth.  There is more information about the implementation details on my blog.
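
In convnetjs terms, the per-frame forward pass looks roughly like this (a sketch with made-up layer sizes and thresholds; the exact input features and hidden-state count are described on the blog):

  // rough sketch of the recurrent controller's forward pass (sizes are illustrative)
  // 12 game-state inputs + 4 recurrent states fed back from the previous frame;
  // 7 outputs: 3 thresholded into actions, 4 looped back as the next hidden state.
  var net = new convnetjs.Net();
  net.makeLayers([
    { type: 'input', out_sx: 1, out_sy: 1, out_depth: 16 },
    { type: 'fc', num_neurons: 10, activation: 'tanh' },
    { type: 'regression', num_neurons: 7 }
  ]);

  var hidden = [0, 0, 0, 0];                   // recurrent state carried across frames

  function act(gameState) {                    // gameState: array of 12 numbers
    var x = new convnetjs.Vol(1, 1, 16, 0.0);
    for (var i = 0; i < 12; i++) x.w[i] = gameState[i];
    for (var j = 0; j < 4; j++) x.w[12 + j] = hidden[j];

    var out = net.forward(x).w;
    hidden = [out[3], out[4], out[5], out[6]]; // fed back in on the next frame
    return {
      forward:  out[0] > 0.75,                 // thresholds here are arbitrary
      backward: out[1] > 0.75,
      jump:     out[2] > 0.75
    };
  }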

My plan is to learn about more reinforcement learning algorithms and to create demos that can be viewed interactively online, as it's cooler to be able to run everything in a web browser rather than just see someone's results in a YouTube video.  Let me know if you have any feedback or suggestions!

Thanks

Dave




Andrej

Apr 1, 2015, 8:23:46 PM
to hardmaru, conv...@googlegroups.com, David Ha
This is awesome, thanks for the link and writeup :)

I think this could have been done very nicely with RL, without all this evolution mumbo jumbo, but it probably takes a little more effort :) In this case policy gradient methods would work nicely, or maybe even the DQN that's already implemented in ConvNetJS. I would have thought it should work okay on this kind of reactive task, with some amount of training.

Cheers!
Andrej



hardmaru

Apr 2, 2015, 9:28:24 AM
to conv...@googlegroups.com, dav...@gmail.com, hard...@gmail.com
Glad you enjoyed the game, Andrej!

Yeah, I gotta take a closer look at DQN again.  Perhaps you or someone else can take up the challenge of creating a DQN-based slime volleyball agent, battle it out against the existing neuroevolved AI, and see which is ultimately better :)  I github'ed the source code to neural slime volleyball in case anyone is interested; apologies in advance that it isn't the cleanest code in the world, as it was intended to be more of a quick sketch than a program.
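
For anyone who wants to try, my guess is the existing deepqlearn.js Brain bundled with convnetjs could be hooked up more or less like this (only a sketch; the state encoding, action set and reward shaping here are guesses, and they matter a lot):

  // sketch of wiring the game to convnetjs' bundled DQN (deepqlearn.js)
  var num_inputs = 12;                       // same positions/velocities the evolved nets see
  var num_actions = 4;                       // move forward, move backward, jump, do nothing
  var brain = new deepqlearn.Brain(num_inputs, num_actions);

  // call once per game tick; lastReward is a shaping guess, e.g. +1 when the
  // opponent drops the ball, -1 when we do, 0 otherwise
  function tick(gameState, lastReward) {
    brain.backward(lastReward);              // learn from the previous action's outcome
    return brain.forward(gameState);         // returns the index of the action to take next
  }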

Actually, the truth is that when I first started looking at this stuff, I tried to use the DQN that is already implemented in convnetjs to perform tasks such as balancing an inverted cart-pole pendulum model, which is a toy problem in control systems, but it seemed quite difficult to get it to work due to the large number of continuous states.  I gave up after a few weeks of trying, unfortunately, but luckily I stumbled across some papers on genetic algorithms and neuroevolution techniques, and eventually trained some convnetjs networks to balance even a double-linked inverted pendulum system.  After I got that working, I simply continued to use the GA-based methods on game AI, but you are probably right: DQN will probably work just as well in this game.

If you are curious, please check out your convnetjs in action when used with a _proper_ physics engine (Box2D transcompiled to JS via Emscripten), in this double inverted pendulum self-balancing demo (details explained in the blog post).  With proper physics simulation possible in web browsers, combined with browser-run neural networks, the sky's the limit! =)

Happy long weekend (over there)!

Dave

Andrej

Apr 2, 2015, 2:07:47 PM
to hardmaru, conv...@googlegroups.com, hard maru
wow, I LOVE the pendulum demo too. I will almost definitely play with getting DQN to work on these, or maybe try some policy gradient stuff. It's slightly finicky and fragile; one has to know how the alg works and be able to debug it and play with all of its hyperparameters. I'll give it a shot, thanks for the links!

"""With proper physics simulation possible in web browsers, combined with browser-run neural networks, the sky's the limit! =)""" +1.


hardmaru

Apr 6, 2015, 7:10:38 AM
to conv...@googlegroups.com, dav...@gmail.com, hard...@gmail.com
The issue I have with gradient-based policy methods (even DQN) is the likelihood of getting stuck in a local optimum.  With non-gradient-based direct policy search methods (i.e. GAs and evolution), convergence can be slow, but at least a large part of the solution space can be tried out and compared.

One of my research goals for the future would be to attempt to combine the two: maybe use evolution to come up with several candidate nets (perhaps even evolving the net's structure rather than just guessing weights on a pre-programmed geometry), and then use gradient policy methods to fine-tune the solution.

But currently this stuff is way over my head, as I still have a day job and don't have the time to learn it all.  I should look at the automatic differentiation work that seems to be popular these days (I saw some stuff in your recurrent.js code, but haven't had the time to hack on it yet), so perhaps there is a way to fine-tune arbitrary evolved RNNs.  If you have any opinion on how one should approach and tackle this, or know of any good research papers on gradient policy methods with promising results, please let me know!

If you guys want to hack around with the code and try other methods for the AI, I suggest playing with the second-level version (also on the GitHub repo).  The original version's ground area is too small, and I think it is too easy to create an AI (or even hand-code one) that doesn't lose most of the time.  Since the second-level version has a larger ground area relative to the agents, there is a better chance that a strategy exists for hitting the ball to a location that is difficult for the opponent to reach, so it is more appropriate as a benchmark for comparing different AI methods.

Message has been deleted

Louis Smit

Apr 6, 2015, 10:28:41 AM
to conv...@googlegroups.com, dav...@gmail.com, hard...@gmail.com
As I told you on Twitter, very cool project, hardmaru!
I'm not an RL guy, but I found it very interesting that such a small RNN can produce this behavior.

First, some thoughts about the game: I think you can make the game harder for the bot by continuing what you already started, playing with the field size and the acceleration and speed of the ball. You basically want to make positioning more important.
Also, one neat thing you could do is enforce the "no more than 3 touches" rule of volleyball. This way the net would have to model longer-range dependencies, and I suspect LSTMs would do better in that case.
I think this could be a cool little competition for people who don't want to deal with all the computer vision machinery necessary for the Atari challenge.

As for the training, I'm definitely going to read up on DQN and the others. I think it's quite an interesting problem, assigning a value to each game state. Something you ideally want to learn, of course.
Could there be a reason why you can't just backprop a backwards annealing function that assigns high value to a winning state and progressively less value to the preceding states?
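
Concretely, I'm imagining something like this (just a sketch of the idea; the discount factor is arbitrary):

  // the "backwards annealing" idea: discounted value targets over one rally (sketch)
  // states[t] is the game state at step t; the rally ends with win = +1 or -1.
  function valueTargets(states, win, gamma) {   // e.g. gamma = 0.95
    var targets = [], v = win;
    for (var i = states.length - 1; i >= 0; i--) {
      targets[i] = v;                           // value target for states[i]
      v *= gamma;                               // earlier states get progressively less credit
    }
    return targets;                             // regress a value net onto these
  }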

Would also be cool to have a generative LSTM model where you simulate a few steps ahead and base your next decision on that.

Anyway, thanks for putting this on github, I will definitely fork it and play with it when I have some time!

hardmaru

Apr 6, 2015, 6:38:52 PM
to conv...@googlegroups.com, dav...@gmail.com, hard...@gmail.com, laa...@gmail.com
Louis: I like your idea of adding the no-more-than-3-touches rule.  It shouldn't be too difficult to modify the source code to incorporate this, maybe something like the sketch below.  My guess is that with the current training method, the agents will likely aim to send the ball over to the other side right away on the first touch, and may not develop much further strategy as the net is quite simple.  Good luck playing with it!
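
(A hypothetical sketch only; the names here don't match the actual game code:)

  // hypothetical 3-touch rule; lastToucher / awardPointTo / otherSide are placeholders
  var touches = 0;
  var lastToucher = null;

  function onAgentHitsBall(agent) {
    if (agent === lastToucher) {
      touches += 1;                       // same side touched the ball again
    } else {
      touches = 1;                        // ball came over from the other side
      lastToucher = agent;
    }
    if (touches > 3) {
      awardPointTo(otherSide(agent));     // more than 3 touches loses the rally
      touches = 0;
      lastToucher = null;
    }
  }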

Andrej, thanks, I'll take a look at Sutton's book; I found that it's available online on the author's site.

hardmaru

Apr 20, 2015, 4:47:20 AM
to conv...@googlegroups.com, dav...@gmail.com, hard...@gmail.com

I noticed Andrej implemented a packaged-up "ReinforceJS" that seems to make it easy to incorporate learning tasks! Another toy to poke around with. Thx!
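
From a quick look at the docs, hooking it up looks roughly like this (an untested sketch based on the project page; the env sizes and spec values are just guesses for the slime game):

  // untested sketch of reinforcejs usage for the slime game
  var env = {
    getNumStates: function () { return 12; },       // same 12 state inputs as before
    getMaxNumActions: function () { return 4; }     // forward, backward, jump, do nothing
  };
  var spec = { alpha: 0.01, epsilon: 0.2 };         // learning rate and exploration (guesses)
  var agent = new RL.DQNAgent(env, spec);

  // per game tick:
  var action = agent.act(gameState);                // gameState: array of 12 numbers
  // ... apply the action in the game, observe reward ...
  agent.learn(reward);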

David
