Research update: Recovering, M-FOS, JaxMARL


Ram Rachum

Nov 28, 2023, 5:08:01 AM11/28/23
to ram-rachum-res...@googlegroups.com

Hi everyone!

Retrospective on goals for last month

In last month's update I outlined a list of goals. Here's my update on these goals:

  1. Recover: ✅ Done... mostly

    This month has been frustrating, and I knew it was gonna be this way when I started. I postponed a lot of personal errands while I was working on the dominance hierarchies paper, and they piled up. I've now finished most of them, but it's been so frustrating, working every day on bullshit that doesn't move me forward. It's not completely over yet, but I expect to be done with it within a week.

  2. Look into LOLA and similar algorithms: 😒 So-so

    More details about that below.

Dominance hierarchies talk

I've been working on my talk about the dominance hierarchies research. I'm starting by planning a 50-minute talk, which I'll hopefully give in various reading groups, and then I'll boil it down to a 15-minute version that I'll hopefully give at AAMAS.

If you're curious, you can check out the deck.

Funding

I know I haven't updated about the funding situation in a while. I'm a little wary of updating on funding, because there's lots of heartbreak when a funding opportunity falls through. Right now I'm in talks with a potential funder, and I hope they'll be interested.

My goals for this month

  1. Learn Pax and/or JaxMARL and attempt to run experiments.

    As I was getting into the LOLA ecosystem, I've been playing around with the PureJaxRL and Pax frameworks. Last week, a new and exciting framework called JaxMARL was released. (Blog post.)

    Let me explain what they are and how they differ. All of these libraries are Jax-based RL frameworks. Jax is a framework that lets you write a Python function and then automatically differentiate it, which is an essential part of training a neural network. Jax, with Flax on top, can be used instead of TensorFlow and PyTorch. The main caveat with Jax is that you have to write your function in a particular way for Jax to process it; you can think of it as a limited subset of Python. Once you do, Jax just-in-time compiles your function.
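    As a minimal sketch of what that looks like (not taken from any of these libraries; the function, weights and data are all made up for illustration), here's a tiny pure function that Jax can both JIT-compile and differentiate:

    ```python
    import jax
    import jax.numpy as jnp

    # A toy loss function written in the Jax-friendly subset of Python:
    # pure, no side effects, no data-dependent Python control flow.
    @jax.jit
    def loss(w, x, y):
        pred = jnp.dot(x, w)
        return jnp.mean((pred - y) ** 2)

    # jax.grad automatically differentiates it with respect to the first argument.
    grad_loss = jax.grad(loss)

    w = jnp.array([1.0, -2.0])
    x = jnp.eye(2)
    y = jnp.zeros(2)

    print(loss(w, x, y))       # scalar mean squared error
    print(grad_loss(w, x, y))  # gradient, same shape as w
    ```

    Staying inside that restricted subset is what buys you the compilation and differentiation for free.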

    PureJaxRL is a Jax-based framework for single-agent RL. Pax is for multi-agent RL, using opponent shaping algorithms like M-FOS, which I hope might be the best algorithm for my research. JaxMARL is for multi-agent RL, and it seems to be built better than Pax, but it doesn't support M-FOS, as far as I know.

    The attractive thing about these libraries is how fast they are, especially JaxMARL. On Twitter, Chris Lu claims that JaxMARL provides speedups of up to 12,500x. The secret sauce is that the environment rules are computed on the GPU rather than the CPU. By "the environment rules" I mean the logic of the world that the agents operate in. In a simple game such as the prisoner's dilemma, that logic is simple: if, say, player 1 defected and player 2 cooperated, then player 1 gets 2 points and player 2 gets -3 points. For a more complicated environment like a gridworld, the rules are more involved: they have to calculate exactly which object or agent occupies each cell, including collision detection.
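    To make that concrete, here's a sketch of prisoner's-dilemma-style environment rules as a pure Jax function. The `PAYOFFS` table and `step` function are my own illustration, not any library's API; the off-diagonal entries match the numbers in the example above, and the rest are filler:

    ```python
    import jax.numpy as jnp

    # Illustrative payoff table (0 = cooperate, 1 = defect).
    # PAYOFFS[a1, a2] gives (player 1's reward, player 2's reward).
    PAYOFFS = jnp.array([
        [[ 1,  1],   # both cooperate
         [-3,  2]],  # p1 cooperates, p2 defects
        [[ 2, -3],   # p1 defects, p2 cooperates
         [ 0,  0]],  # both defect
    ])

    def step(action_1, action_2):
        # The "environment rules": a pure table lookup with no Python
        # branching, so Jax can jit-compile it.
        return PAYOFFS[action_1, action_2]

    print(step(1, 0))  # player 1 defected, player 2 cooperated
    ```

    Expressing the branching as an array lookup instead of `if` statements is what keeps the rules jittable.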

    When these environment rules are written as a Jax-jittable function, which is challenging, their runtime can be sped up dramatically. Not only are they offloaded to the GPU, but because they're calculated with matrix manipulations, they can be stacked: one run of the step function can calculate a thousand parallel environment runs at the same time, which provides an additional speedup.
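    A sketch of that stacking idea, using a toy per-environment step function of my own invention (not JaxMARL's actual API): `jax.vmap` turns a function that steps one environment into one that steps a whole batch in a single call.

    ```python
    import jax
    import jax.numpy as jnp

    # Made-up rewards for actions 0 and 1 in a trivial toy environment.
    REWARDS = jnp.array([1.0, -1.0])

    def step(state, action):
        # Per-environment rule: add the action's reward to the running state.
        return state + REWARDS[action]

    # vmap "stacks" the function across a leading batch axis:
    # one call now steps every environment in parallel.
    batched_step = jax.jit(jax.vmap(step))

    n_envs = 1000
    states = jnp.zeros(n_envs)
    actions = jnp.zeros(n_envs, dtype=jnp.int32).at[0].set(1)

    new_states = batched_step(states, actions)
    print(new_states.shape)  # one step advanced all 1000 environments
    ```

    On a GPU, that batched call runs as one kernel launch over the whole batch, which is where the extra speedup comes from.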

    I'm excited about this development and I'm working on familiarizing myself with this framework. Shoutout to Chris Lu and Timon Willi for answering my questions about these frameworks.

  2. Process AAMAS reviews for my dominance hierarchies paper.

    In a few days the rebuttal period for AAMAS 2024 will start, which means I'll get an email with 2-3 peer reviews of my dominance hierarchies paper. Each reviewer rates the paper as either a "strong reject", "weak reject", "weak accept" or "strong accept", and attaches a detailed review. Roughly speaking, if there are more accepts than rejects, my paper is likely to get accepted; if not, it isn't.

    I'm anxious because getting a conference paper in AAMAS would be a milestone for my career, and there's so much randomness involved in the process of getting reviewers who like my paper.

    If my paper does get enough accepts, I'll then have to revise it according to the reviewers' comments, which can be an excruciating process. Then I'll wait for the notification deadline, hope it gets accepted, prepare a 15-minute version of the talk for the conference, and start shopping for a 30-hour connecting flight to New Zealand.

    If it doesn't get accepted, then I'll probably post it on arXiv and consider submitting it to a different conference. This could mean a lot of work adapting the paper, because different conferences have different paper requirements.

    Fingers crossed!

That's it for now. See you next month!

Bonus meme.

Ram.
