Research update: Ad-hoc teamwork, experiments, EuroPython

Ram Rachum

Jun 30, 2022, 3:29:47 AM

to ram-rachum-res...@googlegroups.com

Hi everyone!

Here's my research update for July.

Retrospective on goals for June

In the June update I outlined a list of goals. Here's my update on these goals:

Experiment with PettingZoo and RLlib: ✅ Done, but just PettingZoo
I spent the last few weeks learning PettingZoo and experimenting with it. It's frustrating for me, because the standard for user-friendliness in scientific computing is much lower than the standard for other open-source packages (such as Django, Requests, Click, etc.). Of all the existing MARL frameworks, PettingZoo seems to be the least bad. The main person responsible for PettingZoo is Jordan Terry. He's done a good job of building a healthy community of contributors, including Ben Black and Tai Jun Jet. They've all helped me on Discord when I had questions about PettingZoo.
I've run a few experiments with PettingZoo and Stable Baselines 3 and I now have a better idea of what I can achieve with them. One limitation that bothers me is that these tools don't support the agents having individual brains. In other words, if you're training agents on a game with 10 players, all 10 players will be played by the same neural network, like it's the same person. I don't like that. I want agents to develop their own distinct personalities, so I could examine social behaviors around their differences.
To do that, it looks like I'll need to take code from CleanRL and modify it. Unlike SB3, CleanRL can't be used modularly, i.e. you can't do from cleanrl import PPO. I'll have to roll up my sleeves and adapt that code to support multiple brains. CleanRL's author, Costa Huang, has been helping me understand how the library works.
Start meeting more researchers regularly: ✅ Done
I now have more researchers that I meet with regularly. To respect their privacy I won't give out names, but I have enough people that I always have someone to bounce ideas off of.

Stuff I've done beyond the goals

My paper was accepted to the Workshop on Ad Hoc Teamwork!

This is a big step for me. I wrote a paper and it was accepted to the Workshop on Ad Hoc Teamwork (WAHT).

I should explain what a workshop is. To understand workshops, you first need to understand journals and conferences:

Journals: In the academic world, the success of researchers is determined by their publications in journals. There are thousands of academic journals of varying reputations. The most highly regarded journals include Nature, Science and The New England Journal of Medicine. A researcher's performance is evaluated by how good a journal they can get their paper accepted into. The process for reviewing papers is long and arduous, but it's one of the pillars of modern science. This excellent video explains more about that.
Conferences: In the world of Computer Science, conferences replaced journals. The big three AI conferences are ICLR, ICML and NeurIPS. They function very similarly to journals, except they're a physical conference where people fly to some convention center to listen to other researchers give talks about their research. The review process is still long and arduous, and after everyone flies back home the papers get published in a journal that's called something like Proceedings of the 35th International Conference on Machine Learning.
Workshops: Every conference has a set of workshops which happen either just before the core conference days or just after them. Workshops are similar to the core conference in that researchers need to get their papers accepted to the workshop, undergo a review process, and then present that paper in front of all the other attendees. Workshops are different from conferences in a paradoxical way. They can be said to be less serious, because the barrier to entry is lower, and the papers can be shorter and less rigorous. However, since the papers that get presented in the core conference have undergone such a time-consuming review process, the interesting results are well known in the community before the conference even starts. Therefore, the cutting-edge research will usually be found in the workshops rather than the core conference.

My paper is called Fruit Slots: Experimenting with an Autocurriculum of Implicit Communication in Reinforcement Learning Agents and it describes a series of experiments I designed a year ago. Reuth Mirsky encouraged me to write and submit it, and helped me to understand what's expected of me.

I describe three experiments in that paper, and I already ran the first two successfully! I need to run the third, and I also need to work on how I present the results, and make them relevant to the focus of the workshop.

I'm very excited about this. In a way it's my first official step into the world of research.

I was accepted to HAAISS 2022!

Another happy acceptance! Neike Taika-Tessaro, who is a member of this mailing list, let me know that there's a three-day event called Human Aligned AI Summer School happening in Prague at the beginning of August. I looked at the website and it seemed cool. I signed up and a couple of weeks later they let me know I was accepted!

This is a great opportunity for me to get exposed to relevant research and connect with more researchers.

I ran an interesting experiment called StayAgray

This modest experiment is a precursor to a more complex experiment that I hope to run in the future.

Here is the output, and here's what it means:

Every line is one state of the game, so time progresses in the downward direction.
There are three agents: A, B and C.
Agent A wants to stay as far away as possible from B and C. The farther A is from its nearest neighbor, the more reward it gets. It uses a learning algorithm called PPO to improve its behavior.
Agents B and C move randomly. Agent B moves only to the right, and Agent C moves in both directions. They don't have a learning algorithm. This is actually single-agent reinforcement learning, because only Agent A is "sentient".
The playing area is drawn as a line but it's actually a circle. In the very first turn, you can see Agent A walking left and emerging on the right side, just like in Pac-Man.
The plus signs on the right side say how much reward agent A got on that turn.
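
For readers who think in code, here's a rough sketch of the rules listed above. It is not the actual StayAgray code. The ring size, the action set, the choice to let Agent B randomly either stay or step right, and the exact reward formula (raw distance to the nearest neighbor) are all assumptions made for illustration, and the names ring_distance, reward_for_a and step are hypothetical.

```python
import random

WORLD_SIZE = 20  # assumed number of cells on the circular track


def ring_distance(a, b, size=WORLD_SIZE):
    """Shortest distance between two cells on a circle of `size` cells."""
    d = abs(a - b) % size
    return min(d, size - d)


def reward_for_a(pos_a, pos_b, pos_c, size=WORLD_SIZE):
    """Assumed reward: Agent A's distance to its nearest neighbor."""
    return min(ring_distance(pos_a, pos_b, size),
               ring_distance(pos_a, pos_c, size))


def step(pos_a, pos_b, pos_c, action_a, rng, size=WORLD_SIZE):
    """Advance one turn. `action_a` is -1 (left), 0 (stay) or +1 (right)."""
    pos_a = (pos_a + action_a) % size                # wraps around the edge
    pos_b = (pos_b + rng.choice([0, 1])) % size      # B: stays or moves right
    pos_c = (pos_c + rng.choice([-1, 0, 1])) % size  # C: either direction
    return pos_a, pos_b, pos_c, reward_for_a(pos_a, pos_b, pos_c, size)


if __name__ == "__main__":
    rng = random.Random(0)
    # Agent A steps left from cell 0 and emerges on the far right side.
    print(step(0, 5, 12, -1, rng))
```

In this sketch the wraparound comes from the modulo in step, and the "squeeze" comes from reward_for_a shrinking as B and C close in on A from both sides.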

The interesting thing about this game is that Agent A will naturally try to stay in the middle between agents B and C, but at some point they'll move close together and squeeze A in between them. Agent A needs to make the brave decision to get close to one of them and emerge on the other side. The difficulty in making this decision is that it has to go through a "reward valley", i.e. endure low reward in the hope of a big future reward.

You can see this behavior happening in turns 23-29, and then again in turns 100-107. This is beautiful to see. It's not cooperation yet, but it feels really human when agent A crosses bravely over agent B to get to the other side.

After I've made more progress with the other stuff, I want to run a multi-agent version of this experiment. I'm really interested in a scenario where all agents are constantly feeling each other out and responding to each other. I hope this can happen 4-6 months from now.

My goals for July

Prepare two talks for EuroPython and give them.
I'm going to fly out to Dublin to give a talk at EuroPython about my open-source project PySnooper from 2019. By itself, this has nothing to do with my MARL research, but I'm going to take advantage of that opportunity to give a lightning talk about my research. I hope to reach a lot of people.
After two years of giving talks only through Zoom and GVC, I'm excited about giving a talk in real life. I think I'll have around 500 people attending my talks live. I'm so excited about that! I'm also scared. I'm good at public speaking, but I've done this in front of such a big audience only twice in my life. I really enjoy it... But just thinking about the stage lights shining in my eyes makes me dizzy.
The cherry on top? I'll need to shorten the research talk from 45 minutes to 5 minutes. On one hand it's insanely difficult, but on the other hand it's thrilling. To be fair, the audience's expectations are adjusted to this. I can cut out the carefully crafted arguments and caveats and focus on a clear, simple story with impressive demos.
Have fun in Dublin.
I've never been to Dublin before, and I'm gonna take a week to have fun, do touristy things and meet new people.
If you're in Dublin and want to hang out, hit me up. I'll be there from July 10th to July 18th.
Prepare my work on Fruit Slots for presenting at WAHT and present it.
As I wrote above, the first two Fruit Slots experiments work, but I'll still need to do a lot of preparation work before I can present them at WAHT. I've read Stone 2010, which is the basis for this workshop, and I wrote some notes about it.