Research update: Dominance hierarchies paper accepted to ALA and COINE workshops


Ram Rachum

Mar 31, 2024, 1:08:44 PM

to ram-rachum-res...@googlegroups.com

Hi everyone!

Retrospective on goals for last month

In last month's update I outlined a list of goals. Here's my update on these goals:

Give the dominance hierarchies talk in different groups: ✅ Done
I've given the talk about the dominance hierarchies research at MALS, FLAIR and the UMD MARL seminar, and the reactions were mixed. Some people were very excited about the work and asked lots of questions, for example, whether I think dominance hierarchies would form in an environment they described. I also got some criticism that it isn't clear why this research is useful. I understand that criticism, since it's mostly a synthesis paper between two fields.
Apply for the CAIF grant: ✅ Done
This was a lot of work, but I finished writing the application for the CAIF grant. It was a very structured grant: I wrote a 4-page research proposal, a project plan and a budget, and answered lots of questions about our proposed research. I wrote both the research proposal and the project plan in LaTeX to get that extra oomph. It felt like writing a paper.
Here are the title and abstract of our research proposal:
Opponent Shaping for AI Interpretability and Corrigibility
We are working on a solution for AI Interpretability and Corrigibility. Our insight is that a team of humans working on a task will be more transparent and amenable to changes than a single human working on that same task, thanks to the social interactions between the humans.
We plan to do the following: (a) identify the useful parts of the intricate social dynamics that humans use; (b) replicate these social behaviors in Reinforcement Learning (RL) agents; (c) equip these RL agents with multimodal foundation models (MFMs) to allow carrying out real-world tasks; and finally, (d) package the group of agents as a single AI system that accepts user prompts and provides responses. We conjecture that by observing the social dynamics between the agents, human operators could understand why the group as a whole makes the decisions that it does, and by promoting and demoting agents, human operators could change those decisions.
We predict that the Opponent Shaping (OS) paradigm will be the decisive factor in the success of step (b), and we plan to train our agents using the Model-Free Opponent Shaping (M-FOS) algorithm. Our first challenge would be to design environment rules that incentivize these agents to show social behavior that is conducive to interpretability.
I also submitted the same grant proposal to the Nonlinear grant, which is actually a multi-grant that goes out to 50 funders.
None of these grants has a predetermined notification date, so I'm hoping to get an update sometime in the next few months.

Our dominance hierarchies paper was accepted to the ALA and COINE workshops!

I submitted our dominance hierarchies paper to both the ALA workshop and the COINE workshop, hoping that one of them would accept it. Both are workshops at AAMAS 2024, held during the two days before the main conference. I was pleasantly surprised when I got the emails saying the paper had been accepted to both! Now I'll get to give two oral presentations, which I really hope will be on separate days.

I also got valuable feedback from the reviewers of both ALA and COINE, and I used that to improve the paper. I then submitted camera-ready versions for both workshops.

The ALA workshop sent an update that they'll be doing a poster session, so I'll bring my poster there. They require a smaller poster size than AAMAS: A1 instead of A0. I'm just going to print both sizes so I'll be ready for anything. This is our poster.

My goals for this month

Prepare a short talk for the ALA and COINE workshops.
One challenge I'm facing with the ALA and COINE workshops is that they require much shorter talks than the 50-minute talk I've been giving so far. The ALA talk should be 5 minutes long, and the length of the COINE talk hasn't been announced yet. I'm working on a shorter slide deck, and I'll do a few practice runs of these short talks.
Learn how to run Opponent Shaping algorithms.
Finally, I'm out of the slog. I'm going to dive into one of the opponent shaping libraries, possibly Pax, and learn how to use it. Since I've never used Opponent Shaping before, I'll take some time to play with it before I attempt to build my environment.

That's it for now. See you next month!

Ram.
