Research update: Social order, first arXiv paper

Ram Rachum

Apr 29, 2023, 5:37:39 AM

to ram-rachum-research-announce

Hi everyone!

Here's my research update for this month.

Retrospective on goals for last month

In last month's update I outlined a list of goals. Here's my update on these goals:

Release the Stubborn environment as open-source: ✅ Done
The Stubborn code is now available as open-source on GitHub: https://github.com/cool-RR/stubborn
You can now run the training locally and generate the plots that I've included in the paper. If you follow that link, you'll see that it goes to arXiv. This is the first time I have a paper on arXiv! It's exciting. Immediately after it was posted, I received four invitations to submit my paper to predatory journals, which I am told is a rite of passage for researchers. One of them addressed me as "Dear Dr. Ram Rachum" 🫠
I also made a section for my papers on the front page of my research site. I hope to populate this section with a conference paper in the next 6 months 🤞
Read some of the cultural evolution papers: 🥲 Partially done.
I was supposed to read these 2 papers:
- Henrich: Five Misunderstandings About Cultural Evolution
- DeepMind Cultural General Intelligence team: Learning Robust Real-Time Cultural Transmission without Human Data (Blog post)
We picked these papers because they're part of the cultural evolution agenda, and my next paper should probably be part of that as well. However, I couldn't connect with these papers. (Meme.) I abandoned the first one after a few pages, and the second one after reading half. I hope that the issue is that they're too advanced for me, and I should be looking for something simpler.

Stuff I've done beyond the monthly goals

Interesting chicken experiments

In the update I sent three months ago, I talked about a few experiments I ran using the Chicken social dilemma. I've spent the last couple of months running more experiments in this direction. This is now the most interesting direction I'm exploring. I'll show you what I've got.

Recap: I've got 6 agents. In each episode, they're divided into random pairs and they play one turn of chicken with each other. Each agent observes the number of the agent that it's playing with (a number between 0 and 5). Each agent chooses a move, either "hawk" or "dove". We calculate the rewards according to the rules of chicken, and each agent gets its reward.
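To make the recap concrete, here is a minimal sketch of a single episode. It is not the actual experiment code: the payoff numbers, the function names and the hard-coded policies are all assumptions made for illustration. The only things taken from the description above are the 6 agents, the random pairing, the observation of the opponent's number and the hawk/dove choice.

```python
import random

# Hypothetical payoff table (the post does not give the numbers).
# Keys are (my_move, their_move); values are my reward.
# Any table with the Chicken ordering would do:
# hawk-vs-dove > dove-vs-dove > dove-vs-hawk > hawk-vs-hawk.
PAYOFFS = {
    ("hawk", "dove"): 1,    # I win
    ("dove", "dove"): 0,    # nobody wins
    ("dove", "hawk"): -1,   # I back down
    ("hawk", "hawk"): -10,  # crash: worst outcome for both
}


def random_pairs(n_agents, rng):
    """Shuffle the agent numbers and pair them off (n_agents must be even)."""
    agents = list(range(n_agents))
    rng.shuffle(agents)
    return [(agents[i], agents[i + 1]) for i in range(0, n_agents, 2)]


def play_episode(policies, rng):
    """Play one turn of Chicken per pair.

    policies[i] is a function that maps the observed opponent number
    to "hawk" or "dove". Returns a dict of agent number -> reward.
    """
    rewards = {}
    for a, b in random_pairs(len(policies), rng):
        move_a = policies[a](b)  # agent a only observes b's number
        move_b = policies[b](a)
        rewards[a] = PAYOFFS[(move_a, move_b)]
        rewards[b] = PAYOFFS[(move_b, move_a)]
    return rewards


# Stand-in policies instead of trained networks: each agent plays hawk
# to higher-numbered opponents, i.e. agent 0 is on top.
policies = [
    (lambda opponent, me=i: "hawk" if me < opponent else "dove")
    for i in range(6)
]
rewards = play_episode(policies, random.Random(0))
```

With these stand-in policies every pair ends up with exactly one hawk and one dove, so three agents get the "win" reward and three get the "back down" reward, whichever pairs are drawn.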

An emergent social order forms. This is a plot of hawkishness as a function of training generation:

Let's note a few interesting things here. (Not all of these are evident from the plot above, so you'll have to take my word for it.)

The agents have very quickly arranged themselves into a strict social order. Each pair of agents decided which one of them will always play hawk, and the other one always plays dove.
That social order is transitive. If agent X plays hawk to agent Y which plays hawk to agent Z, then agent X plays hawk to agent Z.
That social order just happens to have a strong correlation to the agent number. In this specific run, the correlation is a strong negative, i.e. agent 0 is on top. In other runs, it's a strong positive, i.e. agent 5 is on top and agent 0 is on the bottom. Sometimes the correlation is weak. But on average, the correlation is strong.
The agents establish this social order and maintain it even though they have no way to communicate with each other, they don't observe the moves that agents outside their given pair are playing, and none of them even know what their own number is!
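The two properties above, transitivity and correlation with the agent number, can be checked mechanically once the learned behavior is written down as a table. The sketch below assumes a hypothetical representation that is not from the experiment code: hawk[x][y] is True when agent x plays hawk to agent y. The helper names and the example order are also made up for illustration.

```python
import math
from itertools import permutations


def is_transitive(hawk):
    """True if x->y and y->z always implies x->z ("->" means "plays hawk to")."""
    n = len(hawk)
    return all(
        hawk[x][z]
        for x, y, z in permutations(range(n), 3)
        if hawk[x][y] and hawk[y][z]
    )


def dominance_scores(hawk):
    """Number of opponents each agent plays hawk to."""
    return [sum(row) for row in hawk]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Example: a strict order with agent 0 on top (x plays hawk to y when x < y).
n_agents = 6
hawk = [[x < y for y in range(n_agents)] for x in range(n_agents)]

transitive = is_transitive(hawk)
scores = dominance_scores(hawk)
correlation = pearson(list(range(n_agents)), scores)
```

For this idealized order the check reports a transitive relation and a correlation of -1 between agent number and dominance score, which is the "agent 0 is on top" case. A real run would give a noisier table and a weaker correlation.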

I've also experimented with a few variations of this experiment. This is what happens when I increase the learning rate:

The agents have dominance fights. For example, agent 5 starts out on top. It loses its throne to agent 4, which enjoys a brief reign until agent 1 retakes the lead and keeps it until the end of the experiment.

There is an interesting mix of stability and instability in these results. On the one hand, some of the agents travel up and down the social order in a seemingly haphazard way. On the other hand, whenever they move to a different position, they don't swerve too much from it unless they move again to a different position.

When I showed these experiments to various researchers, especially the bit about the correlation between agent number and social position, they correctly guessed this happens because the agents observe their opponent's agent number in a single number neuron. If the agent number were expressed using one-hot encoding, i.e. a sequence of binary neurons, then the agents wouldn't be able to infer a natural order. I tried that:

This is interesting. I notice 3 things:

The agents take more time to learn a social order.
The social order no longer has a correlation with the agent number.
The social order is no longer transitive. Agent 3 is the top dog, playing hawk to everyone. Agent 2 plays hawk to everyone except agents 3 and 4. Agent 4 plays hawk to everyone except agents 3 and 5. This means there's a cycle between agents 5, 4 and 2. There's probably at least one other cycle in that run.
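The cycle mentioned above can be recovered from the stated relations alone. The sketch below encodes only what the list says, plus one inferred edge: agent 5 playing hawk to agent 4, which the cycle claim implies. Everything else (the dict layout, the function name) is an assumption for illustration, and the relations among agents 0, 1 and 5 are left out because they aren't given.

```python
from itertools import permutations

# beats[x] is the set of agents that x plays hawk to.
beats = {
    3: {0, 1, 2, 4, 5},  # top dog: hawk to everyone
    2: {0, 1, 5},        # everyone except agents 3 and 4
    4: {0, 1, 2},        # everyone except agents 3 and 5
    5: {4},              # inferred from the cycle described above
}


def three_cycles(beats):
    """Return every cycle x -> y -> z -> x, written starting from its lowest agent."""
    agents = set(beats) | {t for targets in beats.values() for t in targets}
    found = set()
    for x, y, z in permutations(sorted(agents), 3):
        if (
            x == min(x, y, z)  # keep one rotation per cycle
            and y in beats.get(x, ())
            and z in beats.get(y, ())
            and x in beats.get(z, ())
        ):
            found.add((x, y, z))
    return found


cycles = three_cycles(beats)
```

On these edges the only cycle found is agent 2 over agent 5, agent 5 over agent 4, agent 4 over agent 2. Any other cycle in the run would involve relations that aren't listed here.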

I ran another very interesting variation of this experiment, but I'll save that for another time.

My goals for this month

Prepare the Stubborn talk for the RaD-AI workshop.
I'm doing a short talk about my Stubborn paper at the RaD-AI workshop. This will be my second time presenting my work in a formal academic setting, and my first time doing it in person. I need to prepare the talk and try to anticipate questions that I might get.
While I'm not very excited about the Stubborn experiments anymore, I do want to give my listeners a good time. I think I can make my talk enticing.
Figure out the strategy for my first full-length paper.
I'm happy with my chicken experiments, and it's possible that they're interesting enough to be the basis of my first full-length paper. This would be a big milestone for me as a researcher, and I've been putting a lot of thought into how to build that first paper.
Here are some questions I should be answering:
1. Should I add more parts to the experiment, or are the variations I made enough?
2. What narrative should I build for my paper? It could be "here are a couple of cool experiments I ran" or "I have proven that phenomenon X happens under conditions A, B and C" or anything else.
3. What conference will I target? AAAI 2023? AAMAS 2024? Maybe put it on arXiv and think about a conference later? This has big effects on the narrative, formatting and timeline.
4. Out of the 8 or so researchers that I've been working with, what subset will be coauthors on this paper?
5. Do I provide my code as open-source, and if so, how should I architect it?

That's it for now. See you next month!

Ram.
