Research into social behavior in Prisoner's Dilemma?


Ram Rachum

Jan 7, 2021, 11:58:25 AM
to Reinforcement Learning Mailing List

Hi everyone!

I've been working on research into reproducing social behavior using multi-agent reinforcement learning. My focus has been on a GridWorld-style game, but I was thinking that maybe a simpler Prisoner's Dilemma game could be a better approach. I tried to find existing research papers in this direction, but couldn't find any, so I'd like to describe what I'm looking for in case anyone here knows of such research.

I'm looking for research into scenarios where multiple RL agents play Iterated Prisoner's Dilemma with each other and social behaviors emerge. Let me specify what I mean by "social behaviors." Most research I've seen into RL/IPD (example) focuses on how to achieve the ideal strategy, how to get there the fastest, and what common archetypes of strategies emerge. That is all well and good, but it's not what I'm interested in.

An agent executing a Tit-for-Tat strategy gives the other player positive reinforcement for "good" behavior and negative reinforcement for "bad" behavior. That is why it wins. My key point here is that this carrot-and-stick method is applied individually rather than in groups; I want to see it evolve within a group.
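
To make the carrot-and-stick idea concrete, here is a tiny sketch in Python. This is my own toy code, not from any paper; the payoff numbers are just the usual textbook values and the function names are placeholders.

# Tit-for-Tat as an individual carrot and stick in Iterated Prisoner's Dilemma:
# it mirrors the opponent's last move, so cooperating with it pays off and
# defecting against it gets punished.

COOPERATE, DEFECT = 0, 1

# Classic PD payoffs (my score, opponent's score); the exact values are arbitrary.
PAYOFFS = {
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, DEFECT):    (0, 5),
    (DEFECT,    COOPERATE): (5, 0),
    (DEFECT,    DEFECT):    (1, 1),
}

def tit_for_tat(opponent_history):
    # Cooperate on the first move, then copy the opponent's previous move.
    return COOPERATE if not opponent_history else opponent_history[-1]

def play_against_tit_for_tat(my_policy, n_rounds=100):
    # Play `my_policy` (e.g. a learned RL policy) against Tit-for-Tat
    # and return my total payoff over the iterated game.
    my_history, tft_history = [], []
    total = 0
    for _ in range(n_rounds):
        my_move = my_policy(tft_history)    # my policy sees TFT's past moves
        tft_move = tit_for_tat(my_history)  # TFT sees my past moves
        my_history.append(my_move)
        tft_history.append(tft_move)
        total += PAYOFFS[my_move, tft_move][0]
    return total

Against tit_for_tat, sustained cooperation earns 3 points per round while mutual defection earns only 1, so a learning policy is rewarded for cooperating and punished for defecting. But all of that reinforcement comes from one individual opponent.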

I want to see an entire group of agents evolve to punish and reward other players according to how they behaved with the group. I believe that fascinating group dynamics could be observed in that scenario.
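
Something along these lines, where the carrot and stick come from a shared group memory rather than from each agent's private history. Again, this is only a toy sketch of my own, not something from the literature.

from collections import defaultdict

COOPERATE, DEFECT = 0, 1  # same move encoding as in the sketch above

class GroupReputation:
    # Shared memory of how each player has behaved toward *any* member of the group.
    def __init__(self):
        self.defections = defaultdict(int)
        self.interactions = defaultdict(int)

    def record(self, player, move):
        self.interactions[player] += 1
        if move == DEFECT:
            self.defections[player] += 1

    def defection_rate(self, player):
        if self.interactions[player] == 0:
            return 0.0
        return self.defections[player] / self.interactions[player]

def group_tit_for_tat(reputation, opponent, threshold=0.5):
    # Punish players the *group* remembers as defectors, even if they never
    # defected against me personally.
    return DEFECT if reputation.defection_rate(opponent) > threshold else COOPERATE

The interesting question, of course, is whether RL agents would discover a rule like group_tit_for_tat on their own, rather than having it hard-coded.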

I programmed such a scenario a decade ago, but with a hand-written algorithm rather than neural-network-backed RL. Before I try to implement it that way, I want to know whether there are existing attempts.

Does anyone know whether such research exists?


Thanks for your help,
Ram Rachum.

Ram Rachum

Jan 7, 2021, 12:44:57 PM
to Sandy Tanwisuth, rl-...@googlegroups.com
I just read the abstract and it sounds perfect! Thank you, Sandy; I'll give it a read.

If you or anyone else is familiar with more such research, please send it in this thread.

On Thu, Jan 7, 2021 at 7:30 PM Sandy Tanwisuth <k.tan...@gmail.com> wrote:
Hi Ram,

If I understand you correctly, you might want to check out the NeurIPS 2020 paper from Bowen Baker.




Mirco Musolesi

Jan 7, 2021, 5:12:53 PM
to rl-...@googlegroups.com, Sandy Tanwisuth

Dear Ram,

You might be interested in our AAAI 2020 paper:

Nicolas Anastassacos, Stephen Hailes and Mirco Musolesi. Partner Selection for the Emergence of Cooperation in Multi-Agent Systems using Reinforcement Learning. In AAAI 2020. New York City, NY, USA. February 2020.

Mirco

--
Mirco Musolesi
W: https://www.mircomusolesi.org

Ram Rachum

Jan 7, 2021, 5:12:56 PM
to Sandy Tanwisuth, rl-...@googlegroups.com
I skimmed it just now, and it looks like a step in the right direction, but he does gently coerce the agents into teams. There's some combination of "soft teams" and "hard teams" that share a reward among themselves. Gradually the agents learn to cooperate with the other agents on their teams. There is also a random noise element. In general, I felt there were too many mechanics, which made it difficult to see the simplest essence of cooperation.

It's impressive progress over other papers I've seen, but what I'd really like to see is teams forming with even fewer "training wheels". If anyone knows of something like that, please send it. I'm starting to think that might be my next project after all :)

Mirco Musolesi

Jan 7, 2021, 5:41:20 PM
to rl-...@googlegroups.com, Sandy Tanwisuth

> Nicolas Anastassacos, Stephen Hailes and Mirco Musolesi. Partner Selection for the Emergence of Cooperation in Multi-Agent Systems using Reinforcement Learning. In AAAI 2020. New York City, NY, USA. February 2020.

Sent too quickly. Link to the paper: https://www.mircomusolesi.org/papers/aaai20.pdf

Best,

Ram Rachum

Jan 8, 2021, 10:54:59 AM
to rl-...@googlegroups.com, Mirco Musolesi
I printed it and I'll give it a read. Thank you! 


Julian Lopez Baasch

Jan 8, 2021, 12:29:58 PM
to rl-...@googlegroups.com, Mirco Musolesi
Hi Ram,

You should check out the (great) work done by DeepMind on MARL and game theory.




Julian Lopez Baasch

Jan 8, 2021, 12:30:01 PM
to rl-...@googlegroups.com, Mirco Musolesi
Here's an introductory post on their work.
Best

Ram Rachum

Jan 8, 2021, 2:24:17 PM
to rl-...@googlegroups.com
My cup runneth over. I'll check it out. Thank you, Julian.

Ram Rachum

Jan 12, 2021, 7:18:58 PM
to rl-...@googlegroups.com, Mirco Musolesi
Hi Mirco,

I finished reading your paper. It's great. I've done a similar simulation in the past, with the different populations changing sizes in phases, but this is better. I'll likely use some variant of your partner selection algorithm.
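
To sketch for myself what such a variant might look like: each agent could keep a running estimate of everyone else's cooperation rate and prefer the most cooperative partners. This is only a rough placeholder of mine, not the algorithm from your paper; the epsilon-greedy rule and the learning rate are arbitrary choices.

import random
from collections import defaultdict

COOPERATE, DEFECT = 0, 1

# coop_rate[(me, other)]: my running estimate of how often `other` cooperates.
coop_rate = defaultdict(lambda: 0.5)  # start from an uninformed 0.5 prior

def select_partner(me, others, epsilon=0.1):
    # Usually pick the partner I believe cooperates most; sometimes explore.
    if random.random() < epsilon:
        return random.choice(others)
    return max(others, key=lambda other: coop_rate[(me, other)])

def update_estimate(me, partner, partner_move, lr=0.1):
    # Nudge my estimate toward 1 if the partner cooperated, toward 0 if not.
    target = 1.0 if partner_move == COOPERATE else 0.0
    coop_rate[(me, partner)] += lr * (target - coop_rate[(me, partner)])

In a full simulation, each agent would select a partner, play a Prisoner's Dilemma round with it, and then feed the partner's move back through update_estimate.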

I will note that in your paper, cooperation emerges between each pair of players in an active game, but there are no group dynamics. What I'm interested in is a set of players learning to punish and reward other players as a group, rather than as individuals.

Are you aware of any research that explores that?


Thanks,
Ram.