Hi everyone.
Merry Christmas and Happy Hanukkah!
Not a lot of updates this month. I was rejected from the MATS program, which is sad. I'm still in the running for GAIS.
The Cooperative AI Foundation (CAIF) announced a new round of grants, and I applied. Last year I applied for one of their grants and was rejected. I got some helpful feedback, which I've implemented, and I hope it will improve my chances this time. One interesting thing they did was publish a retrospective on what needed improvement in the previous round of grants and describe how they're changing the grant application structure.
Next week I'm going to attend the IDSAI conference in Israel to present my dominance hierarchies paper. It's a non-archival conference. I need to adapt my dominance hierarchies talk to their 20-minute format. I hope people will like it.
Progress on the actual research has been slow, partially because I'm a little stuck, and partially because I've been working on these other things.
Here's one interesting thing I've done, though. I added a metric that measures, for each agent, how likely it is to take each action conditioned on its observation. This metric ignores the actual behavior of the other agents. Let me show you why this metric is useful. Here is a plot that I may have shown before:

What we're seeing here are the cooperation rates of two agents playing the iterated prisoner's dilemma with each other. In epochs 0–20, they learn to reciprocate, though not yet to cooperate. In epochs 20–70, they learn to both reciprocate and cooperate. From epoch 70 onwards, agent 0 performs ZD extortion on agent 1, meaning that agent 0 consistently gets more reward at agent 1's expense. The oscillations happen because the more agent 0 tightens the noose around agent 1's neck, the more incentivized agent 1 is to stop going along with agent 0's extortion. The two agents keep adjusting their behavior in response to each other.
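For anyone unfamiliar with ZD extortion, here's a minimal sketch of the textbook version from Press and Dyson's zero-determinant strategies. It is not my trained agents, just an illustration of the concept. It assumes the standard payoffs R=3, S=0, T=5, P=1 (my environment's payoffs may differ) and uses their example memory-one extortion strategy with extortion factor 3, which forces the extortioner's surplus over mutual defection to be exactly three times the opponent's, whatever memory-one strategy the opponent plays (as long as the long-run average is well defined). The helper `stationary_payoffs` and the random opponents are hypothetical.

```python
import numpy as np

# Standard prisoner's dilemma payoffs (an assumption, not necessarily my env's).
R, S, T, P = 3.0, 0.0, 5.0, 1.0
# States are ordered (CC, CD, DC, DD) from player X's point of view.
PAYOFF_X = np.array([R, S, T, P])
PAYOFF_Y = np.array([R, T, S, P])

# Press and Dyson's example extortion strategy with chi = 3:
# the probability that X cooperates after each of the four states.
EXTORT_3 = np.array([11 / 13, 1 / 2, 7 / 26, 0.0])


def stationary_payoffs(p, q):
    """Long-run average payoffs when memory-one strategy p (X) meets q (Y).

    q is written from Y's own point of view, so in state CD (X cooperated,
    Y defected) Y uses its DC entry, and vice versa.
    """
    q_seen = np.array([q[0], q[2], q[1], q[3]])
    # Row = current state, column = next state (CC, CD, DC, DD).
    M = np.column_stack([
        p * q_seen,
        p * (1 - q_seen),
        (1 - p) * q_seen,
        (1 - p) * (1 - q_seen),
    ])
    # Stationary distribution: solve v M = v together with sum(v) = 1.
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    b = np.append(np.zeros(4), 1.0)
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v @ PAYOFF_X, v @ PAYOFF_Y


rng = np.random.default_rng(0)
for _ in range(3):
    q = rng.uniform(0.05, 0.95, size=4)  # a random memory-one opponent
    sx, sy = stationary_payoffs(EXTORT_3, q)
    print(f"X: {sx:.3f}  Y: {sy:.3f}  (sx - P) / (sy - P) = {(sx - P) / (sy - P):.3f}")
```

Against every random opponent the printed ratio should be 3 (up to floating-point error), and X always ends up ahead of Y: whatever Y does, X collects three times Y's gain over mutual defection.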
A problem with the cooperation-rate plot above is that it might give the impression that the difference between agents 0 and 1 is merely quantitative. After all, the two curves have quite similar shapes. However, the new metric I added shows that the difference between the two agents is qualitative:


The top plot shows agent 0's cooperation rate by observation, and the bottom plot shows agent 1's. The green lines show how each agent responds to its opponent's cooperation, and the red lines show how it responds to its opponent's defection.
Agent 0, which is the extorting agent, always responds to defection with defection. However, it doesn't always respond to cooperation with cooperation. Agent 1 is the opposite: It always responds to cooperation with cooperation, but it doesn't always respond to defection with defection.
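Here's a minimal sketch of how a per-observation cooperation rate like this could be estimated from a logged game. It assumes a two-player game in which each agent's observation is just the opponent's previous action (a simplification; the real observation space may be richer), and the function name `cooperation_by_observation` is hypothetical. With direct access to the policy, you could instead read these probabilities off the policy itself.

```python
from collections import defaultdict

C, D = "C", "D"


def cooperation_by_observation(history, agent):
    """Estimate P(agent cooperates | its observation) from one game.

    history: list of (action_0, action_1) pairs, one per round.
    Hypothetical simplification: the agent's observation is just the
    opponent's action in the previous round.
    """
    opponent = 1 - agent
    counts = defaultdict(lambda: [0, 0])  # observation -> [cooperations, total]
    for prev, curr in zip(history, history[1:]):
        observation = prev[opponent]
        counts[observation][1] += 1
        if curr[agent] == C:
            counts[observation][0] += 1
    # Each rate is normalized within its own observation, so the result does
    # not depend on how often the opponent actually cooperated or defected.
    return {obs: coop / total for obs, (coop, total) in counts.items()}


# A made-up toy game, not data from my experiments.
history = [(C, C), (D, C), (D, D), (D, C), (C, C), (C, C)]
for agent in (0, 1):
    print(agent, cooperation_by_observation(history, agent))
```

In this toy game, agent 0 cooperates after seeing C only half the time but never cooperates after seeing D, while agent 1 always cooperates after seeing C and cooperates after seeing D two times out of three, which mirrors the qualitative split in the plots.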
I hope I'll have some more cool stuff to show you next month.
Merry Christmas and Happy Hanukkah, Ram.