Hi everyone!
I'll start with the sad news: I was rejected from the CAIF funding round. I was really hoping this would work out, and now I'll need to find other funding for after my time in Berkeley. They did provide detailed feedback, which will be helpful the next time I apply.
I'm now waiting at the TLV airport for the plane to Rome. There are mass fires around Jerusalem, so I was worried my flight would be cancelled, but fortunately it wasn't. No casualties in the fires, thankfully. In Rome I'll have exactly one hour and ten minutes to run and board my flight to San Francisco (assuming my first leg isn't late...). The last month has been so stressful with preparations for this trip. The big items were getting my visa, subletting my apartment in Tel Aviv, and closing on affordable apartment leases in Berkeley. I did eventually get my visa, but much later than I expected, so I had to rush all the other things. After a lot of effort I got it all done. (Berkeley real estate meme.)
I've been working a lot with the AdAlign algorithm, and I think it's the right fit for my research. Unlike my previous algorithm, it works well with hidden layers. Juan, Milad, and Razvan helped me find a problem with my implementation by suggesting I plot the advantages. I've been improving my implementation and I think it's mostly correct now.
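For anyone curious what that diagnostic looks like, here's a minimal version of the kind of advantage plot I mean. This isn't my actual training code; it assumes a generic actor-critic setup with per-step rewards and value estimates, and the rollout data is fake, just enough for the script to run.

```python
# Minimal sketch of the "plot the advantages" diagnostic, NOT my actual
# training code. Assumes a generic actor-critic setup; the rollout data
# below is fake, just enough for the script to run end to end.
import numpy as np
import matplotlib.pyplot as plt

def advantages(rewards, values, gamma=0.96):
    """One-step TD advantages: A_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    next_values = np.append(values[1:], 0.0)  # bootstrap with 0 after the last step
    return rewards + gamma * next_values - values

rng = np.random.default_rng(0)
rewards = rng.normal(-1.0, 0.5, size=200)   # stand-in for per-step IPD rewards
values = rng.normal(-25.0, 1.0, size=200)   # stand-in for critic value estimates

adv = advantages(rewards, values)
plt.plot(adv)
plt.xlabel("timestep")
plt.ylabel("advantage")
plt.title("Per-step advantages for one rollout")
plt.show()
```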
One thing that was important to me was making sure the algorithm is "morally neutral". What I mean is: AdAlign can learn positive behaviors such as cooperation, but I want to be sure it learns them not because the algorithm directly encourages cooperation, but because, in the specific game it's playing, it reaches the conclusion that cooperation is a good strategy. In practice this usually means that cooperation is a result of reciprocity.
I designed a series of experiments that I call the "prosociality litmus test". The idea is to show that AdAlign learns reciprocity and cooperation in the iterated prisoner's dilemma (IPD), but that when we make reciprocity impossible by removing the observations, the agents stop cooperating.
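Before getting to the experiments, here is roughly what I mean by a "reciprocity level". The snippet below is an illustration rather than the exact metric behind my plots: it scores how much more likely an agent is to cooperate right after its opponent cooperated than right after its opponent defected.

```python
# A rough sketch of one way to score "reciprocity" from IPD trajectories:
# how much more likely an agent is to cooperate after its opponent
# cooperated than after its opponent defected. This is an illustration,
# not necessarily the exact metric behind the plots below.
import numpy as np

C, D = 0, 1  # action encoding: cooperate / defect

def reciprocity(agent_actions, opponent_actions):
    agent = np.asarray(agent_actions)
    opp = np.asarray(opponent_actions)
    after_c = agent[1:][opp[:-1] == C]  # agent's moves right after opponent cooperated
    after_d = agent[1:][opp[:-1] == D]  # agent's moves right after opponent defected
    p_coop_after_c = np.mean(after_c == C) if len(after_c) else 0.0
    p_coop_after_d = np.mean(after_d == C) if len(after_d) else 0.0
    return p_coop_after_c - p_coop_after_d  # 1.0 = perfect tit-for-tat, 0 = no reciprocity

# Tit-for-tat against an alternating opponent scores 1.0:
opp   = [C, D, C, D, C, D]
agent = [C, C, D, C, D, C]   # copies the opponent's previous move
print(reciprocity(agent, opp))  # -> 1.0
```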
In experiment 1, we let two AdAlign agents play IPD against each other with the beta hyperparameter set to zero. This turns off opponent shaping entirely, so AdAlign behaves like a conventional policy gradient algorithm (a toy sketch of how beta gates the shaping term follows the plots). As expected, the agents immediately learn to defect forever:
Here are the reciprocity levels for the two agents:
The agents are not cooperating and not reciprocating.
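To make the beta knob concrete, here's a toy sketch of the shape of the update. This is not AdAlign's actual objective; it only shows the structure I'm describing, where beta = 0 collapses to a plain policy gradient and beta > 0 mixes in an opponent-shaping term (a placeholder here).

```python
# Toy sketch of how beta gates opponent shaping. NOT AdAlign's actual
# objective: shaping_term is a placeholder scalar standing in for the
# opponent-shaping part of the loss.
import torch

def total_loss(log_probs, advantages, shaping_term, beta):
    """log_probs, advantages: per-step tensors for the agent's own trajectory."""
    pg_loss = -(log_probs * advantages.detach()).mean()  # vanilla policy gradient term
    return pg_loss + beta * shaping_term                 # beta = 0 -> plain policy gradient

# Example call: with beta = 0 (experiment 1) the shaping term drops out entirely;
# with beta = 0.5 (experiments 2 and 3) it is mixed into the update.
log_probs = torch.randn(10, requires_grad=True)
advs = torch.randn(10)
print(total_loss(log_probs, advs, shaping_term=torch.tensor(0.0), beta=0.0))
```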
In experiment 2, we set the beta hyperparameter to 0.5, which means that the algorithm is actually doing opponent shaping.
Cooperation rates:
Reciprocity levels:
The agents cooperate and reciprocate. The oscillating pattern is interesting; it makes some intuitive sense to me, but I can't pin down exactly why the rates go up and down like that. Suggestions welcome.
Experiment 3 is like experiment 2, except that we strip the agents of their observations: they can't see which action their opponent just played, so they have no way of reciprocating, i.e. of cooperating only if their opponent cooperated (a sketch of one way to blind the agents follows the plots).
Cooperation rates:
Reciprocity levels:
As expected, the agents can't reciprocate without observations, and therefore don't cooperate. This provides some evidence that AdAlign doesn't have a built-in bias towards cooperation.
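For completeness, here's one way the "no observations" condition can be implemented. This is a sketch rather than my actual environment code; the IPDEnv-style interface (reset/step returning one observation per agent) is hypothetical.

```python
# One way to implement the "no observations" condition of experiment 3:
# wrap the environment and replace every observation with zeros, so the
# agents cannot condition on their opponent's last move. A sketch of the
# idea, not my actual environment code; the reset/step interface assumed
# here (one observation per agent) is hypothetical.
import numpy as np

class BlindWrapper:
    def __init__(self, env):
        self.env = env

    def _blind(self, obs_per_agent):
        # Keep the same shapes as the real observations, but zero them out.
        return [np.zeros_like(o) for o in obs_per_agent]

    def reset(self):
        obs = self.env.reset()
        return self._blind(obs)

    def step(self, actions):
        obs, rewards, done, info = self.env.step(actions)
        return self._blind(obs), rewards, done, info
```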
I'm looking forward to the internship. It seems there are a few other people there who are excited about opponent-shaping algorithms, so I hope we can help each other.
See you next month,
Ram.