Hi everyone!
In last month's update I outlined a list of goals. Here's my update on these goals:
Make progress with POLA experiments: ✅ Done
I've worked so much on POLA! More details below.
Fundraising-related stuff: ✅ Almost done
I teased two secret fundraising-related tasks last month.
The first task is something that I should have done a year ago: I registered as a sole proprietorship with the Israeli tax authorities. This means that now, when I get grant money, I can write my research expenses off against that money instead of paying high taxes on it. I've really been dreading setting up this arrangement because it's a considerable hassle, but it seems like the smart thing to do. It also means I now have to pay an accountant every month, so I do hope that grant money comes along.
The second task I've done is still a secret. It's about 80% done so I'll report about it next month.
I've been running lots of experiments with POLA and making many improvements to its infrastructure.
I generalized the IPD environment to work for all matrix game social dilemmas. This means I can run such dilemmas as Stag Hunt, Chicken, Bach or Stravinsky, etc.
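To give a sense of what this looks like, here's a minimal sketch of how a single environment can cover all of these games just by swapping the payoff matrix. This isn't the exact code in my repo, and the payoff numbers are just for illustration:

```python
import jax.numpy as jnp

# Each 2x2 social dilemma is fully specified by its payoff tensor:
# payoffs[action_0, action_1] -> (reward for agent 0, reward for agent 1).
# Action 0 is "cooperate" and action 1 is "defect" (or the analogous choice).
PAYOFFS = {
    'ipd':       jnp.array([[(-1, -1), (-3,  0)],
                            [( 0, -3), (-2, -2)]]),
    'stag_hunt': jnp.array([[( 4,  4), ( 0,  3)],
                            [( 3,  0), ( 2,  2)]]),
    'chicken':   jnp.array([[( 0,  0), (-1,  1)],
                            [( 1, -1), (-5, -5)]]),
}

def step(payoffs, action_0, action_1):
    """One round of the matrix game: look up both agents' rewards."""
    rewards = payoffs[action_0, action_1]
    return rewards[0], rewards[1]

# Example: mutual defection in the IPD gives both agents a reward of -2.
print(step(PAYOFFS['ipd'], 1, 1))
```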
I generalized the POLA algorithm to N agents. Appendix A.9 of the POLA paper had most of the information needed to do that.
I created two different ways of running N-agent social dilemmas: free-for-all and pairwise. In free-for-all, all agents are playing against all of the other agents at the same time. For example, if there are 4 agents, then each agent has 3 observations from each of the 3 opponents, and its action is composed of 3 sub-actions, one for each opponent. In pairwise mode, the agents are paired randomly in each episode, like Chicken Coop, but without knowing who they're paired with.
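Here's a rough sketch of what the observation and action structure looks like in the two modes, assuming 4 agents and 2 possible actions; the exact encoding in my code may differ:

```python
import jax

n_agents = 4
n_actions = 2

# Free-for-all: each agent observes all of its opponents' last actions and
# emits one sub-action per opponent.
ffa_obs_shape = (n_agents - 1, n_actions)  # one-hot last action of each of the 3 opponents
ffa_action_shape = (n_agents - 1,)         # 3 sub-actions, one per opponent

# Pairwise: agents are shuffled into pairs at the start of each episode, and
# each agent only observes and acts toward its single anonymous partner.
key = jax.random.PRNGKey(0)
pairs = jax.random.permutation(key, n_agents).reshape(-1, 2)  # e.g. [[1, 3], [0, 2]]
pairwise_obs_shape = (n_actions,)  # just the partner's last action
pairwise_action_shape = ()         # a single action
```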
I created a script for preparing a GCP instance to run my experiments. Now I can run experiments on GPUs with little friction. If you happen to have GCP credits you can spare, please let me know!
I contributed some code to JAX. I also opened an issue for a segfault I'm getting, which I hope will get some attention soon.
I'm experimenting with a way to analyze agent behavior. It involves creating many neural networks that are each a tiny bit different from the agent's neural network, and then measuring the differences in metrics between them. Not sure whether it's useful yet.
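Here's a rough sketch of that idea in JAX. The parameter pytree and the evaluation function in the usage example are hypothetical placeholders, not my actual code:

```python
import jax
import jax.numpy as jnp

def perturb_params(params, key, scale=1e-3):
    """Return a copy of the parameter pytree with small Gaussian noise added."""
    leaves, treedef = jax.tree_util.tree_flatten(params)
    keys = jax.random.split(key, len(leaves))
    noisy_leaves = [leaf + scale * jax.random.normal(k, leaf.shape)
                    for leaf, k in zip(leaves, keys)]
    return jax.tree_util.tree_unflatten(treedef, noisy_leaves)

# Usage sketch: make 100 nearby networks and see how a metric of interest
# (here a hypothetical `cooperation_rate(params)` function) changes.
# keys = jax.random.split(jax.random.PRNGKey(0), 100)
# nearby = [perturb_params(agent_params, k) for k in keys]
# deltas = jnp.array([cooperation_rate(p) - cooperation_rate(agent_params)
#                     for p in nearby])
```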
I wrote a test suite, and I set up GitHub Actions as a CI to run it. Now I get notified whenever I break the tests.
I've set up a JAX compilation cache on Google Cloud Storage. Because POLA uses JAX, which just-in-time compiles code for GPUs, the start of each experiment can be slow: the compilation step can take between 5 minutes and 90 minutes, depending on the size of the experiment. JAX has a caching feature that reduces the compilation time to about 30% of what it would otherwise be. At first I enabled the cache locally, but then I realized I need the cache folder to be a persistent network folder, so that it's preserved between runs on the CI and on the GCP instances. Now all of my machines share the same JAX cache folder as a GCS bucket, which I mount using rclone.
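For anyone who wants to do something similar, this is roughly what the setup looks like. The mount path and bucket name below are made up, and on older JAX versions the cache is enabled via jax.experimental.compilation_cache instead of the config option:

```python
import jax

# The GCS bucket is mounted as a local folder with rclone, e.g.:
#   rclone mount my-gcs-remote:my-jax-cache-bucket /mnt/jax-cache --daemon
# Then JAX's persistent compilation cache is pointed at that folder, so every
# machine (CI runners and GCP instances) reuses the same compiled artifacts.
jax.config.update('jax_compilation_cache_dir', '/mnt/jax-cache')
```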
I've broken down the cooperation rate and reciprocity metrics to be per-opponent. Now when I measure an agent's reciprocity, I can get separate metrics for its reciprocity to the first agent, second agent, etc., in addition to having the mean reciprocity metric.
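To make the naming concrete, the metrics I end up with look roughly like this. This is a simplified sketch, where the ".coeval" suffix denotes the metric averaged over all of an agent's opponents, as I explain below:

```python
import numpy as np

def per_opponent_metrics(cooperation, n_agents=3):
    """Break a cooperation matrix into per-opponent and mean metrics.

    `cooperation[i, j]` is agent i's cooperation rate toward agent j
    (a simplified stand-in for how the real metrics are computed).
    """
    metrics = {}
    for i in range(n_agents):
        opponents = [j for j in range(n_agents) if j != i]
        for j in opponents:
            metrics[f'cooperation_rate.{i}.{j}'] = float(cooperation[i, j])
        # The `.coeval` metric averages over all of agent i's opponents.
        metrics[f'cooperation_rate.{i}.coeval'] = float(
            np.mean([cooperation[i, j] for j in opponents]))
    return metrics
```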
Let's talk about some experiments. I don't yet have an experiment that I'd write a paper about, but here's one interesting result. These are the rewards of 3 POLA agents playing IPD against each other simultaneously:
Their cooperation rates:
And their reciprocities:
I showed this to one of my collaborators and he asked "Why does agent 0 (green) have more reward even though it cooperates more than the other agents? Seems like that shouldn’t happen."
The cooperation rate cooperation_rate.0.coeval is measured across interactions with all other agents. This means that for each agent, its cooperation rate is an average of its cooperation rates with each of the other agents, cooperation_rate.0.1 and cooperation_rate.0.2. Let's plot cooperation_rate.2.{0,1,coeval}:
The yellow line is cooperation_rate.2.coeval, and it's an average of the two other lines. The corresponding plot for agent 1 looks the same. So basically, agents 1 and 2 both cooperate a lot with agent 0 but very little with each other, which is why they both get a low score while agent 0 gets a high score. This "love triangle" is emergent; there is no difference between the agents at the start besides their random initial values.
One question you might ask is "how come agents 1 and 2 don't start cooperating with each other more to get more points?" I'm currently working on a way to analyze the agents' behavior so that I can answer such questions.
Here are my goals for next month:

Apply to more funding opportunities.
I've been postponing applying for funding because I was busy with other things, but now I gotta do it. There aren't great prospects right now, but I did get the names of 2-3 potential funders. Some of them don't have an application process, so I'll need to cold-email them.
More POLA work.
Here are some items I want to do:
Figure out the segmentation fault I get with JAX. I hope one of the JAX maintainers can help me with it.
Use my fuzzy analysis method to understand why agents sometimes don't do what I expect them to do.
Design an environment where POLA agents show any kind of interesting social behavior.
That's it for now. See you next month!
Ram.