Research update: Submitted my paper, war

Ram Rachum

Oct 30, 2023, 11:45:27 AM

Hi everyone.

As you know, on October 7th Hamas invaded Israel. They murdered over a thousand civilians and abducted over two hundred more. We are now at war.

I want to thank everyone who contacted me to check up on me. Fortunately, my family, friends and I are safe. I'm located in Tel Aviv, which is relatively far from both the southern and northern fronts, so most of the danger here has been from rockets rather than in-person combat.

Beyond the immediate danger, we are also worried about escalation with Lebanon and Iran. We're thankful for the intervention of many countries, especially the US. We hope that this conflict deescalates as quickly as possible.

I'm personally lucky that I'm emotionally resilient to conflicts such as these. Life in Israel has changed since the war began. In a way, it's a little like the first few weeks of Covid we all experienced in 2020. Something big happened that stopped everything. No one can think about anything else. There's no more music or comedy. The stores, restaurants and bars are closed. And like the Covid period, I'm able to hole up in my little apartment and work on my things while I wait for the world to recover.

Retrospective on goals for last month

In last month's update I outlined a list of goals. Here's my update on these goals:

  1. Finish the dominance hierarchies paper and submit it to AAMAS: ☑ Done

    It's finally over. I'm thankful that I was responsible with the timeframe and started working hard on the paper early. The deadline was on October 9th, and war broke out two days prior, on October 7th. I still had more work to do on the paper in those two days, but fortunately it was little enough that I was able to do it despite the pressure of war.

    There were many sirens in those two days. Sometimes I would write a sentence, the siren would go off, and my roommate and I would run off to take shelter. We'd hear two or three blasts. When one of the blasts was louder than usual, we'd try to convince ourselves that it was still far away from our apartment. One of these rockets hit a building 800m from our apartment.

    After 10 minutes I'd return to my computer and try to remember what I was trying to write before the siren.

    I wish I could share the paper with you now, after all this work, but because it's under review at AAMAS I'm not allowed to do so. In two months I'll get a notification from the reviewers, and then I'll probably be able to post it on arXiv and share it with you. I'm happy with how it turned out (meme). There are a few areas I wish I had the time and energy to improve. As the saying goes, "a paper is never finished, only abandoned."

  2. Apply for funding with several foundations: ☑ Mostly done

    One of the things I had to do before I could apply for funding was to restructure my CV, which was previously geared towards showing my engineering achievements. I had to compress the engineering part and populate the rest with my modest achievements as a researcher. I also had to make it less fun, because the research world is like that. If you've got any notes on it, let me know.

    I applied to two sources of funding. I'm waiting for an answer and hoping for the best. Meme.

Recovering from writing my paper

It was nice to be done with the paper. After being laser-focused on that one particular thing for so long, I can finally work on different things and especially blue-sky ideas. I reread Bakker 2021 which I think is a really insightful paper on emergent reciprocity. I ran a few experiments in iterated prisoner's dilemma.

After trying different things, I think the development I'm most excited about right now is LOLA, an algorithm developed by Jakob Foerster and colleagues in 2018. I think that the LOLA approach might be the right solution for the problem of getting agents to reciprocate with each other. In the LOLA approach, agents don't just learn to maximize their own reward; they also learn to steer the learning process of the other agents in the direction that would yield them the most reward. I say "the LOLA approach" instead of just LOLA, because many people have written algorithms based on LOLA, like POLA, COLA, SOS and M-FOS. Supposedly these algorithms fix the inherent problems in LOLA.
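To make the "steer the other agent's learning" idea concrete, here's a minimal sketch, not the algorithm from the LOLA paper itself. It assumes a toy one-shot matching pennies game with one scalar parameter per player, and it differentiates numerically through the opponent's anticipated gradient step instead of using the first-order Taylor expansion that the paper uses. The names `values`, `naive_grad`, `lola_grad` and the step size `eta` are made up for this illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def values(theta1, theta2):
    """Expected payoffs (V1, V2) in one-shot matching pennies.

    Each player picks heads with probability sigmoid(theta).
    Player 1 wins when the coins match, player 2 when they differ.
    """
    p1, p2 = sigmoid(theta1), sigmoid(theta2)
    v1 = (2 * p1 - 1) * (2 * p2 - 1)
    return v1, -v1


def d_dtheta(f, x, eps=1e-4):
    """Central finite-difference derivative of a scalar function."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def naive_grad(theta1, theta2):
    """Player 1's gradient when it treats the opponent as fixed."""
    return d_dtheta(lambda t1: values(t1, theta2)[0], theta1)


def lola_grad(theta1, theta2, eta):
    """Player 1's gradient when it anticipates the opponent's next step.

    The opponent is assumed to take one naive gradient step of size eta
    on its own value. That step depends on theta1, so differentiating
    the post-step value with respect to theta1 captures how player 1's
    parameters shape the opponent's learning.
    """
    def lookahead_value(t1):
        g2 = d_dtheta(lambda t2: values(t1, t2)[1], theta2)
        return values(t1, theta2 + eta * g2)[0]

    return d_dtheta(lookahead_value, theta1)


if __name__ == "__main__":
    print("naive:", naive_grad(0.5, 0.0))
    print("lola: ", lola_grad(0.5, 0.0, eta=1.0))
```

At `theta1 = 0.5, theta2 = 0.0` the opponent plays uniformly, so the naive gradient is zero and a naive learner sees no reason to change. The lookahead gradient is negative: leaning toward heads would teach the opponent to exploit that lean, so the agent is pushed back toward the uniform strategy. With `eta = 0` the two gradients coincide.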

My goals for this month

  1. Recover.

    I've got a feeling that this month isn't going to be very productive. The war is part of the reason, but I've also put some stuff in my life on hold while I was writing the paper, and now I've accumulated some problems I need to fix.

  2. Look into LOLA and similar algorithms.

    As I started diving into LOLA and figuring out how it works, I realized that there are some basic things about reinforcement learning that I don't understand well enough, specifically baselines. I reread the beginning of the Sutton and Barto book, and when I got to the section about gradient bandit problems, I ran into the same things that puzzle me about how baselines work. I'll have to go over it again and again to understand it, and then I'll have to dive into LOLA and its friends again.

    As an aside, the new ChatGPT feature where you can attach a picture has been very helpful for me when I run into things I don't understand in textbooks.

    After I understand LOLA, I want to design an environment that elicits social behavior and let LOLA loose on it, hoping it'll exhibit that behavior.
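Coming back to the baseline puzzle in goal 2: one way to poke at it is to compute the gradient bandit update's mean and variance exactly for a small example. The sketch below assumes the standard update term `(R - baseline) * (1[A == a] - pi_a)` with a softmax policy over preferences, and Gaussian rewards with a fixed standard deviation. The function name `update_moments` and the example numbers are made up for illustration.

```python
import math


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(h - m) for h in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def update_moments(prefs, true_means, baseline, reward_std=1.0):
    """Exact mean and variance of (R - baseline) * (1[A == a] - pi_a).

    Assumes A is sampled from softmax(prefs) and that the reward for
    arm k is Normal(true_means[k], reward_std). Returns two lists with
    one entry per arm a.
    """
    probs = softmax(prefs)
    means, variances = [], []
    for a in range(len(prefs)):
        first = 0.0   # E[update term]
        second = 0.0  # E[update term squared]
        for k, (p_k, mu_k) in enumerate(zip(probs, true_means)):
            coeff = (1.0 if k == a else 0.0) - probs[a]
            first += p_k * (mu_k - baseline) * coeff
            second += p_k * (reward_std ** 2 + (mu_k - baseline) ** 2) * coeff ** 2
        means.append(first)
        variances.append(second - first ** 2)
    return means, variances


if __name__ == "__main__":
    prefs = [0.0, 0.0, 0.0]
    true_means = [4.0, 5.0, 6.0]
    for b in (0.0, 5.0):
        mean, var = update_moments(prefs, true_means, baseline=b)
        print("baseline", b, "mean", mean, "variance", var)
```

In this example the expected update is the same with a baseline of 0 and a baseline of 5 (the average reward under the uniform policy), because the `(1[A == a] - pi_a)` terms average to zero over actions and so any constant multiplied by them drops out. The variance of the update is much smaller with the baseline of 5, which is the usual argument for using one.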

That's it for now. I hope that next month will be relatively calm.


