Research update: Geekonomy, grants, red agents

Ram Rachum

Oct 1, 2022, 6:42:00 AM
Hi everyone!

Here's my research update for this month.

Retrospective on goals for last month

In last month's update I outlined a list of goals. Here's how each of them went:

  1. Learn RLlib and figure out multiple brains: ✅ Done

    Oh boy. I've been postponing this so much, but I finally got it done. I'm still a novice, but I worked through this tutorial notebook a few times, did a lot of troubleshooting and debugging into RLlib's internals, ran some of my own multi-agent experiments, and also submitted a couple of bug reports and pull requests to the RLlib project.

    I know I'm such a prima donna programmer, but so many of the scientific libraries that I need to touch are so icky to me. RLlib might be the best of the bunch in multi-agent reinforcement learning libraries, but it's still full of frustrations, weird design choices and log noise (examples, meme). I'm reminiscing about my days as a backend web developer using Django. Not that I miss web programming, but Django as an open-source project is a masterpiece. I got used to so much attention being given to the ease of use, documentation, support and ecosystem. None of the tools in the RL space come even close. Of course, this isn't a fair comparison because a web framework is a much more ubiquitous tool than an RL framework.

    Part of my goal was specifically about running experiments where each agent has its own brain (i.e. separate weights for its neural network). Fortunately this is one of the things that are easy to do with RLlib, so I already ran a few experiments with separate brains, detailed below.

  2. Run experiments with agents trying to move away from each other: ✅ Done

    This is the latest development in a series of experiments that I call "StayAway".

    Back in June I showed you a bare-bones version of this experiment. In that version, I had an agent that's trying to be far away from two other agents, but the other agents aren't really "alive", they're just doing a random walk.

    Now with RLlib I was able to run this experiment with all the agents being "alive", i.e. having an RL algorithm, and each agent having a separate brain.

    Here is a visualization of that experiment. Sorry for the crude graphics. Here's how to interpret them:

    1. There are six agents, numbered 0 to 5.

    2. Every line is one state of the game, so time progresses in the downward direction.

    3. Each batch of 100 lines is a separate episode of the game. After each episode, there's a line of text saying something like "Iteration=2: R("return")=26248.025 Sample game:" and then the next episode. Between every two consecutive episodes there's a learning process, so the top episode shows the agents before they've learned anything, and the bottom episode shows the agents after they've learned a lot.

    4. Each agent wants to stay as far away as possible from the other agents. Each agent gets a reward equal to the square of the distance to its nearest neighbor.

    5. The playing area is drawn as a line but it's actually a circle. This means that agents can walk into the right end and emerge on the left. This also means that an agent on the left end is considered very close to an agent on the right end.

    It's cool to see that at first the agents are very clumsy, and they stay close to each other, which loses them a lot of points. If you scroll down to around the 10th episode you can already see them becoming smarter, keeping a careful distance from one another. If you scroll to the bottom of the file, you'll see the agents are now pros at this game. After an initial hustle, they space themselves on the circle almost perfectly.

    I want to continue developing this experiment. More details about that in the goals for next month.

  3. Apply to LTFF for funding: ✅ Done... But rejected 😢

    I tortured myself with the LTFF application questionnaire for a full week before I submitted it. I know that grant-writing is a part of a researcher's life that most researchers really dread, so at least I'm having an important career experience.

    I rewrote my application a few times, had friends review it and asked a few relevant people to be my references. I asked for a total of $45K for six months of research. On September 7th I submitted the application, and on September 21st I got an email saying my application was rejected.

    I hoped I could get some feedback from them, but the email said "Please note that we are unable to provide further feedback due to the high volume of applications."

    I don't despair. I have a list of foundations and funds that could appreciate my research, and I'm going to contact them one-by-one. I'm okay with being slow and sequential rather than fast and parallel. I'm not tight on the money yet. I think that treating my fundraising process as just one of the things I work on daily, alongside working on the actual research, would be healthier than spending most of my time on fundraising.
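To make the StayAway reward rule from the experiment above concrete, here is a minimal sketch in Python. It is not the actual experiment code: the function names, the `circumference` parameter and the use of plain numeric positions are all assumptions made for illustration. It only shows the two rules stated in the post: the playing area wraps around like a circle, and each agent's reward is the square of the distance to its nearest neighbor.

```python
from typing import List, Sequence


def circular_distance(a: float, b: float, circumference: float) -> float:
    """Shortest distance between two positions on a circle.

    Hypothetical helper: an agent at the left end of the drawn line
    counts as very close to an agent at the right end.
    """
    d = abs(a - b) % circumference
    return min(d, circumference - d)


def stay_away_rewards(positions: Sequence[float], circumference: float) -> List[float]:
    """Reward per agent: squared distance to its nearest neighbor."""
    rewards = []
    for i, p in enumerate(positions):
        nearest = min(
            circular_distance(p, q, circumference)
            for j, q in enumerate(positions)
            if j != i  # an agent is not its own neighbor
        )
        rewards.append(nearest ** 2)
    return rewards
```

For example, under these assumptions six agents spaced evenly on a circle of circumference 60 (positions 0, 10, 20, 30, 40, 50) each have a nearest neighbor at distance 10, so each gets a reward of 100. Two agents at positions 1 and 59 are only 2 apart because of the wrap-around.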

Stuff I've done beyond the goals

I was interviewed on Geekonomy! (Hebrew)

This section will be interesting mostly for the few dozen Israelis / Hebrew speakers in this group. A few weeks ago I was interviewed for the Geekonomy podcast! Geekonomy is one of the most well-regarded podcasts in Israel. They've interviewed parliament members, professors, CEOs and now me :)

This is my episode (Hebrew only) though you'll probably have a better experience listening to the podcast in your favorite podcast app, like Spotify or Pocket Casts.

I think I was too stressed when I was interviewed... I was focusing too much on giving answers to questions instead of having a fun conversation. I met the host (Reem Sherman) for the first time only 5 minutes before we started recording, and then I had to stay with my mouth on the microphone without moving too much, which is difficult. So I could have done it better, but listening to the podcast, I think it came out okay.

Because I'm a relatively low-caliber guest, they pitched my interview to the listeners as a warm-up act for the interview (English) with Marcus du Sautoy, which I take as a great compliment :)

My goals for this month

  1. Speak at AISIC and Reversim Summit.

    I made a couple of posts on the "Effective Altruism Israel" Facebook group, and I was contacted by David Manheim. David founded ALTER (The Association For Long Term Existence And Resilience), which is an Israeli foundation pushing for EA-related causes such as AI alignment and pandemic preparedness. David told me that he's organizing a conference called AI Safety Israel Conference, or AISIC, on October 19th-20th. If you're interested in AI safety, I hope that you join us.

    We agreed that I should do a poster session there about my research, which is a great opportunity for me to reach more people. Amusingly, the conference will be held at the Technion, which is the university I dropped out of 17 years ago. Good times :)

    This will be the first poster session I've ever done. Poster sessions are a common activity for researchers, which means it's another thing I'll need to learn. I'll look online for threads (and memes) about having a successful poster session, and ask my researcher friends for advice. This is my draft so far if you'd like to roast me :)

    I'm also going to give a talk at Reversim Summit 2022, Israel's top software engineering conference. It's going to be on October 24th-25th. This talk is not going to be about my research, but I'll tease my research and link to it. I hope a few people will be interested enough to join this group.

  2. StayAway experiments: Introduce blue agents that want to get close.

    In the section above I talked about the StayAway experiment, where I have six agents that try to get as far away from each other as possible. At some point they space out to equal distances, which is boring. I want to make things interesting.

    My next goal for these experiments is to add agents that want to get as close as possible to the other agents. Let's call the original class of agents that want to be as far away as possible from each other "red agents". The new class of agents we'll call "blue agents", and each blue agent gets points based on how close it is to its nearest red neighbor. Note that blue agents won't get any points from being close to each other, because the solution for that would be too simple.

    The interesting thing here is that the red and blue agents have conflicting goals. I'm stoked to find out what's going to happen once I pit them against each other. I have a few guesses, but I'll keep them to myself :)

  3. Continue fundraising.

    I found the next two funds that I want to apply to. I hope I'll have time this month to study their requirements and write my application.
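The red-versus-blue reward idea from the StayAway goal above could be sketched roughly like this in Python. Several things here are assumptions, not decisions from the post: the function names, the choice that a red agent treats both red and blue agents as neighbors, and the choice of "negative distance to the nearest red agent" as the blue reward (the post only says blue agents get points based on how close they are to their nearest red neighbor). The one rule taken directly from the post is that blue agents get nothing for being close to each other.

```python
from typing import List


def circular_distance(a: float, b: float, circumference: float) -> float:
    """Shortest distance between two positions on a circle (hypothetical helper)."""
    d = abs(a - b) % circumference
    return min(d, circumference - d)


def red_reward(i: int, reds: List[float], blues: List[float], circumference: float) -> float:
    """Reward for red agent i: squared distance to its nearest neighbor.

    Assumption: every other agent, red or blue, counts as a neighbor.
    """
    others = reds[:i] + reds[i + 1:] + blues
    nearest = min(circular_distance(reds[i], q, circumference) for q in others)
    return nearest ** 2


def blue_reward(i: int, reds: List[float], blues: List[float], circumference: float) -> float:
    """Reward for blue agent i: higher when its nearest red agent is closer.

    Assumption: the reward is the negative distance to the nearest red agent.
    Other blue agents are ignored, as described in the post.
    """
    nearest_red = min(circular_distance(blues[i], r, circumference) for r in reds)
    return -nearest_red
```

With reds at 0 and 30, one blue at 5 and a circumference of 60, the red agent at 0 gets 25 (the blue agent is 5 away), the red agent at 30 gets 625, and the blue agent gets -5. The conflict is visible in the numbers: the blue agent improves its reward by approaching the red agent at 0, which lowers that red agent's reward.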

That's it for now. See you next month!

