Hi everyone!
In last month's update I outlined a list of goals. Here's how I did on each of them:
Apply to more funding opportunities: ✅ Done
I've applied to a few small funding opportunities (1517 Fund, Powoki, Ergo Impact), but there aren't any major ones open right now. I've decided to try taking on freelance work to support my research. This is something I very much wanted to avoid, because it'll take away from my focus, but I need an injection of money, even if just for a few months.
If your company could use freelance work in Python, please reach out. CV / LinkedIn / GitHub
I'm not giving up on looking for proper funding for my research. I'm still scouting for possible opportunities and applying for them.
More POLA work: A little frustrated 😕
I'm disappointed with my progress on POLA. It does feel like I'm doing meaningful work, but I expected to be closer to publishable results by this point.
I'm now grappling with four problems:
After POLA learns reciprocity in the iterated prisoner's dilemma (IPD), training doesn't converge: the two agents keep learning to extort each other in an unstable way. I would like to find a gentle change to the environment that will result in the agents converging on some behavior, extortionate or otherwise, which would make POLA easier to study. However, I haven't found such a change.
I've had the idea of using "fuzzy brains" as a tool to troubleshoot why the agents sometimes don't learn the behavior I'd expect them to learn. Fuzzy brains are slight variations of each agent's brain (its network parameters) which I then evaluate, to see which metrics correlate with which other metrics; a rough sketch of the idea appears below. However, I haven't been able to use this method to clarify any results, and I'm not sure what I need to change to make it genuinely useful.
I would like to come up with an environment in which POLA agents do something like forming teams. I've tried 5-10 different environments (depending on how you count variations in rules), but none have been successful. Most of these environments were variations of IPD.
The POLA compilation times are quite long, which often hinders my experiments: I have to wait between 10 minutes and 10 hours for an experiment to compile before it starts running. This is due to POLA's somewhat non-standard use of JAX. I've discussed how to improve this with Jake Vanderplas, a very active JAX maintainer. He went back and forth with me, but I haven't managed to improve my implementation. (One possible mitigation is sketched below.)
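To make the fuzzy-brains idea concrete, here's a minimal sketch in JAX. It's not my actual implementation: `evaluate_metrics`, the noise scale, and the number of variants are all hypothetical stand-ins.

```python
import jax
import jax.numpy as jnp


def fuzzy_brains(params, evaluate_metrics, key, n_variants=64, noise_scale=0.01):
    """Evaluate slightly-perturbed copies of an agent's parameters.

    `params` is the agent's parameter pytree; `evaluate_metrics` is a
    hypothetical function that maps a parameter pytree to a dict of
    scalar metrics (e.g. mean reward, cooperation rate).
    """
    leaves, treedef = jax.tree_util.tree_flatten(params)

    def perturb(variant_key):
        # Add small Gaussian noise to every parameter array.
        subkeys = jax.random.split(variant_key, len(leaves))
        noisy = [leaf + noise_scale * jax.random.normal(k, leaf.shape)
                 for leaf, k in zip(leaves, subkeys)]
        return jax.tree_util.tree_unflatten(treedef, noisy)

    # Evaluate every "fuzzy" variant of the brain.
    all_metrics = [evaluate_metrics(perturb(k))
                   for k in jax.random.split(key, n_variants)]

    # One row per metric, one column per variant, then pairwise correlations.
    names = sorted(all_metrics[0])
    rows = jnp.stack([jnp.array([m[name] for m in all_metrics])
                      for name in names])
    return names, jnp.corrcoef(rows)
```

Reading off entry `[i, j]` of the returned matrix then tells you how metric `names[i]` co-varies with metric `names[j]` across the fuzzy variants.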
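On the compilation problem, one generic mitigation is JAX's persistent compilation cache, which writes compiled programs to disk so that re-running an unchanged experiment skips compilation. It only helps when the traced computation is identical between runs, so it may not address my case, but here's a minimal sketch of enabling it (assuming a recent JAX version; the cache directory and the toy jitted function are arbitrary):

```python
import jax

# Persist compiled programs to disk so that re-running the same
# experiment can skip XLA compilation entirely.
jax.config.update("jax_compilation_cache_dir", "/tmp/jax_cache")

# Lower the thresholds so even small or quick-to-compile programs
# get cached (by default JAX skips them).
jax.config.update("jax_persistent_cache_min_entry_size_bytes", -1)
jax.config.update("jax_persistent_cache_min_compile_time_secs", 0)


@jax.jit
def step(x):
    return x * 2.0  # stand-in for a real training step


step(1.0)  # first process compiles and fills the cache; later ones reuse it
```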
I've been jumping back and forth between these four problems.
Last month I said that I'd moved my affiliation from Bar-Ilan University to Tufts University, but I didn't explain why. Now I can: my PI, Reuth Mirsky, has relocated to Boston and taken a professorship at Tufts University. All of her students and I, a.k.a. the GOLD lab, will continue working with her remotely from Israel.
I haven't written a lot about the war, but if you're wondering, it's stressful. We've been on high alert for an Iranian attack since Haniyeh's assassination last month. I've stockpiled food, water and other necessities. I live in an old apartment building which has a shelter in the basement. (Newer buildings have a shelter room in each apartment.) If a siren is activated, everyone in the building will grab their stuff and run down to the shelter.
Last Sunday I celebrated my 38th birthday, and when I woke up in the morning I heard that a Hezbollah missile attack on Tel Aviv was preempted while I slept. Life in Israel is surreal that way. I had a good birthday, but it's difficult to comprehend that it could all be over in a second.
In last month's update I teased a little project I worked on that I hope will make it easier for me to raise funds. Now that it's published, I can reveal it: I wrote a guest post on the Future of Life Institute's blog, titled "Can AI agents learn to be good?"
It's basically an intro to AI agents and the dangers they pose, and in the last section I lay out my research strategy and plug my dominance hierarchies paper.
I'd appreciate it if you could share the post on your feeds.
This is my first public writing about AI Safety, and I'm very proud of it. FLI is a highly respected organization, co-founded by Max Tegmark and Jaan Tallinn. FLI's staff reached out and invited me to write the post, which is very flattering. When we finished the lengthy review process, they told me that this content was exactly what they were hoping for in a guest post 😇
My goals for next month:
Find freelance work to fund my research.
Make some progress on POLA. I'm not sure what that would look like at this point, but I'll keep trying.
That's it for now. See you next month!
Ram.