Research update: AGI Safety Fundamentals, Cultural evolution

Ram Rachum

Feb 27, 2023, 10:17:32 AM

Hi everyone!

Here's my research update for this month.

Retrospective on goals for last month

In last month's update I outlined a list of goals. Here's my update on these goals:

  1. Write a short paper for the RaD-AI workshop: ✅ Done

    This was such a pain to write. (Meme.) I initially liked my experiments in RaD-AI, but since I couldn't get the kind of results I wanted, and experiments in other directions became more interesting, I just wanted to be through with it.

    It took me a while but I wrote a 2-page paper about my experiment. I call this environment "Stubborn". I'm using it to quantify the stubbornness of agents and how they observe how stubborn the other agent is in order to aid their decision-making. I think it's a fun little paper:

    I'm still waiting to hear whether it'll be accepted to the workshop. If it is, I'll need to open-source my code, which will be labor-intensive as well.

    I'm happy that I wrapped up this direction nicely. Now I can focus on the experiments that I'm more excited about.

  2. Work on convention-forming experiments with the chicken game: Ongoing

    I've been consulting with my researcher friends about the chicken line of experiments that I've been running. We all agree that the results are cool and somewhat novel, but they need an extra something to make them a paper worth publishing. The big question is: What is that something?

    One possible direction is cultural evolution. About a year ago I watched a talk by Edward Hughes from DeepMind about their research in cultural evolution. I remember being a bit skeptical when I saw it, because the agent follows an "expert" that teaches it how to go through checkpoints in a specific order. What I was skeptical about is that there is no social relationship or tension between the expert and the agent. If I understand correctly, the expert is a sort of inexplicable "hand of God" coming down from above and guiding the agents. I prefer the agents to establish a social relationship first.

    But because this might be my research direction now, I should read that paper thoroughly, and a couple more papers too. This won't be a goal for next month because I've got too much work, but I'll probably do it the month after next.

Stuff I've done beyond the monthly goals

UMD MARL reading group

I found that the University of Maryland has a reading group on MARL, managed by Saptarashmi Bandyopadhyay. I wanted to join, but unfortunately they hold their weekly sessions at a time that doesn't work for the Israeli time zone. If you happen to be interested, feel free to join their group:

AGI Safety Fundamentals course

A few weeks ago I started the AGI Safety Fundamentals course. This is a course in AI Safety managed by BlueDot Impact, which is one of the many foundations in the Effective Altruism ecosystem. Since I'm trying to apply MARL to AI Safety, I should have a better understanding of the basics of that field.

The class is broken up into cohorts by time zone, and once a week I meet the six other people in my cohort to discuss that week's readings.

If you'd like to apply to join this course on its next run, do so here.

My goals for this month

  1. Finish the AGI Safety Fundamentals course.

    I've done 3 meetings out of 8, so I've got 5 more meetings to go to finish the AGISF course. I'm learning some of the AI Safety terms that were fuzzy for me until now, like goal misspecification, goal misgeneralization, and outer vs. inner alignment. This month I'll do 4 more meetings, which means I will almost finish the course.

  2. Give a talk about VisiData at PyWeb-IL.

    In last month's update, I told you about VisiData, the program I discovered for exploring the data from my results. I'm enjoying this program so much that I decided to give a talk about it. First I'll give it at my local meetup, PyWeb-IL, and then I hope to give it at a bunch of conferences, like various PyCons around Europe, Reversim Summit and more.

    I'm having a lot of fun preparing this talk, and I think the audience is going to really enjoy it. (Meme.) It's also a good way for me to contribute back to the VisiData community, since I'm hoping to get more people to join that community. Once there's a video of this talk in English, I'll send it to the list.

  3. Do my first review.

    I volunteered for the program committee of the RaD-AI workshop. This means that I'll be one of the people reviewing the papers that people submit. I've been assigned one paper, so I should do the review in the next 7 days. I'll probably look up some YouTube videos on things to know for your first review. The nice thing about this being a workshop rather than a conference is that the requirements are more relaxed and I can take more time to learn.

That's it for now. See you next month!

