Research update: HAAISS, policy churn, Long-Term Future Fund


Ram Rachum

Aug 31, 2022, 12:28:11 PM

Hi everyone!

Here's my research update for September.

Retrospective on goals for August

In the August update I outlined a list of goals. Here's my update on these goals:

  1. Attend HAAISS, learn about Human-Aligned AI, meet new people: ✅ Done

    I had a good time at HAAISS (Human-Aligned AI Summer School). This was the first time I've participated in an in-person academic event.

    HAAISS is a gathering of researchers working in the new field of AI alignment. The goal of this field is to prevent an Artificial General Intelligence that will be developed in the future from being hostile to humans. Another way to put it is that we want the AGI to be aligned with our moral values.

    Because AI alignment is an emerging field, many of the researchers who came to HAAISS aren't officially in AI alignment, but in related fields. Some of them are trying to gradually migrate to AI alignment positions, but there aren't many such positions yet. One of the goals of HAAISS is to solve this chicken-and-egg problem, also known as "field-building". The academic world moves slowly, so we can expect this growth to take many years.

    One of the big challenges for researchers in this field is that they're trying to predict how the "evil AGI takes over the world" scenario will play out, and plan how to prevent it. Because that scenario hasn't happened yet, they have very little data to work with. They try to draw conclusions from different fields, but most of it is just speculation. This isn't a criticism of the research efforts; they're doing the best they can with the limited information they've got.

    One open question is whether the way to get AI to behave morally is by tailoring its reward function to give a high reward for moral behavior, or alternatively by introducing safety controls for its actions, which could mean limiting its action space or completely shutting down the AI once it does an action that's considered bad. Or maybe the answer is something else entirely.

    My value proposition for the AI alignment field is to give them a scenario to draw conclusions from. If I can get an RL agent to reciprocate with its fellow agents, i.e. treat them well if they're treating it well, then this will be a form of AI displaying emergent moral values. Hopefully it could be used as a reference point for AGI. I gave a talk about my research at HAAISS and it was well-received.

    To the open question above, my hunch is that the answer will be neither of the options. I think that the way to get AI to develop moral values is by eliciting them as emergent phenomena over a selfish reward function. But I should focus on showing that this happens in RL experiments before I think about AGI.

    Besides learning about AI alignment, I made contacts with a few researchers at HAAISS and I hope they'll be helpful for me in the coming months.

  2. Have fun in Prague: ✅ Done

  3. Learn RLlib and figure out multiple brains: ↷ Postponed

    This is my most important goal right now, which is why I procrastinated on it in the most masterful way possible. I'm going to work on it this month.

    I did play a bit with RLlib on an Ubuntu VM and I noticed it's a resource hog. So a few days ago I bought an extra desktop computer just for my research. I installed Ubuntu on it and now I gotta finish setting it up.

Stuff I've done beyond the goals

Tom Schaul's paper: "The Phenomenon of Policy Churn"

I was talking with Joel and he recommended that I read the paper "The Phenomenon of Policy Churn". It's a new research paper from DeepMind by Tom Schaul, Georg Ostrovski et al.

I really appreciated the way Tom, Georg and friends wrote that paper. It has a "here's what we want to do and here's how we're going to do it" approach that I find very accessible. I'm new at reading research papers, and most of the papers I've read are difficult for me. (Probably because they're optimized for passing reviews rather than being accessible to beginners.) This is a refreshing change, and I hope that when I write papers, I can do it in this style.

Here's my oversimplified summary of this paper:

When you train an RL agent, it goes through hundreds of "learning steps" in which it looks back at its past performance and slightly changes the weights of its neural network so it will perform better next time. Each learning step on its own provides a minuscule improvement; it takes hundreds of learning steps to get a meaningful cumulative improvement.

What Tom and his friends found out is that even though each learning step improves the performance by just a little bit, it changes much more of the agent's policy than other researchers thought. By "changing the policy", we mean "changing the action that the agent is likely to take given a certain observation of the environment". Researchers thought that the policy would change by around 0.01% every step, but the authors found that the policy changed by 10%!

While many of the changes that make up this 10% are somewhat random, and don't meaningfully affect the performance of the agent, it's possible that this source of randomness provides the exploration needed by learning agents to find good policies. It's possible that this previously-unknown source of exploration is the explanation for the success of some of the RL algorithms that are used today. We hope that this knowledge could be used when designing new RL algorithms.
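
To make "the policy changed by 10%" a bit more concrete, here is a minimal sketch of how such a number could be measured. It assumes an agent that always picks the highest-valued action, and it counts the fraction of states where that pick differs before and after one learning step. The function names and the Q-values are made up for illustration; this is not code from the paper.

```python
def greedy_action(q_values):
    """Return the index of the highest-valued action (first one wins ties)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def policy_churn(q_before, q_after):
    """Fraction of states whose greedy action differs between two Q-tables.

    Each table is a list with one row of action values per state.
    """
    if len(q_before) != len(q_after):
        raise ValueError("both tables must cover the same states")
    if not q_before:
        raise ValueError("need at least one state")
    changed = sum(
        greedy_action(before) != greedy_action(after)
        for before, after in zip(q_before, q_after)
    )
    return changed / len(q_before)


# Hypothetical Q-values for 4 states and 2 actions, before and after one update.
q_before = [[1.00, 0.99], [0.50, 0.20], [0.10, 0.30], [0.70, 0.71]]
q_after = [[0.99, 1.00], [0.51, 0.20], [0.10, 0.31], [0.70, 0.72]]

churn = policy_churn(q_before, q_after)
print(f"greedy action changed in {churn:.0%} of states")  # 25%
```

Note how tiny the value changes are: only the first state flips, and it flips because its two actions were nearly tied. That is the intuition for why small updates can still change a lot of the policy.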

Random updates

I'm subscribed to get email updates from the DeepMind blog. (I use Feedrabbit to get email updates.) I got an update that they posted an interview with my mentor, Edgar Duenez-Guzman. It's cute and light, give it a read. The photo with the chickens cracked me up. (Edgar and his wife keep chickens.) If you wondered why he has a picture with a goose, this is the reason.

Speaking of AI feeds, I recently signed up for Rohin Shah's Alignment Newsletter. If you're interested in gradually learning more about AI alignment, this would be useful to you. Rohin has been working as a researcher at DeepMind for the last two years. He also gave an accessible talk about AI alignment at HAAISS.

My goals for September

  1. Learn RLlib and figure out multiple brains.

    Same goal as last month. RLlib looks like the most promising MARL framework out there, and I want to use it to run experiments.

  2. Run experiments with agents trying to move away from each other.

    When I was preparing my slides for HAAISS, it got me thinking about my strategy for achieving emergent reciprocity. I made this slide to show the basic outline. The researchers at DeepMind introduced sequential social dilemmas (SSDs), claiming that they should be used in research instead of matrix game social dilemmas (MGSDs) like the prisoner's dilemma. An SSD is temporally extended, so the agents have more opportunity to interact with each other before making any cooperate/defect decisions.

    My strategy is to take the next step: Maximize that opportunity. This means coming up with game rules where the player-to-player interaction is maximized. Many of the SSDs I've seen, like Allelopathic Harvest, are more like "player vs environment". This means that each agent affects the other agents indirectly. I want to see what happens when you run SSDs where the actions of each player have a big and immediate effect on the optimization space of the other agents. My hunch is that this could lead us to emergent reciprocity.

  3. Apply to LTFF for funding.

    When I was at HAAISS I talked to lots of people about my research. They strongly suggested that I apply for funding from LTFF. I'm working on my application now. It's a long and detailed questionnaire.

    I'll want to have a few people review my application before I send it. If you have experience with grant-writing, especially to LTFF or in the Effective Altruism community, and are willing to review my application, please send me an email.
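
For readers who haven't seen a matrix game social dilemma, here is a rough sketch of the prisoner's dilemma mentioned in goal 2, together with a hand-coded reciprocal strategy (tit-for-tat). The payoff numbers are the usual textbook values and all the names are made up for illustration. The reciprocity here is written by hand; the point of my research is to get something like it to emerge from learning instead.

```python
# Payoffs for (my_move, their_move); "C" = cooperate, "D" = defect.
# The numbers are common textbook values, chosen only for illustration.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """Defect no matter what the opponent did."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Play the repeated game and return both players' total payoffs."""
    moves_a, moves_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        # Both players choose simultaneously, seeing only past moves.
        move_a = strategy_a(moves_b)
        move_b = strategy_b(moves_a)
        moves_a.append(move_a)
        moves_b.append(move_b)
        total_a += PAYOFF[(move_a, move_b)]
        total_b += PAYOFF[(move_b, move_a)]
    return total_a, total_b


print(play(tit_for_tat, tit_for_tat, rounds=10))    # (30, 30)
print(play(tit_for_tat, always_defect, rounds=10))  # (9, 14)
```

In a matrix game like this, each round is a single cooperate/defect choice. In an SSD the agents move around in a world over many time steps, and "cooperating" or "defecting" is a property of their whole behavior rather than one button press.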

That's it for now. See you in October!

