Research update: Rebellion and Disobedience in AI


Ram Rachum

Dec 5, 2022, 9:52:24 AM
to ram-rachum-res...@googlegroups.com

Hi everyone!

Here's my research update for this month.

Retrospective on goals for last month

In last month's update I outlined a list of goals. Here's my status on each:

  1. Run experiments with more interesting convention-forming games: ↷ Postponed

    I experimented with a territory-conquering game where agents try to conquer as many tiles as possible. I hoped that some agents would learn to take a corner and defend it. That didn't happen, and I couldn't even get good enough performance on this game. (Video.)

    After struggling with it for a bit, I put it aside while I was getting interested in other experiments, detailed below. I may still pick it up in the future.

  2. Have 1-on-1 meetings with some of the people I met at AISIC 2022: ✅ Done

Stuff I've done beyond the monthly goals

FTX fiasco

You've probably heard plenty of news stories about the collapse of the FTX crypto exchange over the last month. Unfortunately, Sam Bankman-Fried was one of the biggest contributors to the Effective Altruism movement, and by extension, to the field of AI Safety. That's bad news for me, because grants in this field will now be harder to come by. There are still other big donors, such as Dustin Moskovitz, Cari Tuna and Jaan Tallinn, so I'll keep on trying.

"You can't fetch the coffee if you're dead"

I had a really cool idea: I could make an RL demo of Stuart Russell's "You can't fetch the coffee if you're dead" scenario (explanation). It could have been amazing, but then I found that DeepMind researchers did something similar in a 2017 paper called "AI Safety Gridworlds" (paper, blog post, code). I think I could still make a cooler demonstration than theirs, but since I'd now have to convince reviewers that mine is better, I don't think it's worth the trouble.

Before giving up on this idea, I discussed it with Reuth Mirsky. I've mentioned her before; Reuth is a professor at Bar-Ilan University who helped me write my first workshop paper 6 months ago. She's more into the field of Multi-Agent Systems (MAS), which is kind of like MARL without the RL part.

Reuth said that this idea is similar to something that she's interested in: Rebellion and Disobedience in AI.

Rebellion and Disobedience in AI

Here's a scenario that Reuth discussed with me. Imagine a blind person walking down the street with a seeing-eye dog by their side. The dog's job is to follow the person's lead and help them with whatever they need. However, if the blind person were to try to cross a busy street without waiting for a green light, the dog would stop them. The interesting behavior here is that the dog does a complete 180 on its usual behavior: instead of following the person's lead, it directly resists it, whether by barking, blocking the person with its body, or pulling the person in the other direction.
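The dog's flip from compliance to resistance can be sketched as a simple decision rule. This is just my hypothetical illustration of the scenario, not anything from Reuth's work; the function and action names are made up:

```python
def guide_dog_action(human_intent, danger_detected):
    """Toy decision rule for the seeing-eye dog scenario.

    Normally the dog assists with whatever the human wants, but it
    reverses its behavior entirely when it detects danger, such as
    the human trying to cross against a red light.
    """
    if danger_detected:
        # Rebel: actively resist the human's intended motion.
        return {"move": "block", "direction": opposite(human_intent)}
    # Obey: follow the human's lead.
    return {"move": "follow", "direction": human_intent}

def opposite(direction):
    return {"north": "south", "south": "north",
            "east": "west", "west": "east"}[direction]
```

Writing the rule by hand is trivial, of course; the interesting question is whether an RL agent can learn it.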

It would be cool to see whether we can get an AI to do that, so I'm going to run a series of experiments to try to produce that behavior. If I'm successful, maybe Reuth and I could write a paper about it. That would be great for me, because Reuth is much more experienced than I am at writing papers. More details about this project are in the "My goals for this month" section below.

RaD-AI workshop

Reuth is organizing a workshop on Rebellion and Disobedience in AI. (Workshop site.) If you need a reminder on what a workshop is or what its role is in the academic ecosystem, I explained that here.

Reuth invited me to be a co-organizer of this workshop. This is great for me, because I'll learn what it's like to be on the inside of an academic event. We're now waiting to hear whether our workshop will be accepted to AAMAS 2023. Yes, workshops themselves have to be accepted to conferences; this was surprising to me. Some workshops don't get into their first-choice conference, and they end up moving from conference to conference in successive years, kind of like a travelling circus.

I'll update next month on the acceptance of this workshop.

My goals for this month

  1. Read up on rebellion and disobedience in AI.

    I should read some of the papers about RaD-AI, including:

    • Milli et al (2017): Should Robots be Obedient?

    • Arnold et al (2021): Only Those Who Can Obey Can Disobey: The Intentional Implications of Artificial Agent Disobedience

    • Mirsky et al (2021): The Seeing-Eye Robot Grand Challenge: Rethinking Automated Care

    There are also some videos here that I could watch.

  2. Make a MARL experiment that shows a tug-of-war.

    I've been brainstorming and iterating on a few ideas for experiments in RaD-AI. When Reuth described the seeing-eye dog scenario, what really appealed to me was imagining that this dog, which has been following the human's lead all along, suddenly learns to actively stop and resist the human's motion. I think this kind of "tug of war" between human and dog is very interesting. Even though it's a fight, the dog wants what's best for the human.

    I would like to recreate this tug of war in MARL. I'd like to make an experiment where two agents are fighting, not in order to hurt each other or to gain an exclusive resource, but because they both want the best outcome for both of them. I'm planning this as a fully cooperative game.
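To make the tug-of-war idea concrete, here's a minimal sketch of the kind of environment I have in mind. Everything here is hypothetical (the class name, the reward values, the one-dimensional layout); it's just one way to set up a fully cooperative game where resisting the other agent is the team-optimal move:

```python
class TugOfWar:
    """Minimal fully cooperative 1-D tug-of-war environment (a sketch).

    A 'human' agent and a 'dog' agent jointly control a position on a
    line. The human's scripted policy pushes toward a hazard at one
    end; the dog chooses how hard to pull back. Both agents receive
    the same team reward, so the dog resisting the human is
    cooperation, not conflict.
    """

    def __init__(self, size=8, hazard=7, goal=0):
        self.size, self.hazard, self.goal = size, hazard, goal
        self.pos = size // 2  # start in the middle

    def step(self, human_pull, dog_pull):
        # Position moves by the sum of the two pulls, clamped to the line.
        self.pos = max(0, min(self.size - 1, self.pos + human_pull + dog_pull))
        if self.pos == self.hazard:
            return self.pos, -10.0, True   # shared penalty: walked into traffic
        if self.pos == self.goal:
            return self.pos, +10.0, True   # shared reward: reached safety
        return self.pos, -0.1, False       # small per-step cost


env = TugOfWar()
done = False
while not done:
    human_pull = +1   # the human stubbornly heads toward the hazard
    dog_pull = -2     # the dog pulls harder the other way (blocking with its body)
    pos, reward, done = env.step(human_pull, dog_pull)
```

In a real experiment the dog's pull would come from a learned policy rather than being hard-coded, and the question is whether the shared reward alone is enough for the dog to discover the resisting behavior.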

That's it for now. See you next month!

Ram.
