Hi everyone!
Here's my research update for this month.
In last month's update I outlined a list of goals. Here's my update on these goals:
Learn better tools for quickly exploring tabular data: ✅ Done
I spent the last month digging into VisiData, and I'm so happy that I found this tool. I'm planning to give a talk about it at PyCon events and other engineering conferences.
VisiData is a command-line tool, which is very important to me. I want to explore the tabular data that my experiments produce, and I don't want to be reaching for the mouse or hunting for the right buttons to click in a graphical UI.
I used Jeremy's tutorial to learn how to use VisiData. After a few hours I understood that there's a really strong concept here. I would say that VisiData is very similar to Vim in philosophy. Everything is a table, or in VisiData terms, a "sheet". A CSV file is a sheet. A JSON file is a sheet. An SQLite database is a sheet. You can open multiple sheets at the same time. If you want to see the list of all the sheets you have open, you can do that; that list is also a sheet! The idea is: treat everything as a sheet, and then every tool that operates on sheets automatically works on all of them. You can even use VisiData to browse folders!
Another Vim-like quality is that the keyboard bindings have a certain logic to them. Prefixing a command with g (for "global") makes it act on more things, while prefixing a command with z makes it act on fewer things. It's hard to explain, but easy to understand once you do the tutorial.
I started using computers as a child in the nineties, and VisiData reminds me of both Lotus 1-2-3 and Norton Commander. I'm enjoying the nostalgia. It can even draw scatter plots, right in the terminal! Here is the reward of one of my agents during training:
VisiData is very extensible, and I've already written 3-4 scripts that are indispensable to my work.
VisiData's development is led by Saul Pwanson. Saul has a Patreon, so I joined it. If you'd like to support Saul's work on VisiData, consider joining it as well. Thanks for your work, Saul!
Get the agents in my RaD-AI experiment to show interesting behaviors: ↷ Postponed
I've been a little frustrated with this experiment.
Part of the reason is that my M.O. has been to first get the more trivial behaviors working, and then gradually climb up to the more interesting ones. But it's difficult to define exactly what those trivial behaviors are. For example, one behavior I considered trivial is that on the first turn of every skirmish, each agent should pick the reward that seems highest to it according to its estimates. Its estimates might be wrong, but since there's no input from the other agent yet at that point, this choice seems the most reasonable to me.
However, my agents only got to around 76% success on this metric. I expected at least 90%, maybe even 95-99%. I'm not sure whether this means that my agents haven't learned well enough, or that there's a social dynamic around this metric that I haven't recognized.
I hope I'll continue with this experiment later, but in the meanwhile I'm working on a different experiment that I'm excited about. I'll tell you about it below.
A friend told me that Michael Wooldridge, a famous professor from Oxford, would be coming to Bar-Ilan University to teach a short course on Computational Game Theory. I thought it was a good opportunity to fill some gaps in my fundamentals. The course was 5 meetings of 90 minutes each. It was more basic than I expected, but I still learned a few things. Wooldridge used the slide deck from Oxford's Computational Game Theory course, though we covered only about 30% of the full course. If you want that very long slide deck, let me know and I'll email it to you.
I discovered the Berkeley MARL seminar a few months ago and I'm very happy I did. It's no longer really based at Berkeley, since its admins, Eugene Vinitsky and Natasha Jaques, are now at NYU/Apple and Google Research respectively. It's basically a group of 5-10 people that meets every week to hear a MARL-related talk. Most people there are further along in their research journeys than I am, so it's a good learning experience for me. Feel free to join! Just fill out this form.
Something a little unrelated, but still cool: Reuth's mission is to make an AI-based seeing-eye dog robot for blind people. She has her students work on different aspects of the problem. Here is one test of the Boston Dynamics robot Spot acting as a seeing-eye dog.
I started working on a series of experiments that are related to the convention-forming games that I worked on a few months ago.
These experiments revolve around the game of chicken. Chicken, like the prisoner's dilemma, is one of the basic matrix-game social dilemmas in game theory. I see many of the conflicts that we deal with in our lives as games of chicken. The basic fight-or-flight instinct is a textbook example. Another equivalent pair of terms for these two choices is "hawk" and "dove", and I'll use this terminology because it's both unique and short.
In the chicken game, the best result for an agent is to play hawk while the other agent plays dove. The second-best result is that both agents play dove. The third-best result is that the agent plays dove while its opponent plays hawk. And the very worst result is that both agents play hawk.
In the prisoner's dilemma, there are two kinds of pressure that push agents to defect: greed and fear. In chicken, there's only greed. Because of that, there are two pure Nash equilibria rather than just one, and they are mirror images of each other: one agent plays hawk while the other plays dove. This means that chicken has flip-flop behavior: which agent ends up on top can differ from run to run.
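To make that concrete, here's a tiny script that encodes one illustrative set of payoff numbers (the exact numbers are my own placeholders; any payoffs with the ordering above behave the same) and finds the pure Nash equilibria by brute force:

```python
# Chicken with illustrative payoff numbers; any payoffs with the
# ordering described above give the same structure.
import itertools

# PAYOFF[my_move, their_move], where 0 = dove, 1 = hawk
PAYOFF = {(0, 0): 3,  # both dove: second-best
          (0, 1): 1,  # I play dove, they play hawk: third-best
          (1, 0): 5,  # I play hawk, they play dove: best
          (1, 1): 0}  # both hawk: worst

def is_nash(a, b):
    """Neither player can gain by unilaterally switching its move."""
    return (PAYOFF[a, b] >= PAYOFF[1 - a, b] and
            PAYOFF[b, a] >= PAYOFF[1 - b, a])

print([moves for moves in itertools.product((0, 1), repeat=2)
       if is_nash(*moves)])
# Prints [(0, 1), (1, 0)]: the two pure equilibria are "dove vs. hawk"
# and "hawk vs. dove".
```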
The first experiment I ran was simple. I let two agents play a repeated game of chicken against each other. Each agent has its own separate brain. Can you guess what happens?
One agent learns to always play hawk, while the other agent always plays dove. This is interesting, because the two agents are otherwise identical. Once one of them leans slightly in favor of hawk, the other agent benefits from leaning towards dove. These two leans reinforce each other until both agents are 100% committed to their policy.
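Here's a minimal sketch of this setup. It's not my actual training code: each agent is reduced to a single REINFORCE-style hawkishness logit, and the payoffs are the placeholders from the snippet above, but it shows the same symmetry breaking:

```python
# Two independent learners playing repeated chicken.
import numpy as np

rng = np.random.default_rng(0)
REWARD = np.array([[3.0, 1.0],   # row: my move (0 = dove, 1 = hawk)
                   [5.0, 0.0]])  # column: opponent's move
LR = 0.1

logits = np.zeros(2)     # one hawkishness logit per agent
baselines = np.zeros(2)  # running average reward per agent

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(20_000):
    probs = sigmoid(logits)
    moves = (rng.random(2) < probs).astype(int)
    rewards = np.array([REWARD[moves[0], moves[1]],
                        REWARD[moves[1], moves[0]]])
    for i in range(2):
        # d log pi(move) / d logit = move - P(hawk)
        logits[i] += LR * (rewards[i] - baselines[i]) * (moves[i] - probs[i])
        baselines[i] += 0.05 * (rewards[i] - baselines[i])

print("P(hawk) per agent:", sigmoid(logits).round(3))
# Typically prints something like [1. 0.] or [0. 1.]: the agents break
# symmetry and lock into one of the two pure equilibria.
```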
You could look at this result from two points of view: the group's point of view and the individuals' points of view. From the group's point of view, it can look like a happy result, because the two agents agreed on a convention that avoids the great losses that happen when both agents play hawk simultaneously. However, from the point of view of the agent who is playing dove, that agent is getting royally screwed over: it's going to get fewer points than its opponent until the end of time, and if it tries to protest by playing hawk, it gets even fewer points.
Next, I put 6 agents together. Each agent has a constant index number that they stick with forever, between 0 and 5. In each episode, they get paired randomly and play 20 rounds of chicken against the same opponent. Each agent sees the index number of the other agent they've been randomly paired with.
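Here's a sketch of that pairing structure, again with placeholder payoffs and a tabular stand-in for my actual agents:

```python
# Six fixed-index agents, randomly paired each episode; logits[i, j] is
# agent i's hawkishness logit against opponent j. Payoffs and learning
# rule are the same placeholders as above.
import numpy as np

rng = np.random.default_rng(1)
N_AGENTS, N_ROUNDS, LR = 6, 20, 0.05
REWARD = np.array([[3.0, 1.0], [5.0, 0.0]])  # [my_move, their_move]

logits = np.zeros((N_AGENTS, N_AGENTS))
baselines = np.zeros((N_AGENTS, N_AGENTS))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for episode in range(5_000):
    # Pair the six agents up randomly into three disjoint pairs.
    pairs = rng.permutation(N_AGENTS).reshape(-1, 2)
    for a, b in pairs:
        for _ in range(N_ROUNDS):  # 20 rounds against the same opponent
            pa, pb = sigmoid(logits[a, b]), sigmoid(logits[b, a])
            ma, mb = int(rng.random() < pa), int(rng.random() < pb)
            for i, j, m, p, m_opp in ((a, b, ma, pa, mb),
                                      (b, a, mb, pb, ma)):
                r = REWARD[m, m_opp]
                logits[i, j] += LR * (r - baselines[i, j]) * (m - p)
                baselines[i, j] += 0.05 * (r - baselines[i, j])

# Average hawkishness of each agent across all of its opponents:
hawkishness = np.where(np.eye(N_AGENTS, dtype=bool), np.nan,
                       sigmoid(logits))
print(np.nanmean(hawkishness, axis=1).round(2))
```

One caveat: with a separate logit per opponent, each pair breaks symmetry on its own, so this sketch won't reproduce the index-based convention by itself. I believe the index-based convention in my runs comes from the agents' policies generalizing across opponent index numbers.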
What happens is very cool. The agents usually settle on one of two conventions: "higher number plays hawk, lower number plays dove" or "higher number plays dove, lower number plays hawk". Here is a sheet with the results of a run that ended up in the first convention. agent_0_hawkishness means how likely agent 0 is to play hawk. In line 23, you can see that the hawkishness is linearly correlated with the agent's index number. On other runs, it's the exact opposite. Here's a plot of that sheet:
This is fascinating because the index number is basically determining the social status and total reward of each agent, even though it's otherwise a completely meaningless number! This result starts to smell like the handicap principle seen in the animal world, though it's a bit different. It's also reminiscent of the way that in human society, we have standards of social status and beauty that might be based on completely arbitrary attributes.
The reason this happens is that the rule "higher number plays hawk" is very simple to learn, and once enough agents lean slightly towards it, that tendency reinforces the rule in a feedback loop. When the convention becomes strong enough, it becomes impossible for any single agent to resist it.
Write a short paper for the RaD-AI workshop.
I haven't worked on the RaD-AI experiment for a few weeks, because I'm too excited about the convention-forming one. But I think I should bring it to a state where it shows some interesting behavior, put it up on GitHub, and write a 2-page paper about it for the RaD-AI workshop.
If the paper gets accepted, I'll then present it at the workshop in London.
Work on convention-forming experiments with the chicken game.
In the second experiment I mentioned above, the agents learned to identify each other by their index number. They could have thoughts like "Oh, I'm playing with agent 5 today, this guy usually plays hawk so I better play dove." This is cool, but I want these social conventions to be more dynamic. I'm working on experiments in which the agents are also sending out messages that partly indicate whether they are likely to play hawk or dove. This interplay between the shifting social conventions and the agents' perception of them is fascinating.
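Here's a rough sketch of the round structure I have in mind; the specifics (a one-bit message, the dummy policy below) are placeholders while I'm still designing the experiment, not the actual design:

```python
# Each round has two phases: the agents first exchange messages, then
# choose hawk or dove after seeing the opponent's message.
import numpy as np

rng = np.random.default_rng(2)

class PlaceholderPolicy:
    """Stands in for a learned policy; everything here is arbitrary."""

    def message(self, opponent_index):
        # 1 is meant to hint "I intend to play hawk".
        return int(rng.random() < 0.5)

    def act(self, opponent_index, opponent_message):
        # A learned policy would condition on both arguments; this
        # placeholder just backs off a bit when the opponent signals hawk.
        p_hawk = 0.3 if opponent_message else 0.6
        return int(rng.random() < p_hawk)

def play_round(pol_a, pol_b, idx_a, idx_b):
    # Phase 1: both agents emit their messages simultaneously.
    msg_a, msg_b = pol_a.message(idx_b), pol_b.message(idx_a)
    # Phase 2: each agent sees the opponent's message, then moves.
    move_a = pol_a.act(idx_b, opponent_message=msg_b)
    move_b = pol_b.act(idx_a, opponent_message=msg_a)
    return move_a, move_b  # 0 = dove, 1 = hawk

print(play_round(PlaceholderPolicy(), PlaceholderPolicy(), 0, 5))
```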
One challenge I face with these experiments is that they feel very profound, but I haven't done a good enough job of connecting them to my long-term research goals. I'm going to search for that connection, and I hope my intuition is correct and that they will indeed lead me to good things.
That's it for now. See you next month!
Ram.