Hi everyone! And Happy Passover.
Three weeks ago we finished our BXRL paper, submitted it to RLC, and it's now available as a preprint on arXiv: r.rachum.com/bxrl-pdf
That final week before submission was insane. I was working morning to night to solve all the problems and submit. I have mixed feelings about the paper. It's not as good as I wanted it to be because it's only a problem formulation without a solution. I do think it's well-written and well-argued. I hope that the RLC reviewers will think so too.
I've also open-sourced HighJax at github.com/HumanCompatibleAI/HighJax. This is my port of the HighwayEnv driving simulation to JAX. Screenshot of its TUI:

It has its own training framework but it can also be run in the standard way using env, params = highjax.make('highjax-v0'). I hope that RL researchers will use it.
I'm happy that the paper deadline is behind me because it was very stressful.
Now I have to do lots of errands that I postponed during the two-month sprint for the paper. It's very tiring. I'm preparing for my CHAI internship this summer, which means getting a visa, booking flights, finding an apartment, subletting my Tel Aviv apartment and many other tasks.
The war is still going strong. My family and I are safe, but the flights I bought were cancelled. I hope the war will be over soon and flights will be available again. If they aren't, I may have to get out of Israel in creative ways. Many Israelis cross the land borders to Egypt or Jordan, and then fly from there. That's an option, but I've heard that the airport staff in these places try to extort money from travelers, and I'd really like to avoid that. A ferry to Cyprus is another option.
I'm now working on a slide deck, a talk and a poster for BXRL. I'm having fun writing the slide deck. I think it's going to be a fun talk.
Research-wise, I have two directions I could pursue:
1. I have some ideas about why Experience Breakdown didn't work, and I could explore them and try to make it work. Experience Breakdown did work in my MARL experiments from last summer, and I suspect the salient difference is that the observation space there was smaller and simpler. In the last update I listed two possible reasons for Experience Breakdown's failure; by diving into them I may find a way to fix it.
2. I'm not going to say too much about this one because it's too early, but I had an idea for an evaluation framework for XRL methods. One of the major issues of the field is that it's difficult to measure how good an explainability method is. We don't even have a formal definition of what an explanation is, so our ability to measure the quality of an explanation is limited, and by extension, so is our ability to compare different XRL methods.
I came up with an idea for something approaching an objective measure of the success of XRL methods, so that they can be compared to each other. I hope to tell you more about this next month.
Hag Sameach, Ram.