Introducing Meta-World-V2!

Avnish Narayan

Jun 25, 2021, 3:51:38 PM6/25/21
to Meta-World Announcements

Hello Meta-World Users!

[Attached animation: MT10.gif]

We’re excited to announce the official launch of Meta-World V2! Many of you have been using V2 since we soft-launched it on the Meta-World repo in March of this year. You can now see the official benchmarking results for Meta-World V2 in our updated arXiv submission. We hope that access to the latest benchmarking results makes it easier to compare your own results against ours.

Thank you so much for your patience with, and feedback about, the Meta-World codebase. Consistently benchmarking meta-RL and multi-task RL (MTRL) is a huge technical challenge, and we’ve spent the past year redesigning the benchmark from its core.

Major Changes

  • Redesigned reward functions for all 50 environments. This effort has made the environments robustly solvable in a reasonable time frame (2–5 million timesteps) across multiple random seeds.
    [Attached plot: success_rate_comparison_metaworld.png]

Overall Effect

When using Meta-World V1, a difficult question we found ourselves facing was: “Is my meta/multi-task RL algorithm failing because the individual environments are difficult, or is it failing due to fundamental challenges in meta/multi-task RL?”

By redesigning the environments’ reward functions to make them robustly solvable, we’ve made it easy to measure the component of performance attributable to fundamental challenges in multi-task RL. See the attachments for plots that show an increase in performance, decrease in rise-time, and reduction in the variance of the performance from Meta-World V1 to Meta-World V2 on our MT-10 and ML-10 benchmarks.


A Thank You To Our Users

We couldn’t have done this without your conscientious bug reports and questions. We’ve iterated on this benchmark many times over the past year, and are confident that we’re delivering a tool that can help to accelerate your robot-learning research.

We’ll shortly be releasing a technical report detailing the techniques we used to design Meta-World V2, with full explanations and results of how we benchmarked its performance. Be on the lookout!

Lastly, we’re in the process of cleaning up the baselines that we used for getting results for Meta-World-V2. You can keep track of our progress in PR #2287 in our sister repository, Garage.

If you have any questions at all, please join our Meta-World Slack community by filling out this Google Form.


Happy benchmarking!

-Avnish, Hayden, Adithya, Ryan (the Meta-World Team).


Tips and Tricks for Using Meta-World

  • Update your code to the new API and use the most recent version of Meta-World from GitHub. Meta-World has not yet established a regular release cadence, so you need to install the latest version available on GitHub to benefit from recent improvements to reproducibility and usability.
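For reference, a development version can typically be installed straight from GitHub with pip. This is a sketch, assuming the repository lives at rlworkgroup/metaworld; adjust the URL if the project has moved:

```shell
# Install the latest development version of Meta-World directly from GitHub.
# (Repository path assumed here; check the project's README for the canonical URL.)
pip install git+https://github.com/rlworkgroup/metaworld.git
```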

  • Always run multiple seeds (preferably 5-10, per Colas et al.) for your experiments, and report a confidence interval computed across those seeds. Performance in reinforcement learning is highly seed-sensitive, and this is especially true in difficult environments such as those in Meta-World, where exploration is key to performance.
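    As a minimal sketch of this tip, the snippet below aggregates hypothetical per-seed final success rates into a mean and confidence half-width using only the Python standard library. It uses a normal approximation for the critical value; for the small seed counts typical in RL, a Student-t interval would be slightly wider.

    ```python
    from statistics import mean, stdev, NormalDist

    def seed_confidence_interval(results, confidence=0.95):
        """Mean and confidence half-width of per-seed results.

        Uses a normal approximation for the critical value; a Student-t
        interval is more conservative for very few seeds.
        """
        n = len(results)
        m = mean(results)
        # Standard error of the mean across seeds.
        se = stdev(results) / n ** 0.5
        z = NormalDist().inv_cdf(0.5 + confidence / 2)
        return m, z * se

    # Hypothetical final success rates from 5 seeds of the same experiment:
    rates = [0.82, 0.76, 0.91, 0.68, 0.85]
    m, hw = seed_confidence_interval(rates)
    print(f"success rate = {m:.3f} +/- {hw:.3f}")  # → success rate = 0.804 +/- 0.077
    ```

    Reporting the interval, not just the mean, makes it clear whether a gap between two algorithms is larger than the seed-to-seed noise.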

  • Re-run your baseline experiments rather than relying on curves from publications. As careful as members of the RL software community are about not creating regressions, RL software is still in its infancy, and performance regressions are common in both environment and algorithm implementations. Changes to upstream dependencies (e.g. numerical libraries like PyTorch and TensorFlow, and physics engines like MuJoCo) can induce significant latent changes to the performance curves you will observe.

  • Prefer well-tested, off-the-shelf algorithm implementations over custom implementations for running baseline experiments. Using well-tested and broadly shared implementations benefits the research community by establishing shared performance standards, and shields you from review criticisms implying that you used a half-hearted baseline implementation. RL algorithms are difficult to fully reproduce, and meta/MTRL algorithms are doubly difficult to re-implement. Of course, remember to always cite the implementations you use.

Join the Community for Support

If you have any questions about how to use Meta-World, we would love to help! Please join our Slack community by filling out this Google Form.


Attachments: MAML_v1_vs_v2_ml10.png, MTSAC_v1_vs_v2_mt10.png, window_old_new.gif