Hello MetaWorld Users!
As the ICLR deadline is rapidly approaching, we would like to share some improvements that we’ve made to the MetaWorld benchmarks, and give you resources for successful benchmarking.
Firstly, thank you so much for your patience with, and feedback about, the MetaWorld codebase. Consistently benchmarking meta-RL and multi-task RL (MTRL) is a huge technical challenge, and we’ve spent the last 11 months refining the benchmark, fixing bugs, and adding tons of tests to avoid breakages and regressions. We couldn’t have done this without your conscientious bug reports and questions. We are committed to continuously developing and improving MetaWorld for the meta-RL and MTRL communities.
If you had a bad experience the first time you tried MetaWorld, we apologize. Please give it another shot! The original code release contained several bugs which made MetaWorld results difficult to create and reproduce, and we've worked really hard this year to make your experience better.
If you have any questions at all, please join our MetaWorld slack community by filling out this Google Form.
Significant Code Changes since October 2019
All changes below make the benchmark more consistent with experiments from the publication on arXiv. The underlying 50 environments (MDPs) have not been modified in any way since publication, i.e. the code on GitHub is faithful to the published results.
Sept 5, 2020: Increased the horizon limit from 150 to 200 time steps for many environments. This reduces seed sensitivity by giving more random seeds enough time to explore successfully (PR #223).
July 22, 2020: Added "MT1" benchmark, which is an environment configuration-conditioned version of ML1. MT1 is faithful to how ML1 is used in the arXiv publication (PR #165).
July 10, 2020: Added environment configuration information to the observation space of MT10 and MT50. This is consistent with how the multi-task algorithms in the arXiv publication were evaluated (PR #138).
July 10, 2020: New benchmark API — see below for more information. The new API makes it easier to train and evaluate meta-RL, by giving you more control over when new environment configurations are sampled (PR #143).
May 18, 2020: Use fixed goals for MT10/MT50. Prior to this change, MT10/MT50 environments were sampling a new goal on every call to Env.reset(), making it difficult to reproduce the results from the paper (PR #86).
March 30, 2020: Removed termination signals (done=True) when agents reach the horizon limit. Conflating a time-limit cutoff with a true termination can break bootstrapping in some off-policy algorithms. The off-policy algorithm implementations from the publication ignored termination signals, so they were unaffected by this change (PR #45).
Jan 10, 2020: Bug fixed in which drawer-close-v1 sampled goals which were unreachable (PR #37).
Dec 2, 2019: Bug fixed in which ML1 was sampling a random goal on each call to Env.reset() (PR #37).
Nov 3, 2019: Reduced the amount of memory consumed by MetaWorld by ~10x (PR #14).
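To see why the termination-signal change above matters, here is a minimal sketch of how a time-limit cutoff corrupts an off-policy TD target when it is treated as a true termination. All names (td_target, GAMMA, the example values) are illustrative, not MetaWorld code:

```python
# Illustrative sketch: why conflating a horizon cutoff with a true
# termination biases bootstrapped value targets. Not MetaWorld code.

GAMMA = 0.99

def td_target(reward, next_value, terminal):
    # Bootstrap from next_value unless the episode truly terminated.
    return reward + (0.0 if terminal else GAMMA * next_value)

# An episode cut off at the horizon limit: the MDP did not actually
# terminate, so the agent should still bootstrap from next_value.
reward, next_value = 1.0, 10.0

# Wrong: treating the cutoff as terminal silently zeroes out the
# remaining return, biasing the target low.
wrong = td_target(reward, next_value, terminal=True)    # 1.0

# Right: ignore the timeout "done" flag and keep bootstrapping.
right = td_target(reward, next_value, terminal=False)   # 1.0 + 0.99 * 10.0 = 10.9

print(wrong, right)
```

This is the behavior the off-policy implementations in the publication already had, which is why the change left their results intact.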
Improved Benchmark API
For a tutorial on how to use the updated MetaWorld API, please refer to the README on GitHub and/or take a look at this short (1m30s) video explaining its usage.
If you are looking for more thorough examples of how to utilize MetaWorld, take a look at our sister project Garage for comprehensive usage examples and baseline implementations for all algorithms in the MetaWorld paper. See this Garage PR for an example of how to migrate your experiment code from the old API to the new API.
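For a quick feel of the shape of the new API before watching the video, here is a toy mock that mirrors the construct-then-set_task pattern shown in the README. This is NOT MetaWorld code; FakeEnv and FakeML1 are stand-ins, and in real usage you would import metaworld and use metaworld.ML1 instead:

```python
import random

# Toy stand-in for the new MetaWorld benchmark API pattern:
# benchmark object -> train_classes + train_tasks -> set_task.
# FakeEnv/FakeML1 are mocks, not real MetaWorld classes.

class FakeEnv:
    def set_task(self, task):
        # In the real API, set_task pins the environment configuration
        # (e.g. goal position) until a new task is explicitly set.
        self.task = task

    def reset(self):
        # reset() starts a new episode but keeps the same task, so new
        # configurations are only sampled when *you* decide to sample.
        return {'task': self.task}

class FakeML1:
    def __init__(self, env_name):
        self.train_classes = {env_name: FakeEnv}
        self.train_tasks = [f'{env_name}-task-{i}' for i in range(50)]

ml1 = FakeML1('pick-place-v1')
env = ml1.train_classes['pick-place-v1']()   # construct the env
task = random.choice(ml1.train_tasks)        # sample a configuration...
env.set_task(task)                           # ...and pin it
obs = env.reset()                            # same task across resets
```

The key design point is that configuration sampling is decoupled from Env.reset(): your training loop, not the environment, controls when new configurations appear.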
Tips and Tricks for using MetaWorld
Update your code to the new API and use the most recent version of MetaWorld from GitHub. MetaWorld has not established a regular release cadence yet, so you need to update to the latest version available on GitHub to benefit from recent improvements to reproducibility and usability.
Always run multiple seeds (preferably 5-10, per Colas et al.) for your experiments, and calculate a confidence interval using those seeds. Performance in reinforcement learning is highly seed-sensitive, and this is especially true in difficult environments such as those in MetaWorld, where exploration is key to performance.
Re-run your baseline experiments rather than relying on curves from publications. As careful as members of the RL software community are about not creating regressions, RL software is still in its infancy, and performance regressions are common in both environment and algorithm implementations. Changes to upstream dependencies (e.g. numerical libraries like PyTorch and TensorFlow, and physics engines like MuJoCo) can induce significant latent changes to the performance curves you will observe.
Prefer well-tested off-the-shelf algorithm implementations for running baseline experiments over custom implementations. Using well-tested and broadly-shared implementations benefits the research community by establishing shared performance standards, and shields you from review criticisms which imply that you may have used a half-hearted baseline implementation. RL algorithms are difficult to fully reproduce, and meta/MTRL algorithms are doubly-difficult to re-implement. Of course, remember to always cite the implementations you use.
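As a concrete sketch of the seeds-and-confidence-intervals tip above, here is one way to compute a 95% confidence interval over per-seed results using only the Python standard library. The numbers in seed_returns are made up for illustration; substitute one evaluation metric per seed from your own runs:

```python
import math
import statistics

# Hypothetical final success rates from 5 seeds of one algorithm.
seed_returns = [0.62, 0.58, 0.71, 0.64, 0.55]

n = len(seed_returns)
mean = statistics.mean(seed_returns)
sem = statistics.stdev(seed_returns) / math.sqrt(n)  # standard error of the mean

# Student-t critical value for a 95% CI with n - 1 = 4 degrees of freedom.
t_crit = 2.776
half_width = t_crit * sem

print(f'{mean:.3f} +/- {half_width:.3f} (95% CI over {n} seeds)')
```

Reporting the interval rather than a single best seed makes comparisons between algorithms far more trustworthy, which is exactly the point of the Colas et al. recommendation.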
Join the Community for Support
If you have any questions about how to use MetaWorld, we would love to help! Please join our Slack community by filling out this Google Form.
Happy benchmarking!
-Avnish, Ryan, and the MetaWorld Team.