WILDS v1.1 Release

Shiori Sagawa

Mar 10, 2021, 2:34:08 AM

to wi...@googlegroups.com

Hi everyone,

We've released WILDS v1.1! Please update the package by running pip install -U wilds. Here is a summary of the changes.

New dataset. We have added a new benchmark dataset, Py150-WILDS. It is a code completion dataset in which the distribution shift is across code from different GitHub repositories. We evaluate models on their accuracy on the subpopulation of class and method tokens, as prior work has shown that those are the most frequent queries in real-world code completion settings. It is a variant of the Py150 dataset from Raychev et al., 2016.
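To make the evaluation idea concrete, here is a minimal sketch of accuracy restricted to a subpopulation of tokens. It is not the WILDS evaluation code: the function name, the boolean mask, and the toy token IDs are all assumptions made up for illustration.

```python
def subpopulation_accuracy(y_pred, y_true, in_subpopulation):
    """Accuracy over only the positions flagged in `in_subpopulation`.

    All three arguments are equal-length sequences. The mask is a
    hypothetical stand-in for "this token is a class or method token".
    """
    if not (len(y_pred) == len(y_true) == len(in_subpopulation)):
        raise ValueError("all inputs must have the same length")

    correct = 0
    total = 0
    for pred, true, keep in zip(y_pred, y_true, in_subpopulation):
        if keep:
            total += 1
            correct += int(pred == true)

    if total == 0:
        return float("nan")  # no tokens in the subpopulation
    return correct / total


# Toy example with made-up token IDs.
y_true = [5, 9, 2, 7, 7, 3]
y_pred = [5, 1, 2, 7, 0, 3]
is_class_or_method = [True, True, False, True, True, False]

acc = subpopulation_accuracy(y_pred, y_true, is_class_or_method)
overall = subpopulation_accuracy(y_pred, y_true, [True] * len(y_true))
print(acc, overall)
```

In this toy case the overall accuracy is higher than the accuracy on the masked tokens, which is why reporting the subpopulation metric separately can matter.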

Dataset and model updates. We updated some of the existing datasets and default models to make them significantly faster and easier to use: training time is now less than 10 hours (on a V100) for most datasets. However, some of these changes are breaking changes that will impact users who are currently running experiments with WILDS. We sincerely apologize for the inconvenience. We ask all users to update their package to v1.1.0, which will automatically update your datasets. In addition, please update your default models, for example by using the latest example scripts in our GitHub repo. At this time, we do not expect to have to make further changes to the existing datasets or default models.
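If you want to confirm from within a script that the installed package is at least v1.1.0, something like the following sketch may help. It uses only the Python standard library (importlib.metadata, Python 3.8+); the helper names are hypothetical, and the version parsing is deliberately simplified rather than a full PEP 440 implementation.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Turn '1.1.0' into [1, 1, 0]; stops at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return parts


def is_at_least(installed, minimum):
    """Compare two dotted version strings numerically."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.1' and '1.1.0' compare as equal.
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a >= b


def wilds_is_current(minimum="1.1.0"):
    """Return True if the 'wilds' distribution is installed and recent enough."""
    try:
        installed = version("wilds")
    except PackageNotFoundError:
        return False
    return is_at_least(installed, minimum)


if not wilds_is_current():
    print("Please run: pip install -U wilds")
```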

Experiments and baseline models. All of our baseline experiments are now available on CodaLab here. They include the exact commands used to run the baseline experiments as well as all experiment outputs, including model parameters.

Paper. We have updated our arXiv paper to include results on Py150 and additional baseline experiments on the other datasets. In addition, we have added an analysis of distribution shifts in an algorithmic fairness context using the New York Police Department stop-question-and-frisk dataset adapted from Goel et al., 2016.

For more details on the v1.1 update, please see our release notes.

Finally, we are actively developing WILDS and would love to hear how we can make it better for you:
- If you encounter any issues, please report them as GitHub issues.
- For questions, feedback, and discussions, please head over to GitHub discussions.

We’re also currently working on a few new datasets that we’re hoping to include in a subsequent release, so please stay tuned!

Thank you!
WILDS Team
