WILDS v2.0 Release


Pang Wei Koh

Dec 12, 2021, 9:23:30 PM
to wi...@googlegroups.com

Hi everyone,


We’re excited to announce WILDS v2.0! This release adds unlabeled data to 8 of the 10 WILDS datasets, allowing us to develop and evaluate methods for leveraging unlabeled data to adapt to distribution shifts. 


New unlabeled data. We’ve added a total of 14.5 million unlabeled examples across the 8 datasets, expanding the number of examples in each dataset by 3–13x. The unlabeled data comes from the same underlying sources as the original labeled data. It can be accessed through our new unlabeled data loaders, whose interfaces are similar to those of the existing labeled data loaders.


New baseline algorithms. We’ve also expanded our example scripts, updating existing algorithms and adding new ones that make use of the unlabeled data:


  1. CORAL (Sun and Saenko, 2016)

  2. DANN (Ganin et al., 2016)

  3. AFN (Xu et al., 2019)

  4. Pseudo-Label (Lee, 2013)

  5. FixMatch (Sohn et al., 2020)

  6. Noisy Student (Xie et al., 2020)

  7. SwAV pre-training (Caron et al., 2020)

  8. Masked language model pre-training (Devlin et al., 2019)
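To illustrate how one of these algorithms uses unlabeled data, here is a minimal pure-Python sketch of the CORAL penalty as defined by Sun and Saenko (2016): the squared Frobenius distance between the feature covariances of a labeled source batch and an unlabeled target batch, scaled by 1/(4d^2), where d is the feature dimension. This is an illustrative sketch, not the code in the WILDS example scripts, which may compute the penalty differently; the function names `covariance` and `coral_loss` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # n rows (examples) x d columns (features)


def covariance(features: Matrix) -> Matrix:
    """Unbiased (divide by n - 1) covariance matrix of an n x d feature matrix."""
    n = len(features)
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


def coral_loss(source: Matrix, target: Matrix) -> float:
    """CORAL penalty: ||C_S - C_T||_F^2 / (4 d^2).

    `source` holds features of labeled examples; `target` holds features of
    unlabeled examples, which is how the penalty uses unlabeled data.
    """
    d = len(source[0])
    if len(target[0]) != d:
        raise ValueError("source and target must have the same feature dimension")
    c_s = covariance(source)
    c_t = covariance(target)
    sq_frobenius = sum(
        (c_s[a][b] - c_t[a][b]) ** 2 for a in range(d) for b in range(d)
    )
    return sq_frobenius / (4 * d * d)


if __name__ == "__main__":
    # 1-D example: source covariance 2.0, target covariance 8.0 -> 36 / 4
    print(coral_loss([[0.0], [2.0]], [[0.0], [4.0]]))  # 9.0
```

Because the penalty depends only on second-order statistics, shifting every target feature by a constant leaves it unchanged. During training, it would typically be weighted and added to the supervised loss on the labeled batch.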


Paper. In our arXiv preprint, we describe the unlabeled data and systematically benchmark all of the above algorithms on the WILDS datasets. We found that many methods did not outperform standard supervised training despite using the additional unlabeled data, and we believe there is significant room for improvement. We’ll be presenting the paper at the DistShift workshop poster session at NeurIPS tomorrow, December 13, from 1–3pm Pacific Time. If you’re attending NeurIPS and would like to find out more or discuss the paper, please drop by our poster.


Leaderboard. We’ve updated our leaderboards to include a track for methods using unlabeled data, and we invite submissions that use the unlabeled data to adapt to the distribution shifts. 


We look forward to seeing what you’ll do with the new unlabeled datasets! As usual, if you have any questions, feel free to open a GitHub issue or discussion, or contact us at wi...@cs.stanford.edu.


Thank you,

WILDS Team
