Clarification of expectations for training code

PhysioNet Challenge

Jul 19, 2021, 8:21:07 AM

to physionet-challenges

Dear Challengers,

We recently received several questions from teams about the requirement that teams submit training code in their entries, so we wanted to clarify our expectations.

During last year’s Challenge, we started requiring teams to submit the training code for their models. This training code must be working and able to run on the provided training data and labels using the provided computational resources to produce a model that we can evaluate on the hidden data. We understand that these requirements demand more human and computational effort from teams, but we believe that they are important for the reproducibility of the Challenges in particular and data science in general.
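As an illustration only, the requirement above amounts to a single entry point that takes the training data and writes out a model with no manual steps. The sketch below is a minimal example under stated assumptions: the function name `train_model`, the one-JSON-file-per-record layout, and the `model.json` output are all hypothetical and are not the Challenge's actual interface or data format.

```python
import json
import os
import tempfile


def train_model(data_directory, model_directory):
    """Train on every record in data_directory and save a model to model_directory."""
    os.makedirs(model_directory, exist_ok=True)

    # Hypothetical data layout: one JSON file per record with "signal" and "label".
    values_by_label = {}
    for name in sorted(os.listdir(data_directory)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(data_directory, name)) as f:
            record = json.load(f)
        mean = sum(record["signal"]) / len(record["signal"])
        values_by_label.setdefault(record["label"], []).append(mean)

    # Toy "model": the average signal mean for each label.
    model = {label: sum(v) / len(v) for label, v in values_by_label.items()}

    model_path = os.path.join(model_directory, "model.json")
    with open(model_path, "w") as f:
        json.dump(model, f)
    return model_path


if __name__ == "__main__":
    # Smoke test on synthetic data in a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        os.makedirs(data_dir)
        examples = [([0.0, 0.2], "normal"), ([1.0, 1.2], "abnormal")]
        for i, (signal, label) in enumerate(examples):
            with open(os.path.join(data_dir, f"record_{i}.json"), "w") as f:
                json.dump({"signal": signal, "label": label}, f)
        print(train_model(data_dir, os.path.join(root, "model")))
```

Running a smoke test like this in a clean environment before submitting is one way to check that the training code works without anyone having to edit it.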

We have also been asked to increase the computational capacity to help those with computationally intensive submissions. However, due to resource constraints and concern for our environment, we are unable and unwilling to increase the resources provided. An algorithm that consumes too many resources is both impractical and unethical. We consider efficiency and parsimony to be fundamental to the Challenge.

In some cases, teams have only submitted their pretrained models. This does not satisfy the training requirements, so these teams will not be eligible for rankings or prizes unless they submit their training code before the end of the official phase. Teams do not necessarily need to submit their training code with every submission, but we strongly recommend it because most teams need multiple attempts to submit training code that we can run, and the entry that we ultimately run on the test data must contain working training code.

In other cases, teams have submitted their training code, but they have asked us not to run it, or they have asked us to change their submission in some way to run it. Unfortunately, this also does not satisfy the training requirements, and these teams will also not be eligible for rankings or prizes unless they submit training code that we can run as-is before the end of the official phase. We cannot know if someone’s training code works unless we run it, and if we need to manually change someone’s submission to run their training code, then we risk making the wrong changes, so we need teams to submit training code that runs as-is.

We understand that data preprocessing can be computationally expensive, but data preprocessing is a part of the training process, so teams need to include their data preprocessing and feature extraction code in their submissions. Of course, teams can use preprocessed data or precomputed features as they develop their models, but the entry that we ultimately run on the test data must contain working data preprocessing and feature extraction code.
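One way to picture this is a training function that starts from raw signals and calls the preprocessing and feature extraction steps itself, instead of reading precomputed features from disk. The following is a minimal sketch with toy features and a nearest-centroid classifier; every function name here (`preprocess`, `extract_features`, `train`, `predict`) is hypothetical and chosen only for illustration.

```python
import statistics


def preprocess(signal):
    """Remove the mean so that a constant baseline offset does not matter."""
    baseline = statistics.fmean(signal)
    return [x - baseline for x in signal]


def extract_features(signal):
    """Return a small feature vector: standard deviation and peak-to-peak range."""
    cleaned = preprocess(signal)
    return [statistics.pstdev(cleaned), max(cleaned) - min(cleaned)]


def train(raw_signals, labels):
    """Fit a nearest-centroid classifier, starting from raw signals."""
    features_by_label = {}
    for signal, label in zip(raw_signals, labels):
        features_by_label.setdefault(label, []).append(extract_features(signal))
    # One centroid per label: the column-wise mean of that label's feature rows.
    return {
        label: [statistics.fmean(column) for column in zip(*rows)]
        for label, rows in features_by_label.items()
    }


def predict(model, raw_signal):
    """Apply the same feature extraction at inference time as during training."""
    features = extract_features(raw_signal)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))
```

Because `train` and `predict` both go through `extract_features`, the same code path runs during development, retraining, and evaluation.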

We do allow transfer learning, so teams can include a pretrained model in their submissions, but their training code must continue to train or retrain the model on the provided training data and labels. We are able to detect when teams do not provide retrainable models, and unfortunately, this does not satisfy the training requirements. Teams that use transfer learning do not need to provide the training data for their pretrained models, but they must adequately describe the data and the transfer learning process that they use in their papers.
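To make the distinction concrete, here is a minimal sketch of training code that starts from pretrained weights and keeps training on the provided data, as opposed to returning the pretrained weights untouched. It uses a toy logistic regression in plain Python; the names `fine_tune` and `predict_probability` and the default hyperparameters are assumptions made for this example only.

```python
import math


def predict_probability(weights, features):
    """Logistic regression prediction for one example."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def fine_tune(pretrained_weights, features, labels, epochs=50, learning_rate=0.5):
    """Continue training from pretrained weights on the provided data and labels."""
    weights = list(pretrained_weights)  # copy, so the pretrained weights are not modified
    for _ in range(epochs):
        for x, y in zip(features, labels):
            # Gradient of the log loss for one example is (prediction - label) * x.
            error = predict_probability(weights, x) - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
    return weights
```

The returned weights depend on the training data passed in, so retraining on different data yields a different model.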

We also understand that many training algorithms are stochastic, so the same training code may produce somewhat different models on the same training data. This is, to some extent, currently unavoidable due to asynchronous optimization schemes, but many machine learning packages allow the use of random seeds and provide other options to improve reproducibility. If you are concerned that your training code may produce very different models, then you may want to investigate these options for your approach.
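As a small illustration of the random seed option, the sketch below trains a toy one-parameter model with stochastic gradient descent and draws every random number from a seeded generator. It uses only Python's standard `random` module; the function name `train_with_seed` and the toy target are hypothetical. Libraries such as NumPy and PyTorch offer analogous calls (for example `numpy.random.default_rng(seed)` and `torch.manual_seed(seed)`), although seeding alone may not remove every source of nondeterminism.

```python
import random


def train_with_seed(seed, steps=200, learning_rate=0.05):
    """Fit y = w * x by stochastic gradient descent with a seeded generator."""
    rng = random.Random(seed)  # private generator; does not touch global state
    weight = rng.uniform(-1.0, 1.0)  # random initialization
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)  # randomly sampled training point
        y = 3.0 * x  # toy target: y = 3x
        gradient = 2.0 * (weight * x - y) * x  # derivative of the squared error
        weight -= learning_rate * gradient
    return weight


if __name__ == "__main__":
    # Two runs with the same seed give the same weight.
    print(train_with_seed(0) == train_with_seed(0))
```

Runs with different seeds will generally end at slightly different weights, which is the kind of variation described above.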

Best,

Matt

(On behalf of the Challenge team.)

https://PhysioNetChallenges.org/
https://PhysioNet.org/

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email challenge at physionet.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.
