Using full HSP dataset for pretraining: compliance question


Simon Böhi

Mar 16, 2026, 2:47:52 PM
to physionet-challenges
Hi,

I'm planning to pretrain a model on the full Human Sleep Project (HSP) dataset available on BDSP, prior to fine-tuning on the Challenge training set.

My understanding from the challenge page is that the validation set comes from site I0004 and the test set from site I0007. To avoid data leakage, I would exclude all recordings from both of these sites before pretraining — using only data from sites not involved in the hidden evaluation sets.
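
As a rough sketch, the exclusion step could look like the following. The record layout and the `site` key are hypothetical placeholders rather than the actual HSP metadata schema, and the record IDs are made up; only the two held-out site IDs come from the challenge page.

```python
# Sites reserved for the hidden validation and test sets (per the challenge page).
HELD_OUT_SITES = {"I0004", "I0007"}


def filter_pretraining_records(records, held_out_sites=HELD_OUT_SITES):
    """Return only the records whose site is not in the held-out set.

    `records` is an iterable of dicts with a hypothetical "site" key; how the
    site ID is actually read from the HSP metadata is an assumption here.
    """
    kept = [r for r in records if r["site"] not in held_out_sites]
    # Guard against accidental leakage: fail loudly if anything slipped through.
    leaked = {r["site"] for r in kept} & set(held_out_sites)
    assert not leaked, f"held-out sites present in pretraining data: {leaked}"
    return kept


# Toy example with made-up record IDs.
records = [
    {"id": "rec-a", "site": "S0001"},
    {"id": "rec-b", "site": "I0004"},
    {"id": "rec-c", "site": "I0002"},
    {"id": "rec-d", "site": "I0007"},
    {"id": "rec-e", "site": "I0006"},
]
pretraining_records = filter_pretraining_records(records)
print([r["id"] for r in pretraining_records])  # ['rec-a', 'rec-c', 'rec-e']
```

Filtering on an explicit site field, rather than on file names, keeps the exclusion rule easy to audit and to describe in a write-up.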

Would this approach be considered compliant with the Challenge rules?

Thanks,
Simon Böhi

PhysioNet Challenge

Mar 16, 2026, 2:51:14 PM
to physionet-challenges
Dear Simon,

Thanks for the good question, and apologies for the delayed response.

Yes, as you indicated, we derived the training set from sites S0001, I0002, and I0006; the validation set from site I0004; and the test set from site I0007 of the Human Sleep Project (HSP):

You are more than welcome to use the data from the HSP for pre-training your model or otherwise, and this is completely fine with the Challenge rules — but please do describe your use of the data and cite the HSP! Of course, I would avoid trying to use data from the validation set and the test set, which will not be available during the Challenge.

Best,
Matt
(On behalf of the Challenge team.)

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email info at physionetchallenge.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.