Different dataset versions exist on Kaggle?

Volodimir Buchakchiyskiy

Apr 5, 2026, 2:02:24 PM
to physionet-challenges

Hi PhysioNet Team,

First, thank you for organizing the challenge.

I have a question about the data provided. I found two versions of the challenge dataset on Kaggle (v1 and v2). Contrary to what one might expect, v2 contains substantially fewer raw sleep recordings than v1 (622 vs. 780), along with a different age distribution and a different balance of case counts across sites. Depending on when participants first downloaded the data, they may now be working from diverging baselines.

I could not find any mention of this change, either on the challenge website or in the announcements and forum posts. We only discovered it because my model's performance was far worse than a teammate's.
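
For anyone who wants to check which version they are working from, here is a minimal sketch that compares the recording IDs found in two local copies of the dataset. It assumes each recording is a single file whose name (without extension) is its ID, and the `*.edf` pattern is only a placeholder assumption; adjust it to whatever file layout the Kaggle download actually uses. The function names are hypothetical.

```python
from pathlib import Path


def recording_ids(root, pattern="*.edf"):
    """Collect file stems (names without extension) under root, recursively.

    The default pattern is an assumption, not the confirmed dataset layout.
    """
    return {p.stem for p in Path(root).rglob(pattern) if p.is_file()}


def compare_versions(dir_v1, dir_v2, pattern="*.edf"):
    """Return the IDs unique to each local copy and the IDs they share."""
    ids_v1 = recording_ids(dir_v1, pattern)
    ids_v2 = recording_ids(dir_v2, pattern)
    return {
        "only_in_v1": sorted(ids_v1 - ids_v2),
        "only_in_v2": sorted(ids_v2 - ids_v1),
        "in_both": sorted(ids_v1 & ids_v2),
    }
```

Calling `compare_versions("path/to/v1", "path/to/v2")` with your own download paths and printing the length of each list should show quickly whether two teammates hold the same set of recordings.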

Could you please clarify:

  • Why were cases removed, and why these specific ones?
  • Are participants allowed to use both versions, or only one?
  • Which version will be used on the submission server?

I think it's important to address this publicly as soon as possible, so all participants are on equal footing.

Best, Volodimir

PhysioNet Challenge

Apr 5, 2026, 2:05:11 PM
to physionet-challenges
Dear Volodimir,

We updated the training set on Kaggle to remove patients who already had one or more cognitive impairment diagnoses at the time of, or shortly after, their sleep study. We made this update shortly after posting the initial training set at the start of the unofficial phase and before we began evaluating entries for the unofficial phase. You may use the earlier version of the dataset if you wish, but it may not be very useful, and we will evaluate teams using the current version of the dataset on Kaggle.

We will improve and update the training set for the official phase, so please do share any observations about the data in particular and the Challenge overall as you prepare your final entries and CinC abstracts; feedback is always welcome and especially useful during the unofficial phase. You can post your observations on the Challenge forum, or, if they may reveal hints about your approach, you can reply to us directly.

Best,
Matt
(On behalf of the Challenge team.)

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email info at physionetchallenge.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.