Hi PhysioNet Team,
First, thank you for organizing the challenge.
I have a question about the provided data. I found two versions of the challenge dataset on Kaggle (v1 and v2). Contrary to what one might expect, v2 contains significantly less data (622 vs. 780 raw sleep recordings), as well as a different age distribution and a different balance of cases across sites. Depending on when participants first downloaded the data, they may now be working from diverging baselines.
I could not find any mention of this change, neither on the challenge website nor in the announcements or forum posts. We only discovered it because my model performed far worse than a teammate's.
Could you please clarify:
I think it's important to address this publicly as soon as possible, so all participants are on equal footing.
Best, Volodimir