Clarification on Kaggle data structure and feature caching

강태영

Jun 4, 2026, 12:34:57 PM

to physionet-challenges

Dear PhysioNet Challenge Organizers,

We have two clarification questions about the official phase.

1. Kaggle data structure

The Challenge website describes both training_set_small/ and training_set_large/, but the Kaggle dataset appears to contain only the small training set contents directly at the root level:

physiological_data/, algorithmic_annotations/, human_annotations/, demographics.csv, ICD_codes_CI.csv

The record counts also match the documented small set:

S0001: 857, I0002: 54, I0006: 192

Could you confirm whether the current Kaggle dataset contains only the small training set, and where or when the large training set will be available?

2. Feature caching

Is it allowed to extract features from the Challenge training data using a frozen/pre-trained backbone during train_model(), save the extracted features and downstream model artifacts inside the output directory, and use them later during inference?

Also, is it allowed to include precomputed feature caches derived from the Challenge training set directly in the submitted repository, or should all such caches be generated only during train_model()?
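
To make the feature-caching question concrete, here is a minimal sketch of the workflow we have in mind. It is illustrative only: the function signatures are simplified and do not match the official Challenge template, frozen_backbone() is a hypothetical stand-in for a real pre-trained network, and the file name model.json is an assumption.

```python
import json
import os


def frozen_backbone(signal):
    """Hypothetical stand-in for a frozen pre-trained feature extractor.

    Returns two summary statistics (mean and variance); a real backbone
    would return a learned embedding instead.
    """
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    return [mean, var]


def train_model(records, labels, model_folder):
    """Extract features at training time, fit a nearest-centroid model,
    and save both the feature cache and the model in the output folder.

    Simplified signature (assumption): records are lists of floats.
    """
    os.makedirs(model_folder, exist_ok=True)
    features = [frozen_backbone(r) for r in records]

    # Accumulate per-label feature sums and counts.
    sums = {}
    for feat, label in zip(features, labels):
        total, count = sums.get(label, ([0.0] * len(feat), 0))
        sums[label] = ([a + b for a, b in zip(total, feat)], count + 1)

    # The centroids depend on the training data, so new data gives a new model.
    centroids = {str(label): [a / count for a in total]
                 for label, (total, count) in sums.items()}

    with open(os.path.join(model_folder, "model.json"), "w") as fh:
        json.dump({"centroids": centroids, "feature_cache": features}, fh)


def load_model(model_folder):
    """Load the artifacts written by train_model()."""
    with open(os.path.join(model_folder, "model.json")) as fh:
        return json.load(fh)


def run_model(model, record):
    """Predict the label whose centroid is closest to the record's features."""
    feat = frozen_backbone(record)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(feat, centroid))

    centroids = model["centroids"]
    return min(centroids, key=lambda label: squared_distance(centroids[label]))
```

In this sketch the feature cache is produced only inside train_model() and written to the output folder, so nothing derived from the training set needs to be shipped in the repository.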

Thank you.

CAU-KU

PhysioNet Challenge

Jun 4, 2026, 12:36:23 PM

to physionet-challenges

Dear CAU-KU,

We have created small (~200 GB) and large (~1.2 TB) versions of the training set. Each team can decide whether it wants us to run their training code on the small or large version of the training set. We have posted the small version of the training set on Kaggle, and we are working to post the large version. Please check this part of the webpage to access the Challenge data:
http://physionetchallenges.org/2026/#data-access

Each team must submit working training code that learns from the provided training data. If the training data changes, e.g., someone uses your code later with more or better data, then the model from your training code should also change. You can use external data and perform transfer learning, but not to circumvent the training requirement.

We will make small changes and updates for clarity over the next several days, and we will begin to accept official phase submissions thereafter. Please continue to ask questions and share feedback on the Challenge forum, or contact us directly if your message reveals information about your approach.

Best,
Matt
(On behalf of the Challenge team.)

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email info at physionetchallenge.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.
