Dear PhysioNet Challenge Organizers,
We have two questions about the official phase.
The Challenge website describes both training_set_small/ and training_set_large/, but the Kaggle dataset appears to contain only the contents of the small training set, placed directly at the root level:
physiological_data/, algorithmic_annotations/, human_annotations/, demographics.csv, ICD_codes_CI.csv
The record counts also match the documented small set:
S0001: 857, I0002: 54, I0006: 192
Could you confirm whether the current Kaggle dataset contains only the small training set, and let us know where and when the large training set will be available?
Are we allowed to use a frozen, pre-trained backbone to extract features from the Challenge training data during train_model(), save the extracted features and the downstream model artifacts in the output directory, and reuse them later during inference?
Also, may we include precomputed feature caches derived from the Challenge training set directly in the submitted repository, or should all such caches be generated only during train_model()?
Thank you.
CAU-KU