Inquiry Regarding CODE-15% Dataset Format

수학수학

Jan 17, 2025, 10:00:53 PM
to physionet-challenges

Dear Physionet Challenge Team,

I would like to confirm whether it is mandatory to convert the CODE-15% dataset (in HDF5 format) to WFDB format.

If possible, I would prefer to work with the original HDF5 format.

Thank you for your time and assistance.

Sincerely,
Sehun

PhysioNet Challenge

Jan 17, 2025, 10:03:59 PM
to physionet-challenges
Dear Sehun,

You do not need to convert the CODE-15% dataset, or any other dataset that you use, to WFDB format. However, we will train and score your model using data in WFDB format, so any submitted code must be able to load WFDB data. It can optionally support other data formats as well.
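
One way to support both formats is to pick the loader from the file extension. The following is only a minimal sketch, not part of the Challenge code: `detect_format` and `load_signals` are hypothetical helper names, it assumes the `wfdb` and `h5py` packages are installed, and it assumes the HDF5 file stores the ECGs in a dataset named `tracings` (check your own copy of the data before relying on that name).

```python
import os


def detect_format(path):
    """Guess the storage format from the file extension (hypothetical helper)."""
    ext = os.path.splitext(path)[1].lower()
    if ext in (".hea", ".dat", ".mat"):
        return "wfdb"
    if ext in (".hdf5", ".h5"):
        return "hdf5"
    raise ValueError(f"Unrecognized data file: {path}")


def load_signals(path, index=0):
    """Load one recording as an array of samples (hypothetical helper)."""
    if detect_format(path) == "wfdb":
        import wfdb  # imported lazily so the HDF5 path works without it

        # wfdb expects the record name, i.e. the path without its extension.
        record_name = os.path.splitext(path)[0]
        signals, fields = wfdb.rdsamp(record_name)
        return signals

    import h5py  # imported lazily so the WFDB path works without it

    with h5py.File(path, "r") as f:
        # ASSUMPTION: recordings are stacked in a dataset called "tracings".
        return f["tracings"][index]
```

With this layout, the submitted code always has a working WFDB path, and the HDF5 branch is only an optional convenience for local experiments.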

Best,
Matt
(On behalf of the Challenge team.)

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email info at physionetchallenge.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.

Ahmet Sen

Mar 9, 2025, 10:34:38 PM
to physionet-challenges
Hello,

I would like to clarify the organization of the input data for training the machine learning model. If I understand correctly, the input data is in .hea format, and the train_model.py script operates using the data_folder as its input. Within this data_folder, all files are stored in .hea format.

Could you confirm whether the training data is already separated into training and validation sets within the data_folder? Or can we handle this separation ourselves using the files inside the folder?

Thank you for your time and assistance. I look forward to your response.

Best
Ahmet SEN

PhysioNet Challenge

Mar 9, 2025, 10:38:20 PM
to physionet-challenges
Dear Ahmet,

The data are in WFDB format, which includes a WFDB header file and a WFDB signal file for each ECG recording. The WFDB header file is a plain text file with a .hea extension, and the WFDB signal file is a binary data file. By default, the data preparation scripts produce WFDB signal files with a .dat extension, but an optional command-line argument produces WFDB signal files with a .mat extension. The WFDB header file describes the WFDB signal file and provides metadata about the recording and patient encounter. Please see more information here:
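
To make the header layout concrete, here is a rough sketch of reading a .hea file with plain Python. It is not the Challenge's own loader: `parse_wfdb_header` is a hypothetical helper, the sample header and its values are invented for illustration, and the parser only handles the simple case (a record line, one line per signal with a single-word description, and `#` comment lines carrying the metadata).

```python
# Invented example header: record line, two signal lines, two comment lines.
SAMPLE_HEADER = """rec001 2 400 4096
rec001.dat 16 1000(0)/mV 16 0 0 0 0 I
rec001.dat 16 1000(0)/mV 16 0 0 0 0 II
# Age: 53
# Sex: Male
"""


def parse_wfdb_header(text):
    """Split header text into record info, signal info, and comments."""
    record, signals, comments = None, [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Comment lines hold free-text metadata about the recording.
            comments.append(line.lstrip("#").strip())
        elif record is None:
            # First non-comment line: name, signal count, frequency, samples.
            fields = line.split()
            record = {
                "name": fields[0],
                "num_signals": int(fields[1]),
                "sampling_frequency": float(fields[2]) if len(fields) > 2 else None,
                "num_samples": int(fields[3]) if len(fields) > 3 else None,
            }
        else:
            # Signal lines: signal file name, storage format, ..., description.
            fields = line.split()
            signals.append(
                {"file": fields[0], "format": fields[1], "name": fields[-1]}
            )
    return record, signals, comments


record, signals, comments = parse_wfdb_header(SAMPLE_HEADER)
```

In practice the `wfdb` Python package reads these files for you. The sketch is only meant to show that the .hea file is text that points at the binary signal file.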

The training data are public, and the validation and test data are hidden. We will train your model on the training set and evaluate your trained model on the validation set (and later, at the end of the Challenge, on the test set). Therefore, you do not need to separate the training data, and, in general, the teams will not have access to the validation set or test set:
https://physionetchallenges.org/2025/#data

However, you are of course welcome to, for example, treat data from different sources differently when training your model, or hold out a subset of the data to choose parameters for your model.
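
Holding out part of the public training data for your own model selection could look roughly like the sketch below. The helper names `find_records` and `split_records`, the 20% holdout fraction, and the fixed seed are all assumptions made for illustration and are not part of the Challenge code.

```python
import os
import random


def find_records(data_folder):
    """Collect record names (paths without .hea) under data_folder."""
    records = []
    for root, _, files in os.walk(data_folder):
        for name in files:
            if name.endswith(".hea"):
                records.append(os.path.join(root, name[: -len(".hea")]))
    return sorted(records)


def split_records(records, holdout_fraction=0.2, seed=0):
    """Shuffle reproducibly and return (training records, held-out records)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]
```

For example, `train, held_out = split_records(find_records(data_folder))` gives a local split for tuning. This is separate from the hidden validation and test sets that the organizers use for scoring.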