Outputs of the Models

Kevin Tian

Jun 10, 2024, 9:00:58 PM
to physionet-challenges
Hello, I have some questions about what the models should produce. 

For the classification model: When we prepared the data and got the list of classifications, we noticed that there was an additional blank category. Should we remove this blank category from the classification list?

For the digitization model: Should the output be a header and signal file with a fixed time length? If not, where can we find more information on the output format and expectations?

Thank you,
Kevin Tian

PhysioNet Challenge

Jun 10, 2024, 9:06:36 PM
to physionet-challenges
Hi Kevin,

Some of the records are unlabeled. This happens for various reasons for different datasets. For the training set, some of the PTB-XL classes do not match any of the Challenge classes, causing some of the records to be unlabeled. You can do whatever you would like when training your model, but when we evaluate your classification model's performance on the validation and test sets, we will not score unlabeled records.
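
As a rough illustration of one way to handle the blank category during training, the sketch below separates labeled records from unlabeled ones before building the class list. The `records` dictionary, the helper names, and the sample labels are assumptions made for this example; they are not part of the Challenge code, and you remain free to treat unlabeled records differently.

```python
def split_labeled(records):
    """Split records into labeled and unlabeled groups.

    `records` is a hypothetical mapping from record name to a list of
    label strings; blank or whitespace-only labels count as missing.
    """
    labeled, unlabeled = {}, []
    for name, labels in records.items():
        cleaned = [label.strip() for label in labels if label.strip()]
        if cleaned:
            labeled[name] = cleaned
        else:
            unlabeled.append(name)
    return labeled, unlabeled


def class_list(labeled):
    """Return the sorted set of classes seen in the labeled records."""
    return sorted({label for labels in labeled.values() for label in labels})


# Illustrative data only: record names and labels are made up for the example.
records = {
    "00001": ["NORM"],
    "00002": [""],            # unlabeled: only a blank label
    "00003": ["STTC", "  "],  # one real label plus a blank entry
    "00004": [],              # unlabeled: no labels at all
}
labeled, unlabeled = split_labeled(records)
classes = class_list(labeled)
```

With this approach the blank category never enters the class list, and the unlabeled records stay available if you want to use them in some other way.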

(Similarly, when we evaluate your digitization model's performance on the validation and test sets, we will not score records without time series or waveforms.)

Your digitization model should output a WFDB header file and a WFDB signal file for each record. The header file should include the signal duration, sampling frequency, channel names, etc. that are needed to interpret the signal file, but we will share this information with your inference code, i.e., the run_model script, using a partially completed header file. If you run the remove_hidden_data.py script, e.g.,

 python remove_hidden_data.py \
     -i ptb-xl/records500_hidden/00000 \
     -o ptb-xl/records500_hidden/00000 \
     --include_images

then you can see what your inference code will see. This section of the README, together with the website and the inputs and outputs of the example commands, should make the task a little more concrete:
https://github.com/physionetchallenges/python-example-2024?tab=readme-ov-file#how-do-i-create-data-for-these-scripts

This page about WFDB header files may also help. For example, the top row of the header file gives the record name, the number of channels, the sampling frequency, and the number of samples in the signal data.
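
To make the layout of that top row concrete, here is a minimal Python sketch that parses it. The sample line is illustrative (12 channels, 500 Hz, 5000 samples) rather than copied from a specific file, and `parse_record_line` is a hypothetical helper, not part of the Challenge code. It keeps only the four fields named above and discards optional extras that the WFDB format allows on this line.

```python
def parse_record_line(line):
    """Parse the first (record) line of a WFDB header file.

    Returns (record_name, num_signals, sampling_frequency, num_samples).
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least four fields in the record line")
    # The record name may carry a '/segments' suffix; keep the name only.
    record_name = fields[0].split("/")[0]
    num_signals = int(fields[1])
    # The frequency field may carry a '/counter_frequency' suffix; keep the
    # sampling frequency only.
    sampling_frequency = float(fields[2].split("/")[0])
    num_samples = int(fields[3])
    return record_name, num_signals, sampling_frequency, num_samples


# Illustrative record line, assumed for this example.
name, num_signals, fs, num_samples = parse_record_line("00001_hr 12 500 5000")

# The signal duration in seconds follows from the last two fields.
duration = num_samples / fs
```

The number of samples divided by the sampling frequency gives the duration your signal file is expected to cover, which is 10 seconds for the illustrative line above.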

Best,
Matt
(On behalf of the Challenge team.)

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email info at physionetchallenge.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.