Naming Convention for PTB Data


Kelvin Khang Dinh Nguyen

Jul 14, 2025, 8:39:03 AM
to physionet-challenges
Hi, 

As I have pre-defined train/val/test splits based on the training data, I was wondering how the PTB data is named. My PTB files have the "_hr" suffix because I use records500, but I'm not sure which files the training code uses. Thank you!

PhysioNet Challenge

Jul 14, 2025, 8:41:03 AM
to physionet-challenges
Dear Kelvin,

We are using the 500Hz version of the PTB-XL dataset. Please see this part of the README for more information, and for the code that we use to generate the training set:
https://github.com/physionetchallenges/python-example-2025?tab=readme-ov-file#how-do-i-create-data-for-these-scripts
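
For matching pre-defined splits against record names, here is a minimal sketch in Python. It relies on the PTB-XL distribution's own naming, where files under records500 end in "_hr" and files under records100 end in "_lr"; whether the prepared training folder keeps that suffix should be checked against the script in the README above. The helper names (ptbxl_ecg_id, assign_splits) and the example split table are hypothetical and are not part of the Challenge code. The sketch also assumes that only PTB-XL records are passed in, since purely numeric names from other sources could collide with PTB-XL IDs.

```python
from pathlib import Path

# Suffixes that PTB-XL appends to record names:
# "_hr" for records500 (500 Hz) and "_lr" for records100 (100 Hz).
PTBXL_SUFFIXES = ("_hr", "_lr")


def ptbxl_ecg_id(record):
    """Return the numeric PTB-XL ECG ID for a record name or path.

    Hypothetical helper. Accepts names such as "00001_hr",
    "00001_hr.hea", "records500/00000/00001_hr.dat", or "00001".
    Raises ValueError if the remaining name is not numeric.
    """
    # Drop directories and any file extension.
    stem = Path(record).name.split(".")[0]
    # Drop the sampling-rate suffix if one is present.
    for suffix in PTBXL_SUFFIXES:
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            break
    return int(stem)


def assign_splits(records, split_by_id):
    """Map each record to its split; records with unknown IDs map to None."""
    return {r: split_by_id.get(ptbxl_ecg_id(r)) for r in records}


# Illustrative split table keyed by ECG ID (made-up assignments).
splits = assign_splits(
    ["00001_hr.hea", "records500/00000/00002_hr", "00003"],
    {1: "train", 2: "val"},
)
print(splits)
```

Keying the splits by numeric ECG ID rather than by file name means the same split table works whether or not the record names carry the "_hr" suffix.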

We generated the validation and test sets in similar ways, but they include data from different sources that are not included in the training set.

Also, you wrote that you are creating pre-defined training/validation/test splits of the training data. Teams are welcome to split the training set in any way that they would like, and to leverage the mixture of weak and strong labels in the training set, but please ensure that your code also learns from the training set. Please also be careful about how you describe these splits in your paper: they are not "the" hidden validation and test sets for the Challenge.

Best,
Matt
(On behalf of the Challenge team.)

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email info at physionetchallenge.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.