Dear Kelvin,
We are using the 500 Hz version of the PTB-XL dataset. Please see this part of the README for more information and for the code that we use to generate the training set:
https://github.com/physionetchallenges/python-example-2025?tab=readme-ov-file#how-do-i-create-data-for-these-scripts
We generated the validation and test sets in similar ways, but they include data from different sources that are not included in the training set.
Also, you wrote that you are creating pre-defined training/validation/test splits of the training data. Teams are welcome to split the training set in any way that they would like, and to leverage the mixture of weak and strong labels in the training set. However, please ensure that your code also learns from the training set, and please be careful about how you describe these splits in your paper: they are not "the" hidden validation and test sets for the Challenge.
Best,
Matt
(On behalf of the Challenge team.)
Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email info at physionetchallenge.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.