2022 PhysioNet Challenge scores released


PhysioNet Challenge

Sep 12, 2022, 7:56:19 PM
to physionet-challenges
Dear Challengers,

Thank you for another successful Challenge! We were happy to see many of you in person at CinC 2022 in Tampere, Finland, and we hope that we can see even more of you next year at CinC 2023 in Atlanta, Georgia, USA – hosted by us!

This announcement has important information about the scores on the test data and the updates to your papers (which you need to implement to be ranked). Please read it carefully.

Test scores
We have posted preliminary scores and rankings on the test data: https://physionetchallenges.org/2022/results/

Please note that there are five tables on this page: a table with a summary of the teams, two tables for the murmur detection task, and two tables for the clinical outcome identification task. For each task, there is an official ranked list of teams sorted by test score and an unofficial unranked list of the other teams in alphabetical order. The teams in the second list were not ranked because they did not meet one or more of the Challenge rules, for example, by having non-functioning or non-reusable training code, or by failing to register for CinC or to upload a CinC preprint by the deadline.

Please remember that we used the weighted accuracy metric to score and rank the murmur detection task and the cost metric to score and rank the clinical outcome identification task, but we included additional metrics in these tables, and you are more than welcome to include and compare them in your papers as long as you clearly include the official metrics (see below).
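For orientation, a class-weighted accuracy of the kind used for the murmur task can be computed as in the short sketch below. The class weights shown are illustrative placeholders, and the official definitions of both metrics (including the actual murmur weights and the outcome cost function) are the ones implemented in our evaluation code.

    import numpy as np

    def weighted_accuracy(y_true, y_pred, classes, weights):
        """Class-weighted accuracy: correct predictions of each class count toward
        the numerator with that class's weight; every true instance of the class
        counts toward the denominator with the same weight."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        num = sum(w * np.sum((y_true == c) & (y_pred == c)) for c, w in zip(classes, weights))
        den = sum(w * np.sum(y_true == c) for c, w in zip(classes, weights))
        return num / den if den > 0 else float('nan')

    # Example with placeholder weights (not necessarily the official Challenge weights):
    classes = ['Present', 'Unknown', 'Absent']
    weights = [5, 3, 1]
    y_true = ['Present', 'Absent', 'Unknown', 'Absent']
    y_pred = ['Present', 'Absent', 'Absent', 'Absent']
    print(weighted_accuracy(y_true, y_pred, classes, weights))  # 0.7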

Please contact us by Friday, 16 Sept. 2022 at 23:59 GMT if you believe that your team is on the wrong list, or if you believe that any of the information about your team is incorrect. We will update the results afterwards. Please note that your team may move from an official entry to an unofficial entry if you do not adhere to the instructions below on finalizing and uploading your papers by the deadline.

(A closer look at the rounded scores revealed another (joint) winner – congratulations! Please look forward to an email from us. :))

Final papers and deadline
Please update your four-page conference papers to include the test scores, update your discussion and conclusions, and address any issues with your preprint. Please upload your final papers on Softconf by 23:59 (your local time) on 24 September 2022:
https://www.softconf.com/m/cinc2022/

The above tables include the scores for your chosen models on the training, validation, and test sets. Please note that the "validation" score is the intermediate score that the teams received during the official phase of the Challenge before we ran your final selected code on the test set. Please include the training, validation, and test scores in your papers using the format described in the paper template:
https://physionetchallenges.org/2022/papers/

If you did not receive a final test score, please be clear about that in your paper. Articles that refer to a validation score or “local” test score as if it were the final metric by which to be judged will not be eligible for publication.

We review each paper and frequently need to ask teams to make corrections that we’ve already requested! Please read the paper template for instructions, including the following items:
  • Cite the Challenge description and data correctly using the references in the CinC template. Specifically they are:
    1) Reyna, M. A., Kiarashi, Y., Elola, A., Oliveira, J., Renna, F., Gu, A., Perez-Alday, E. A., Sadr, N., Sharma, A., Mattos, S., Coimbra, M. T., Sameni, R., Rad, A. B., Clifford, G. D. (2022). Heart murmur detection from phonocardiogram recordings: The George B. Moody PhysioNet Challenge 2022. medRxiv, doi: 10.1101/2022.08.11.22278688
    2) Oliveira, J., Renna, F., Costa, P. D., Nogueira, M., Oliveira, C., Ferreira, C., … & Coimbra, M. T. (2022). The CirCor DigiScope Dataset: From Murmur Detection to Murmur Classification. IEEE Journal of Biomedical and Health Informatics, doi: 10.1109/JBHI.2021.3137048.
  • We ask you to cite these articles so that we can measure the impact of the Challenge and report it to those who sponsor us. If you fail to cite these reference articles, our impact is under-reported, and funding for future Challenges is much less likely. We appreciate your help with this. It also prevents authors from incorrectly describing the data (a problem we often see).
  • Do not cite the Challenge websites … and avoid citing websites in general.
  • Cite your other references correctly: https://ieeeauthorcenter.ieee.org/wp-content/uploads/IEEE-Reference-Guide.pdf. We’ve seen many sloppy references in which authors’ names are butchered, abbreviations and journal names are left uncapitalized, and important information (like volume or page numbers) is missing.
  • Try to avoid citing preprints - look for the journal article that the authors finally published. This will be more accurate and more balanced, since it has gone through peer-review. If there’s no journal article following the preprint, that may be because the authors were unable to find a journal that would publish it. Be skeptical of the claims and work in the preprint.
  • Present your results in your abstract and results table in the same way we did in the CinC template for consistency with other teams. This makes comparisons easier and ensures you don’t miss key information.
  • Be clear about your data sets (training, cross-validation on the training data, validation, test) and metrics/scores. Include your scores and rankings on the validation and test data - you don't strictly need to provide training or cross-validation scores, but if you do, make sure you clearly distinguish them from the scores on the real test data so there’s no misinterpretation. If you did not receive scores on the validation or test data, then say so. Do not describe your “local test set”, which is just confusing. The only test set in the context of the Challenge is the one on which we ran your final code submission.
  • Do not make misleading or inaccurate statements about your results. In particular, do not claim an inaccurate ranking, or report inaccurate statistics. If you are in the unofficial list, the code is not ranked, and you should just say you were not ranked. Do not say where you ‘would’ have been, had you been ranked. This is misleading and confusing.
Teams that are unable to address these issues by the deadline are in danger of having their papers rejected and being removed from both the ranked and unofficial unranked lists, so please review your papers carefully before you resubmit them!

Focus Issue
As we announced on the Challenge forum before the conference, we are asking teams to submit their extended work as preprints to medRxiv for peer pre-review. The ‘best’ preprints will be invited to submit to the focus issue on this year’s Challenge:
https://groups.google.com/g/physionet-challenges/c/IjX_GdhvDrc

Another Shot at the Test Data!
We encourage you to make improvements to your code in light of what you learned last week at the conference. If you do so and send us a draft of the extended preprint describing the modifications, we will attempt to run your new code one more time. If you include this new approach (and score) in the preprint, please ensure that you identify it as a post-Challenge submission and compare it to your Challenge submission. You may do this before or after you post your medRxiv preprint (although we ask you to update the preprint on medRxiv if you do it after posting your first version there).

Parting Thoughts
We look forward to seeing your revised papers (and code) and hope that you will consider submitting extensions of your work to the focus issue. Congratulations to the winners, thank you all for participating, and we hope that you will participate again in the next Challenge!

Best,
The Challenge team

https://PhysioNetChallenges.org/

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email challenge at physionet.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.

PhysioNet Challenge

Sep 13, 2022, 11:45:59 PM
to physionet-challenges
Dear Challengers,

We've received many emails from teams over the past 24 hours with questions about your scores, rankings, and eligibility.

If you believe that some of the information about your team is incorrect, then please contact us. We have already had at least one team whose name was spelled differently across entries, which made the team's participation harder to track; matching the names allowed us to update the score sheets and rank the team. (Please be creative and consistent with your team names for next year!)

Many of you noticed that we identified several teams as not having working training code. (This corresponds to the "Working training code?" columns in the team summary table on the results site.) We ran the training code for each entry and then scored the resulting models on the training, validation, and test sets, so you may be curious about what this means. By "working training code", we meant that we could run your training code on data, even if the data were not exactly the same as the provided training set -- a more accurate description might have been "reusable training code".

To check for "working" or "reusable" training code, we modified many of the labels for the recordings (but not the recordings themselves) in the training set, e.g., switched "Present" with "Absent" for some of the recordings, reran your training code, and scored the resulting models:
  1. If your training code crashed when we modified the labels, then it appeared that your training code was very sensitive to the choice of training set (or had intentionally or unintentionally hard-coded some aspect of the training set).
  2. If your training code did not crash, but it produced the same models and the same scores whether we used the original or modified training set, then it appeared that your training code was just loading a pre-trained model.
  3. If your training code did not crash and produced different models and scores with the original and modified labels, then it appeared that your training code worked -- although some models produced better scores with the modified labels!
We understand that this is an unpleasant surprise for several teams, and it was a surprise to us that several teams did not have working or reusable training code. We will follow up on this with the teams that are contacting us to make sure that we did not accidentally disqualify your team from the rankings. We believe that usable and reusable training code that we can rerun on new data (or even minimally changed data, as was the case for this experiment) is an essential part of a working entry to the Challenge.
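For teams that want to try this kind of check on their own entries, here is a minimal sketch of the idea. It assumes that the labels appear in per-patient text files on lines such as '#Murmur: Present' and that your training code can be run as 'python train_model.py <data_folder> <model_folder>'; both details are assumptions about your pipeline, not the exact data or code that we used.

    import os
    import re
    import shutil
    import subprocess

    def flip_murmur_labels(src_folder, dst_folder):
        """Copy the training set and swap 'Present' and 'Absent' murmur labels,
        leaving the recordings themselves untouched."""
        shutil.copytree(src_folder, dst_folder)
        for name in os.listdir(dst_folder):
            if not name.endswith('.txt'):
                continue
            path = os.path.join(dst_folder, name)
            with open(path) as f:
                text = f.read()
            # Swap the two labels in a single pass (assumes '#Murmur: ...' lines).
            text = re.sub(r'\b(Present|Absent)\b',
                          lambda m: 'Absent' if m.group(1) == 'Present' else 'Present',
                          text)
            with open(path, 'w') as f:
                f.write(text)

    def train(data_folder, model_folder):
        """Run the team's training entry point; the command line is an assumption."""
        subprocess.run(['python', 'train_model.py', data_folder, model_folder],
                       check=True)  # a crash here corresponds to case 1 above

    if __name__ == '__main__':
        flip_murmur_labels('training_data', 'training_data_flipped')
        train('training_data', 'model_original')
        train('training_data_flipped', 'model_flipped')
        # Identical models or downstream scores from the two runs suggest that the
        # training code loads a pre-trained model (case 2); different models and
        # scores suggest that the code actually learns from the data (case 3).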

As always, thank you for helping us to improve the Challenges, and for bearing with us when some of the improvements are (hopefully temporarily) frustrating. (We will provide earlier feedback next year to reduce the surprises!)

Best,
Matt
(On behalf of the Challenge team.)

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email info at physionetchallenge.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.

xnej...@gmail.com

Sep 16, 2022, 3:41:41 PM
to physionet-challenges
Dear challenge organizers,
I would like to express my concerns about the newly introduced criteria for assessing "working code" after the challenge has finished.

I completely understand that providing reusable code is a part of the challenge. However, such errors should have been detected automatically during the submission and validation phase. In case of an error, teams should have been notified before the challenge deadline so that they could correct the code to the appropriate format.

I believe there are multiple reasons why the provided training code might crash because of unintentional errors in the code, simply because nobody expected the code to be used with different input data or labels.

Furthermore, I believe that evaluating the algorithms based on their performance with different input labels leads to the elimination of algorithms with heuristic rules that do not use any machine learning methods.

For the reasons mentioned above, we believe it would be fair if you could share your testing code and pipeline with us. It would certainly help the community improve the submitted code.

Petr Nejedly
Institute of Scientific Instruments of the CAS, v. v. i.
Czech Republic


On Wednesday, 14 September 2022 at 5:45:59 AM UTC+2, PhysioNet Challenge wrote:

PhysioNet Challenge

Sep 16, 2022, 3:47:23 PM
to physionet-challenges
Dear Challengers,

We have received many, many emails (both public and private) from you over the past few days about our tests for working training code. We understand that many teams were surprised about the results of these tests. We were surprised by them, too.

Several teams have suggested (1) that we should have been more transparent about how we would test your code, (2) that we should have automatically detected these mistakes in your code, (3) that no one expected that we would run your training code with different data or labels, and (4) that these tests unfairly excluded algorithms that do not learn from the data. We wanted to address some of these points.

To be clear, aside from necessary computational resource constraints, we impose very few limits on how teams can train their algorithms on the training set. We ask that you load the training set from the provided training folder, train your models on the training set, and save the models to the provided model folder. If teams hard-code information about the training set in their code or include additional files with their code that ignore the provided training set, then the code simply does not work as expected, or sometimes at all.
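As a rough sketch of that contract (not the official template code), a training entry point might look like the following, where the function name, file parsing, and features are placeholders: it reads everything from the supplied training folder and writes everything it needs for inference into the supplied model folder.

    import os
    import joblib
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_model(data_folder, model_folder, verbose=True):
        """Train only from the contents of data_folder and persist to model_folder.
        No labels, class frequencies, or pre-trained weights are hard-coded here
        or bundled with the submission."""
        features, labels = [], []
        for name in sorted(os.listdir(data_folder)):      # discover the patients
            if name.endswith('.txt'):
                feats, label = load_patient(os.path.join(data_folder, name))
                features.append(feats)
                labels.append(label)
        model = LogisticRegression(max_iter=1000)
        model.fit(np.vstack(features), np.array(labels))  # learn from the data
        os.makedirs(model_folder, exist_ok=True)
        joblib.dump(model, os.path.join(model_folder, 'model.sav'))

    def load_patient(path):
        """Placeholder parser: pull one label line and return dummy features.
        A real entry would compute features from the patient's recordings."""
        label = 'Unknown'
        with open(path) as f:
            for line in f:
                if line.lower().startswith('#murmur:'):
                    label = line.split(':', 1)[1].strip()
        return np.random.normal(size=4), label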

Indeed, the point of these simple tests is to reveal which teams have attempted to cheat by hard-coding data in their code. Telling you exactly how we do this, and automatically testing it for you, makes working around our tests much easier. We recognize that most people don’t want to cheat, and this makes it harder for you, but we think that you would be outraged if a cheating team won. Finding the balance within the time and resources that we have is not easy. You can imagine that, with the number of entries and complexity of code that we receive, it’s impossible for us to exhaustively test and debug every entry. Therefore, we apply tests that seem reasonable and are tractable in the time we have. The point of the Challenge is to create reusable code, and hard-coding data in your training code makes it practically useless to others (unless they can spot the error, and, even then, it requires a huge effort to debug). Reusability and generalizability of the code are a fundamental requirement of the Challenge and one of the things that separate the PhysioNet Challenges from other public competitions - we push you to make the code as useful as possible.

However, we want to recognize that the Challenge teams are the most important part of this public event and be as accommodating as possible to those who work within the spirit of the Challenge. In the spirit of recognizing the hard work of all participants, we will be adding the teams that failed the “working training code” tests back to the official list, but we will not be changing the prizes, since we feel that those who created fully reusable code deserve them. We also want to be responsive to your requests, so for future Challenges we will try to communicate our expectations for robust and reusable training code more clearly and to enforce them more explicitly. We hope that teams will embrace this opportunity to write even higher-quality code. We will update the team summaries and score tables shortly.

We hope that both those who managed to make generalizable code and those whose entries worked only in a specific way find this approach fair. The time has come for us to release the final list, and we do not wish to delay further. We have another Challenge to prepare!

We want to thank *all* the participants of the Challenge - we really mean it when we say that you are the most important part of the event. Science is a group activity, and we build off each other’s work. Don’t forget - you have one more chance to have the highest score in this Challenge! We will be producing a focus issue on the Challenge, and if you wish to submit an updated version of your code for evaluation for the focus issue, then please send us a draft of the article (or link to your medRxiv extended preprint describing the updated approach) together with a link to the code, and we will attempt to run your code once more. (And we will perform the same checks for generalizability!)

Thank you all again for your contributions this year!

Best,
Gari and Matt

(On behalf of the Challenge team.)

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email info at physionetchallenge.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.

PhysioNet Challenge

Sep 18, 2022, 1:26:39 PM
to physionet-challenges
Dear Challengers,

We have updated the team summaries and scores on the Challenge website:
https://physionetchallenges.org/2022/results/

Please read the previous announcement for information about these changes, and please update your manuscripts with these updated results, including your scores and ranks on the test set. Please also check the CinC proceedings paper template for important information about how to prepare your final papers (so that we don't need to ask you for last-minute changes):
https://physionetchallenges.org/2022/papers/#templates

Some of the teams asked us to share the data and code that we used to check their training code. Instead of sharing the exact data and code, I would rather share a simple but more general process for checking your training code and encourage teams to think about how they can write more usable and reusable code:
  1. Change the data and/or labels in the training set. Does your code work with missing, unknown, or non-physiological values in the data? Does your code work if you change the prevalence rates of the classes or remove one of the classes?
  2. Change the size of the training set. You can extract a subset of the training set or duplicate the training set. Does your code work with a training set that is 15% or 150% of the size of the original training set?
  3. Run your training code on the modified training set. If your training code fails, then your code is too sensitive to the changes in the training set, and you should update your code until it works as expected.
  4. Score the resulting model on part of the unmodified training set – ideally, data that you did not use to train your model. If your code fails, or if the model trained on the modified training set receives the same scores or almost the same scores as the model trained on the unmodified training set, then your training code didn’t learn from the training set, and you should update your code until it works as expected.
Again, this is a simplified process, and we may change how we stress test your code in the future (for example, by randomizing the labels), so please think about how you can ensure that your code isn’t dependent on a single set of data and labels or a single test for robustness. Of course, you should also try similar steps to check the rest of your code as well.
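As a concrete starting point, the sketch below follows these steps for the "change the size of the training set" case. The folder layout, the file naming, and the helper functions (train, run, score) are assumptions about your own pipeline rather than the stress test that we ran.

    import os
    import random
    import shutil

    def subsample_training_set(src_folder, dst_folder, fraction=0.15, seed=0):
        """Step 2: copy a random subset of the patients into a new training folder."""
        os.makedirs(dst_folder, exist_ok=True)
        patients = sorted(os.path.splitext(f)[0]
                          for f in os.listdir(src_folder) if f.endswith('.txt'))
        random.Random(seed).shuffle(patients)
        keep = set(patients[:max(1, int(fraction * len(patients)))])
        for name in os.listdir(src_folder):
            # Keep each selected patient's label file and recordings (assumes
            # file names that start with the patient ID).
            if name.split('.')[0].split('_')[0] in keep:
                shutil.copy(os.path.join(src_folder, name), dst_folder)

    def self_check(train, run, score, holdout_folder):
        """Steps 3-4: retrain on the modified set and compare held-out scores."""
        train('training_data', 'model_original')
        train('training_data_subset', 'model_modified')   # step 3: must not crash
        s_original = score(run('model_original', holdout_folder))
        s_modified = score(run('model_modified', holdout_folder))
        if abs(s_original - s_modified) < 1e-6:
            print('Warning: identical scores - is the training code really learning?')
        return s_original, s_modified

    subsample_training_set('training_data', 'training_data_subset')
    # self_check(train, run, score, 'holdout_data')  # plug in your own pipeline

A complementary check is to modify the labels instead of the size of the training set and then confirm that the resulting models and scores actually change.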

All of this work is in service of protecting your scientific contributions over the course of the Challenge, and we appreciate, as always, your feedback and help.


Best,
Matt
(On behalf of the Challenge team.)

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email info at physionetchallenge.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.
