New Challenge scoring function

PhysioNet Challenge

Jun 24, 2020, 8:01:12 PM

to physionet-challenges

Dear Challengers,

The final piece of the puzzle has now been posted, and we hope it answers many of your questions.

First, we have posted an updated list of SNOMED CT codes and diagnoses to describe all of the data now posted. This list has been split into two CSV files: one list for diagnoses that are included in the new scoring function, and another list for diagnoses that are essentially ignored during scoring. (We have provided these non-scored classes for completeness, so all of the diagnoses in the training data belong to one of these two lists, and in case you find them useful to develop a 'non-class' group.) The scored diagnoses were chosen based on prevalence of the diagnoses in the training data, the severity of the diagnoses, and the ability to determine the diagnoses from ECG recordings.

Second, we have posted a Python implementation of a new scoring function here. The new scoring function (s, defined below) awards partial credit to misdiagnoses that result in treatments or outcomes similar to those of the true diagnosis or diagnoses, as judged by our cardiologists. This scoring function reflects the clinical reality that some misdiagnoses are more harmful than others and should be scored accordingly. Moreover, it reflects the fact that confusing some classes is much less harmful than confusing other classes. It is defined as follows:

Let C = {c_i} be a collection of diagnoses. We compute a multi-class confusion matrix A = [a_{ij}], where a_{ij} is the number of recordings in a database that were classified as belonging to class c_i but actually belong to class c_j. We assign different weights W = [w_{ij}] to different entries in this matrix based on the similarity of treatments or differences in risks. The score s is given by s = \sum_{ij} w_{ij} a_{ij}, which is a generalized version of the traditional accuracy metric. The score s is then normalized so that a classifier that always outputs the true class(es) receives a score of 1 and an inactive classifier that always outputs the normal class receives a score of 0.
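
As a rough illustration of the definition above, here is a minimal Python sketch. It is not the posted implementation: it assumes each recording has exactly one true class and one predicted class (the real data can carry several diagnoses per recording), and the function names, class labels, and weights are hypothetical values chosen only for this example. The normalization is written as (s_observed - s_inactive) / (s_perfect - s_inactive), which is one way to satisfy the two conditions stated above. For actual scoring, please rely on the posted code.

```python
def confusion_matrix(predicted, actual, classes):
    """A[i][j] = number of recordings classified as classes[i]
    that actually belong to classes[j] (same convention as the text)."""
    index = {c: k for k, c in enumerate(classes)}
    n = len(classes)
    A = [[0] * n for _ in range(n)]
    for p, t in zip(predicted, actual):
        A[index[p]][index[t]] += 1
    return A


def weighted_score(A, W):
    """Unnormalized score s = sum over i, j of w_ij * a_ij."""
    n = len(A)
    return sum(W[i][j] * A[i][j] for i in range(n) for j in range(n))


def normalized_score(predicted, actual, classes, W, normal_class):
    """Rescale s so that a perfect classifier scores 1 and a classifier
    that always outputs normal_class scores 0."""
    observed = weighted_score(confusion_matrix(predicted, actual, classes), W)
    perfect = weighted_score(confusion_matrix(actual, actual, classes), W)
    always_normal = [normal_class] * len(actual)
    inactive = weighted_score(confusion_matrix(always_normal, actual, classes), W)
    if perfect == inactive:
        return float("nan")  # score is undefined for this data
    return (observed - inactive) / (perfect - inactive)


# Hypothetical classes and weights (NOT the Challenge weights):
# confusing AF with AFL earns half credit, everything else earns none.
classes = ["normal", "AF", "AFL"]
W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
actual = ["normal", "AF", "AFL", "AF"]
predicted = ["normal", "AFL", "AFL", "normal"]

score = normalized_score(predicted, actual, classes, W, "normal")
print(score)  # 0.5
```

In this toy example the observed score is 2.5, the perfect score is 4, and the always-normal score is 1, so the normalized score is (2.5 - 1) / (4 - 1) = 0.5. Note that a classifier can score below 0 if it does worse than always outputting the normal class.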

As always, we welcome feedback on this novel scoring function. While we don't claim that the *exact* weights are 'optimal', they generally reflect the outcomes in which we are interested, and avoid the arbitrary average class accuracy.

Unless we find a bug in the scoring function, it will not change. There will also be no more updates to the training data from this point onwards.

Again, we apologize that this took so long, but you can see what an enormous lift this was: we not only created the largest data repository of its kind (over 40,000 public 12-lead ECGs, and many more in the test data, drawn from five independent sources) but also created an entirely new scoring function. We expect to reopen submissions in the coming days.

Best,

The Challenge Organizers

https://physionetchallenges.github.io/

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email challenge at physionet.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.

PhysioNet Challenge

Jun 26, 2020, 2:49:10 PM

to physionet-challenges

Dear Challengers,

We have received multiple requests for a MATLAB implementation of the new scoring function, which we have uploaded as well:

https://github.com/physionetchallenges/evaluation-2020

We support the same programming languages that we supported during the unofficial phase of the Challenge, and the Python and MATLAB versions of the new scoring function provide the same results.

Best,

Matt

(On behalf of the Challenge team.)

https://physionetchallenges.github.io/
https://physionet.org/
