Dear Challengers,
The final piece of the puzzle has now been posted, which we hope answers many of your questions.
First, we have posted an updated list of SNOMED CT codes and diagnoses that describes all of the data now posted. This list has been split into two CSV files: one for diagnoses that are included in the new scoring function, and another for diagnoses that are essentially ignored during scoring. (We have provided the non-scored classes for completeness, so that every diagnosis in the training data belongs to one of the two lists, and in case you find them useful for developing a 'non-class' group.) The scored diagnoses were chosen based on their prevalence in the training data, their severity, and the ability to determine them from ECG recordings.
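If you want to use the two lists to separate a recording's scored diagnoses from the rest, a minimal sketch follows. It is hypothetical and not part of the posted code: it assumes each CSV file has a header row with a column named `SNOMED CT Code` (check the posted files for the actual column name), and that a recording's diagnoses are available as a list of code strings. The codes in the example are made-up placeholders, not real SNOMED CT codes.

```python
import csv
import io
from typing import Iterable, List, Set, TextIO, Tuple


def load_codes(f: TextIO, column: str = "SNOMED CT Code") -> Set[str]:
    # Read one of the CSV lists and collect its codes as strings.
    # The column name is an assumption; adjust it to match the posted files.
    return {row[column].strip() for row in csv.DictReader(f)}


def split_labels(labels: Iterable[str], scored: Set[str]) -> Tuple[List[str], bool]:
    # Keep only the diagnoses that count toward the score. The returned flag is
    # True when a recording has no scored diagnosis at all, i.e. a candidate
    # for a 'non-class' group.
    scored_labels = [code for code in labels if code in scored]
    return scored_labels, not scored_labels


# Example with placeholder codes (not real SNOMED CT codes).
scored = load_codes(io.StringIO("Dx,SNOMED CT Code\nexample diagnosis,1001\n"))
print(split_labels(["1001", "9999"], scored))  # (['1001'], False)
print(split_labels(["9999"], scored))          # ([], True)
```

With the real files you would pass an open file object instead of `io.StringIO`, e.g. `open(path, newline="")`, where `path` is wherever you saved the scored list.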
Second, we have posted a Python implementation of a new scoring function. The new scoring function (s, defined below) awards partial credit to misdiagnoses that, as judged by our cardiologists, result in treatments or outcomes similar to those of the true diagnosis or diagnoses. This scoring function reflects the clinical reality that some misdiagnoses are more harmful than others and should be scored accordingly. In particular, confusing some classes is much less harmful than confusing others. It is defined as follows:
Let C = {c_i} be a collection of diagnoses. We compute a multi-class confusion matrix A = [a_{ij}], where a_{ij} is the number of recordings in a database that were classified as belonging to class c_i but actually belong to class c_j. We assign different weights W = [w_{ij}] to different entries in this matrix based on the similarity of treatments or differences in risks. The score s is given by s = \sum_{ij} w_{ij} a_{ij}, which is a generalized version of the traditional accuracy metric: if W is the identity matrix, s is simply the number of correctly classified recordings. The score s is then normalized so that a classifier that always outputs the true class(es) receives a score of 1 and an inactive classifier that always outputs the normal class receives a score of 0.
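To make the definition concrete, here is a minimal Python sketch. It is not the posted implementation and may differ from it in detail. It assumes each recording has exactly one true class and one predicted class (the official code also has to handle recordings with several diagnoses), that classes are integer indices, and that the normalization is the linear rescaling that maps the inactive classifier to 0 and the always-correct classifier to 1. The function names and the example weight matrix are hypothetical.

```python
import numpy as np


def confusion_matrix(predicted, actual, num_classes):
    # a[i, j] counts recordings classified as class i that actually belong to class j
    # (single-label assumption: one predicted and one true class per recording).
    A = np.zeros((num_classes, num_classes))
    for i, j in zip(predicted, actual):
        A[i, j] += 1
    return A


def weighted_score(A, W):
    # Unnormalized score s = sum_{ij} w_ij * a_ij.
    return float(np.sum(W * A))


def normalized_score(predicted, actual, W, normal_class):
    # Rescale s so that an always-correct classifier scores 1 and an inactive
    # classifier that always outputs normal_class scores 0.
    n = W.shape[0]
    s_obs = weighted_score(confusion_matrix(predicted, actual, n), W)
    s_true = weighted_score(confusion_matrix(actual, actual, n), W)
    inactive = [normal_class] * len(actual)
    s_inactive = weighted_score(confusion_matrix(inactive, actual, n), W)
    if s_true == s_inactive:
        # e.g. every recording is normal, so the two reference classifiers coincide
        raise ValueError("normalization undefined: true and inactive scores are equal")
    return (s_obs - s_inactive) / (s_true - s_inactive)


# Three hypothetical classes; class 0 is "normal".
# Confusing classes 1 and 2 earns half credit; calling either of them normal earns nothing.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
actual = [0, 1, 2, 2]
predicted = [0, 2, 2, 0]
print(normalized_score(predicted, actual, W, normal_class=0))  # 0.5
```

Here s = 1 + 0.5 + 1 + 0 = 2.5, the always-correct classifier scores 4 and the inactive classifier scores 1, so the normalized score is (2.5 - 1) / (4 - 1) = 0.5. Swapping two clinically similar classes costs only part of the credit, while calling an abnormal recording normal (the last recording) earns none.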
As always, we welcome feedback on this novel scoring function. While we don't claim that the *exact* weights are 'optimal', they generally reflect the outcomes in which we are interested and avoid the arbitrariness of averaging accuracy across classes. Unless we find a bug in the scoring function, it will not change. There will also be no more updates to the training data from this point onward.
Again, we apologize that this took so long, but as you can see, it was an enormous lift: not only creating the largest data repository of its kind (over 40,000 public 12-lead ECGs, with many more in the test data, drawn from five independent sources), but also creating an entirely new scoring function. We expect to reopen submissions in the coming days.
Best,
The Challenge Organizers
https://physionetchallenges.github.io/
Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email challenge at physionet.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge sent to any other address. This email address is maintained by a group. Please do not email us individually.