2022 - Proposed metric


David Hudak

Feb 10, 2022, 4:49:32 PM
to physionet-challenges
Dear PhysioNet team,

We have one question regarding the scoring metric. From the perspective of the described process and the proposed metric, it appears to us that the resulting workflow and the value of the metric are the same whether the classifier predicts the class "Positive" or the class "Unknown" for a given PCG. With the proposed metric, then, the "Positive" and "Unknown" classes can be merged into a single class, and the competition is essentially a binary classification problem?

PhysioNet Challenge

Feb 10, 2022, 4:52:37 PM
to physionet-challenges
Dear David,

Thanks for this question. Yes, we agree with your team's assessment: under the proposed scoring metric, the "Present" and "Unknown" classifier outputs could be used interchangeably without affecting the score, so the problem was effectively a binary classification problem. We did this intentionally for the initial scoring metric, but perhaps the unasked part of your question was whether, and how, the scoring metric should differentiate between these two cases. This is something that we've been thinking about, too.

In response, we've updated the proposed scoring metric so that "Present" cases are still referred to a general practitioner, but "Unknown" cases are now referred directly to a specialist. With this change, your algorithm can (1) refer patients for screening by a general practitioner; (2) refer patients directly for screening by a specialist, which resolves "Unknown" cases but has a higher screening cost; or (3) not refer patients for screening at all, which incurs no screening cost but could result in patients with untreated murmurs.
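The three options above can be sketched as a per-patient cost function. This is only an illustration of the structure of the updated metric, not the official scoring code: the cost values below are hypothetical placeholders, and the real Challenge metric defines its own (non-constant) screening and treatment costs.

```python
# A minimal sketch of the three referral options described above.
# All cost values are hypothetical placeholders, NOT the official
# Challenge weights.

GP_SCREEN_COST = 10       # hypothetical: screening by a general practitioner
SPECIALIST_COST = 50      # hypothetical: direct screening by a specialist
MISSED_MURMUR_COST = 300  # hypothetical: penalty for an untreated murmur

def referral_cost(prediction, has_murmur):
    """Cost incurred for one patient, given the algorithm's output."""
    if prediction == "Present":   # option 1: refer to a general practitioner
        return GP_SCREEN_COST
    if prediction == "Unknown":   # option 2: refer directly to a specialist
        return SPECIALIST_COST
    # option 3: no referral ("Absent") -- no screening cost, but a
    # murmur left untreated incurs the miss penalty
    return MISSED_MURMUR_COST if has_murmur else 0
```

Under this structure, "Unknown" is no longer interchangeable with "Present": it resolves the uncertain case, but at a higher screening cost.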

What do you and others think? We've updated the scoring code and Challenge website with the changes. More questions and feedback are welcome!

Best,
Matt
(On behalf of the Challenge team.)

https://PhysioNetChallenges.org/
https://PhysioNet.org/

Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email challenge at physionet.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually.

Andrew McDonald

Mar 15, 2022, 10:14:35 AM
to physionet-challenges
Dear Matt,

Thanks for this explanation of the proposed scoring metric. It's a very interesting way to assess classification performance compared to traditional sensitivity/specificity metrics.

Would you be able to share the value of c_0 (the total cost without algorithmic prescreening) for the validation data? This would help put the leaderboard in context: the current weightings in the metric place a very high cost on false negatives, which means that an algorithm that simply predicts all positives does quite well.

Best wishes,

Andrew

PhysioNet Challenge

Mar 15, 2022, 10:20:41 AM
to physionet-challenges
Dear Andrew,

Yes, we're trying to better capture the clinical utility of algorithmic prescreening. Feedback on this year's scoring metric is more than welcome! We will revisit the scoring metric during next month's hiatus between the unofficial and official phases, so any feedback during the unofficial phase is especially helpful.

For the current scoring metric, the mean cost without algorithmic prescreening (i.e., the total cost c_0 divided by the number of patients) is 512 for the training set and a similar value for the validation set.

For comparison, the lowest possible mean cost with algorithmic prescreening (i.e., the total cost c_1 divided by the number of patients) is 311 for the training set and a similar value for the validation set.

These baseline scores are similar across the different partitions of the data because we partitioned the data so that the training, validation, and test sets have similar class prevalence rates.

As you noted, the current scoring metric does place a much higher cost on false negatives than on false positives, so a classifier that returns all positives (murmur present) does achieve a relatively low score. However, it is certainly possible for algorithms to achieve a lower (better) cost.
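The point above can be made concrete with a toy mean-cost calculation. Everything here is hypothetical: the cost constants, the 5% prevalence, and the simplified two-class setting (no "Unknown" option) are illustrative only, and the numbers do not correspond to the 512/311 figures quoted earlier.

```python
# Hypothetical illustration of why an all-positive classifier scores
# reasonably well under a high false-negative cost, yet remains beatable.
# Cost values and prevalence are placeholders, not Challenge parameters.

GP_COST = 10     # hypothetical cost of one screening referral
MISS_COST = 300  # hypothetical cost of an untreated murmur

def mean_cost(predictions, labels):
    """Mean per-patient cost: referral cost for positives,
    miss penalty for false negatives, zero for true negatives."""
    total = 0
    for pred, label in zip(predictions, labels):
        if pred == "Present":
            total += GP_COST
        elif label == "Present":  # murmur missed by a negative call
            total += MISS_COST
    return total / len(labels)

labels = ["Present"] * 5 + ["Absent"] * 95  # 5% prevalence (hypothetical)
all_positive = ["Present"] * 100            # refer everyone
perfect = labels[:]                         # refer only true murmurs

print(mean_cost(all_positive, labels))  # 10.0: every patient is screened
print(mean_cost(perfect, labels))       # 0.5: only murmur patients screened
```

Referring everyone caps the cost at the screening cost per patient, which is cheap relative to missed murmurs, but an algorithm that screens selectively can still do markedly better.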


Best,
Matt
(On behalf of the Challenge team.)

