Labels - scored vs. unscored


David Hudak

Feb 8, 2021, 7:28:28 AM
to physionet-challenges

Hi everyone,

I would like to ask you some questions about the PVC label:

The label PVC (Premature Ventricular Contractions) from the CPSC2018 database was, in both PhysioNet Challenges (2020 and 2021), mapped to the unscored SNOMED code 164884008 (VEB, ventricular ectopics) rather than to the scored SNOMED code 427172004 (PVC, premature ventricular contractions). Wouldn't it be more correct to declare code 164884008 (VEB) as scored and join it to the group of codes 427172004 (PVC) and 17338001 (VPB)? It covers 700 ECGs from CPSC2018 and 1154 ECGs from PTB-XL, so it could strongly impact the resulting score.

Thanks

David

PhysioNet Challenge

Feb 8, 2021, 7:33:02 AM
to David Hudak, physionet-challenges
Dear David,

To address your question on PVC vs. VPB vs. VEB labels, please see the response in our Google group from last year, copied below, and the announcement from June 18 last year.

-Gari

On Wed, Jul 15, 2020, 11:55 PhysioNet Challenge <chal...@dbmi.emory.edu> wrote:
Dear Martin,

Thank you for your question, you raised a good point. Please see the announcement on June 18th (https://groups.google.com/g/physionet-challenges/c/eNc26q2luM4/m/gLUu0vRlBgAJ), but basically:

Each database is labelled using a different ontology, or subset of terms in an ontology (or sometimes no ontology - just free text). We therefore had to make a call about how to map these. For example, we have the following four labels for ventricular ectopic beats:
 
Description, SNOMED Code, Abbreviation
premature ventricular complexes, 164884008, PVC
premature ventricular contractions, 427172004, PVC
ventricular ectopic beats, 17338001, VEB
ventricular premature beat, 17338001, VPB

You'll note that, while we have chosen to retain the distinction between these in terms of SNOMED codes (although we have merged PVCs, because we could really see no reason for them to have two separate codes), in the scoring matrix they carry the same weight, and mixing them up doesn't cost you any points. You may then ask, "Why not merge them all in the labelling?" Well, that's a question you have to answer for yourself. You are certainly welcome to do that, but you may not want to. You may note that only VPB indicates the temporal location of the beat relative to the preceding normal beat. This may, or may not, affect your algorithm, depending on how you write your code. You may or may not want it to affect your algorithm; the relative timing of beats certainly gives you information! We have therefore tried to provide you with as much useful information as possible, without overwhelming you with a complete data dump.
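If a team does decide to merge these labels for training, a minimal participant-side sketch might collapse each code onto one representative before building targets. The `EQUIVALENT_CODES` table and the `canonicalize` helper are hypothetical; folding 164884008 (VEB) into the PVC group is exactly the optional choice discussed in this thread, not something the official scoring does.

```python
# Hypothetical participant-side merge table (NOT the official mapping):
# 427172004 (PVC) and 17338001 (VPB) share a scored class; folding in
# 164884008 (VEB) is an optional choice a team might make for training.
EQUIVALENT_CODES = {
    "164884008": "427172004",  # ventricular ectopics -> PVC (assumed merge)
    "17338001": "427172004",   # ventricular premature beats -> PVC
}


def canonicalize(labels):
    """Map each SNOMED code to its representative and drop duplicates."""
    return sorted({EQUIVALENT_CODES.get(code, code) for code in labels})


print(canonicalize(["164884008", "426783006"]))  # ['426783006', '427172004']
```

Keeping the raw codes alongside the merged ones lets a model still exploit the extra timing information that, per the reply above, only VPB carries.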

Let me know if you have any questions,

Best,

Erick

(On behalf of the Challenge team.)


Please post questions and comments in the forum. However, if your question reveals information about your entry, then please email chal...@physionet.org. We may post parts of our reply publicly if we feel that all Challengers should benefit from it. We will not answer emails about the Challenge to any other address. This email is maintained by a group. Please do not email us individually. 


On Wed, Jul 15, 2020 at 10:50 AM martin.baumgartner via physionet-challenges <physionet-...@googlegroups.com> wrote:
Dear Challenge-Organizers,

When examining the mapping of scored labels, we found that in the first dataset (CPSC) no records are labelled as PVC ("premature ventricular contractions"). This is confusing, as during the unofficial phase this dataset included exactly 700 records with this label.
We found in the mapping of unscored labels that for this dataset 700 records are now labelled as "ventricular ectopics", which is reasonable, as both diagnoses describe the same abnormality. However, models recognising them no longer benefit from correctly classifying this pathology, because this class is in the unscored list although it is almost identical to the scored diagnosis PVC.

If this change from PVC to ventricular ectopics is intended, it might be helpful to add ventricular ectopics to the scored mapping and map it to the PVC and VPB classes.

Kind regards,
Martin 

-- 
You received this message because you are subscribed to the Google Groups "physionet-challenges" group.
To unsubscribe from this group and stop receiving emails from it, send an email to physionet-challe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/physionet-challenges/f27d48cc-7f80-4c45-8e10-489182bd4b3fo%40googlegroups.com.

(On behalf of the challenge team.)

https://physionetchallenges.github.io/


David Hudak

Feb 10, 2021, 10:47:04 AM
to physionet-challenges
Thank you.
I am sorry for the repeated question.

But I would like to ask you about the test set. ECGs from Georgia and CPSC2018 (in the training set) with the label 426783006 (NSR, sinus rhythm) have no other label, i.e., they represent "normal" ECGs without anomalies.

Is this also true for ECGs with NSR from the other part of the test set (the 10,000 ECGs)?

PhysioNet Challenge

Feb 10, 2021, 10:54:45 AM
to David Hudak, physionet-challenges
Dear David, 

You may find the response to a similar question on the forum about sinus rhythms from last year:

As noted on our website, the test data are drawn from the same sources as some of the training databases and a new database never before released:

"The fifth source is an undisclosed American database that is geographically distinct from the Georgia database. The source contains 10,000 ECGs (all retained as test data)."

I think you are referring to this latter database. It is likely that the labels are slightly different in this database. (We aren't going to confirm or deny this, because this is part of the Challenge.)
All the labels listed on our website may be present in this hidden database, and no unlisted labels are present.

It's important to note that, as with all the non-research databases, the labels are assigned in the course of standard clinical assessment, and we cannot be 100% sure of differences in the local practices at that particular hospital. This is exactly the problem we encounter in the real world: you develop an algorithm, and someone uses it on data drawn from a system and a population not perfectly represented in training. (Twelve-lead ECGs are probably one of the most standardized hardware and clinical practice combinations you'll encounter in healthcare.) Labels are still "noisy" (in many ways). This is part of the challenge. You have to use the (many) databases provided to work out how to deal with the variation in labelling practices, variation in the skill of the labelling team, types of hardware, variations in populations, etc.

That said, if you encounter the label 426783006 (NSR, sinus rhythm) and no other label, it is likely that the ECG is "normal", without significant anomalies. Conversely, if you do not see this label, normality is unlikely (but not out of the question). See our website for a list of scored labels.
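The "NSR and nothing else" rule of thumb can be written as a small check on a record's label set. This sketch assumes the header files carry the diagnoses in a comment line of the form `#Dx: code1,code2`; the helper names and the sample header below are made up for illustration, and, as the reply stresses, this is a heuristic, not a guarantee of normality in the hidden test database.

```python
NSR = "426783006"  # sinus rhythm


def parse_dx(header_text):
    """Return the SNOMED codes from an assumed '#Dx: a,b,c' header line."""
    for line in header_text.splitlines():
        if line.startswith("#Dx"):
            _, _, value = line.partition(":")
            return {code.strip() for code in value.split(",") if code.strip()}
    return set()  # no diagnosis line found


def is_probably_normal(header_text):
    """Heuristic from the thread: NSR as the ONLY label suggests a normal ECG."""
    return parse_dx(header_text) == {NSR}


# Made-up header text for illustration only.
header = "A0001 12 500 7500\n#Age: 74\n#Sex: Male\n#Dx: 426783006\n"
print(is_probably_normal(header))  # True
```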

Of course, it's more nuanced than this. As with all data, you need to dive deeply into the definitions of each label (especially the scored labels). For example, sinus arrhythmia can be a normal change in heart rate due to respiration. For that reason, and for other reasons related to the downstream consequences of misdiagnoses, you are not fully penalized for mixing up these classes in the scoring function.
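To see how partial credit softens a confusion, here is a deliberately simplified single-label sketch. The `WEIGHTS` values are hypothetical placeholders, not the contents of the Challenge's weights.csv, and the real metric is multi-label and normalized; the point is only that a clinically "close" mistake still earns some credit.

```python
# Hypothetical weights: 1.0 for an exact match, 0.5 for confusing NSR with
# sinus arrhythmia (SA). Placeholder values, NOT the official weights.csv.
WEIGHTS = {
    ("NSR", "NSR"): 1.0,
    ("SA", "SA"): 1.0,
    ("NSR", "SA"): 0.5,
    ("SA", "NSR"): 0.5,
}


def weighted_score(truths, predictions):
    """Sum the weight of each (true, predicted) pair; unlisted pairs score 0."""
    return sum(WEIGHTS.get((t, p), 0.0) for t, p in zip(truths, predictions))


truths = ["NSR", "SA", "NSR"]
preds = ["NSR", "NSR", "SA"]
print(weighted_score(truths, preds))  # 1.0 + 0.5 + 0.5 = 2.0
```

Strict accuracy would count only one of these three predictions as correct; the weighted version credits the two NSR/SA confusions at half value.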

It's important to note that our Challenges differ from most public data science competitions in many respects, and are generally closer to reality in terms of the data and scoring, and hence the final algorithms. We don't sweep the dirt under the carpet (although we've spent a long time preparing the data to ensure the problem is tractable and meaningful). This means teams generally have to put more effort into understanding the nuances of the data than in most data science events. It's one reason why we provide raw data, rather than a hand-selected set of features or heavily (post-acquisition) filtered data.

I hope this detail explains our choices, and why we don't over-curate the data and aren't as specific about the test data as you may want. 

All the best,

Gari
---
(On behalf of the Challenge team.)

Sebastian Wegener

Feb 17, 2021, 1:51:36 AM
to physionet-challenges
I have a different question but I think it fits very well in this thread. Do the 84 labels in the "dx_mapping_unscored.csv" file influence scoring at all? This may seem trivial, but I wasn't sure from the scoring metric. 

Thank you for hosting the challenge.

Sebastian

physionet-challenges

Feb 17, 2021, 1:54:00 AM
to physionet-challenges
Dear Sebastian,

The unscored labels from "dx_mapping_unscored.csv" do not affect the scoring metric. Only the labels in "dx_mapping_scored.csv" and "weights.csv" are important for the Challenge scoring.
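One way to make "unscored labels don't count" concrete is to filter each record's codes against the scored classes before computing anything. This sketch assumes weights.csv has an empty first header cell followed by one column per scored class, with equivalent codes joined by `|` (e.g. `427172004|17338001`); the helper names and the tiny in-memory table are hypothetical.

```python
import csv
import io


def scored_classes(weights_csv_text):
    """Read scored class groups from the header row of a weights.csv-style table.

    Assumes the first header cell is empty and equivalent SNOMED codes in a
    column label are joined with '|'.
    """
    header = next(csv.reader(io.StringIO(weights_csv_text)))
    return [set(cell.split("|")) for cell in header[1:]]


def scored_labels(labels, classes):
    """Keep only the class groups a record's labels hit; other codes are ignored."""
    label_set = set(labels)
    return [group for group in classes if group & label_set]


# Tiny made-up table in the assumed layout.
weights_text = (
    ",426783006,427172004|17338001\n"
    "426783006,1.0,0.0\n"
    "427172004|17338001,0.0,1.0\n"
)
classes = scored_classes(weights_text)
# 164884008 (VEB) is unscored here, so only the PVC/VPB group survives.
print(scored_labels(["164884008", "17338001"], classes))
```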

You can find more information and discussion on the forum from July 2020:

https://groups.google.com/g/physionet-challenges/c/XJMlky6yPDM


Best,

Nadi
(On behalf of the Challenge team.)
https://physionetchallenges.github.io/
