Dear organizers,
We have noticed that the evaluation script does not handle responses containing 'NONE' correctly: it normalizes 'NONE' like any other string, lowercasing it to 'none', so it is no longer matched against the ground truth. We have modified the script to change this behavior, but since you will be evaluating the predictions, we want to confirm that this is acceptable.
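To illustrate the behavior we observed, here is a minimal sketch. The function names are ours and the details are assumptions about the script's internals, not its actual code; we assume the sentinel check compares against the literal string 'NONE' after normalization has already run:

```python
def normalize(text: str) -> str:
    # Assumed normalization step: lowercases and strips every response,
    # including the 'NONE' sentinel
    return text.lower().strip()

def is_unanswerable(prediction: str) -> bool:
    # Assumed sentinel check: compares the (already normalized) prediction
    # verbatim against 'NONE', so it never matches
    return prediction == "NONE"

# Observed: the sentinel survives as 'none' and is scored as a regular answer
normalized = normalize("NONE")          # -> 'none'
unanswerable = is_unanswerable(normalized)  # -> False

# Our modification (sketch): check the sentinel on the raw response,
# before normalization touches it
def is_unanswerable_fixed(raw_prediction: str) -> bool:
    return raw_prediction.strip() == "NONE"
```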
Furthermore, we want to bring to your attention that BERT does not predict the NONE token, which may affect the F1 score. In particular, empty predictions are penalized with an F1 score of 0 even when the gold answer is NONE; is this intended behavior?
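For reference, a SQuAD-style token F1 (a sketch of the usual formulation, not necessarily your exact implementation) shows why an empty prediction always scores 0, regardless of the gold answer:

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    # Token-level F1 in the style of the SQuAD evaluation script
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    # Number of tokens shared between prediction and gold (with counts)
    common = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if common == 0:
        # An empty prediction always lands here, even when gold is 'NONE'
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```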
Best,
Selene Baez Santamaria