Test subject entities released


LM-KBC

Jul 11, 2022, 2:04:25 AM
to LM-KBC
Dear participants,

The test subject entities have been released here: https://github.com/lm-kbc/dataset/blob/main/data/test.jsonl

Submit your predictions on the CodaLab portal (https://codalab.lisn.upsaclay.fr/competitions/5815) to have them evaluated.
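If it helps, here is a minimal sketch of reading the test file and writing a predictions file in the same JSON Lines layout. The key "ObjectEntities" used for the predicted objects is an assumption; please check the repository README for the exact schema.

    import json

    def write_predictions(test_path="test.jsonl", out_path="predictions.jsonl"):
        # Read one JSON object per line, attach predictions, and write the
        # result back out as JSON Lines. "ObjectEntities" is an assumed key
        # for the predicted objects; see the repository for the exact schema.
        with open(test_path, encoding="utf-8") as fin, \
             open(out_path, "w", encoding="utf-8") as fout:
            for line in fin:
                row = json.loads(line)
                row["ObjectEntities"] = []  # replace with your model's output
                fout.write(json.dumps(row, ensure_ascii=False) + "\n")

    if __name__ == "__main__":
        write_predictions()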

Good luck!

Best,
Sneha

Thiviyan Thanapalasingam

Jul 13, 2022, 8:47:16 AM
to LM-KBC
Hi Sneha, 

We just tried to make a submission, but received the following error:

    WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
    Traceback (most recent call last):
      File "/tmp/codalab/tmpNup3fJ/run/program/evaluate.py", line 144, in <module>
        main()
      File "/tmp/codalab/tmpNup3fJ/run/program/evaluate.py", line 28, in main
        scores_per_sr_pair = evaluate_per_sr_pair(preds_fp, gts_fp)
      File "/tmp/codalab/tmpNup3fJ/run/program/metric.py", line 167, in evaluate_per_sr_pair
        p = precision(preds, gts)
      File "/tmp/codalab/tmpNup3fJ/run/program/metric.py", line 80, in precision
        if is_none_preds(preds):
      File "/tmp/codalab/tmpNup3fJ/run/program/metric.py", line 36, in is_none_preds
        list(preds)[0].lower() in {"", "none", "null"} or
    AttributeError: 'NoneType' object has no attribute 'lower'

It seems that the evaluation script on CodaLab has trouble processing None. In our predictions.jsonl file we represent NULL as empty strings (""), but the evaluation script interprets them as None.

We think the conditions on lines 36 and 37 have to be switched around, so that the None check runs before .lower() is called: https://github.com/lm-kbc/dataset/blob/b118bb8b135bca34f9d584942a6c299d63c456cc/evaluate.py#L36. Like this:

list(preds)[0] is None or list(preds)[0].lower() in {"", "none", "null"}
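For clarity, here is a minimal sketch of the check with the None test moved first, using the existing function name. This is only an illustration, not the exact repo code.

    def is_none_preds(preds) -> bool:
        # Treat an empty prediction set, a None entry, or a textual null
        # marker as "no prediction". Checking `is None` before calling
        # .lower() avoids the AttributeError shown in the traceback above.
        # (The empty-set check is assumed; see the repo for the full condition.)
        preds = list(preds)
        return (
            len(preds) == 0
            or preds[0] is None
            or preds[0].lower() in {"", "none", "null"}
        )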



Thank you!


Best,
Thiviyan

Phong Nguyen

Jul 13, 2022, 9:00:25 AM
to LM-KBC
Great catch, Thiviyan. Thank you! 

You are right that lines 36 and 37 should be switched. We have made the change (both in the GitHub repo and the CodaLab competition). I re-evaluated your submission; you can now see your results (some details are available too, see the Detailed Results column in the leaderboard).

Best,
Phong 

Thiviyan Thanapalasingam

Jul 13, 2022, 9:34:30 AM
to LM-KBC
Hi Phong, 

Thank you for your prompt response. 

We checked our submission results on CodaLab. It is peculiar that the F1 scores are much lower than the precision and recall.

For example, for the relation PersonInstrument we obtained a precision of 0.6333 and a recall of 0.6484, which should yield an F1 score of 0.6408. But CodaLab reports an F1 score of 0.4555. Maybe it's worth checking the evaluation script on CodaLab.


Thank you!


Best, 
Thiviyan

Phong Nguyen

Jul 13, 2022, 9:47:54 AM
to LM-KBC
Hi Thiviyan,

Indeed, the Detailed Results looked confusing. Precision, Recall, and F1-score there refer to macro-averaged scores: for each relation X, we compute P, R, and F1 for every (subject, X) pair and then average them to get the scores for relation X.
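A toy example (not the actual CodaLab script, numbers made up) of why the macro-averaged F1 can be lower than the F1 you would compute from the macro-averaged precision and recall:

    def f1(p, r):
        return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

    # Hypothetical (precision, recall) for two (subject, relation) pairs
    # of the same relation.
    per_pair = [(1.0, 0.0), (0.0, 1.0)]

    macro_p = sum(p for p, _ in per_pair) / len(per_pair)          # 0.5
    macro_r = sum(r for _, r in per_pair) / len(per_pair)          # 0.5
    macro_f1 = sum(f1(p, r) for p, r in per_pair) / len(per_pair)  # 0.0

    print(f1(macro_p, macro_r))  # 0.5  (harmonic mean of the averages)
    print(macro_f1)              # 0.0  (average of the per-pair F1 scores)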

I changed the column labels in the Detailed Results to Macro Average Precision, Recall, F1... (you may need to resubmit to see the update).

Best,
Phong
